Ingestion

The following documentation describes our ingestion process, which consists of several automated, queue-based steps. While the data are stored in our databases immediately, the audio analysis and the computation of the products can take up to several hours to become available.

Once configured, the ingestion can run daily to keep your catalogues continuously up to date. In the interest of fair usage of our services, please notify us ahead of large ingestions.

The ingestion is the process that lets you add content to one of your catalogues, including all the related steps that make your tracks available for your licensed product.

Depending on your assets and your usage, you may ingest your content in one way or another. Although we only require an audio file and its reference in your system (called outer_id), it may be useful to ingest more metadata linked to your audio files. For a better experience on our Portal, we usually recommend ingesting your outer_id, the title, the related artist and the ISRC alongside your audio files. These metadata need a specific mapping so that we can parse all your content during the ingestion; this is described below.

Note that you can create several catalogues inside your organization and use a different ingestion process for each of them at the same time.


AWS S3 Bucket

Your account includes access to a single dedicated Amazon S3 bucket, to which you deliver the assets to be analyzed:

{
	"bucket_name": "msm-s3-{YOUR_ORGANIZATION_ID}-{AWS_REGION}-prd",
	"bucket_region": "{AWS_REGION}",
	"aws_access_key_id": "{YOUR_PRIVATE_ACCESS_KEY_ID}",
	"aws_secret_access_key": "{YOUR_PRIVATE_ACCESS_KEY_SECRET}"
}

This S3 bucket is dedicated to your organization and secured for it only. All the assets that you deliver and all the assets that we produce are stored in this single location.
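As a small illustration of the naming template shown in the configuration, the sketch below fills in the two placeholders. The function name and the sample values ("my-org-id", "eu-west-1") are hypothetical; the real values are the ones provided with your account.

```python
def bucket_name(organization_id: str, aws_region: str) -> str:
    """Fill the bucket-name template from the configuration above."""
    return f"msm-s3-{organization_id}-{aws_region}-prd"


# Hypothetical sample values, for illustration only.
example = bucket_name("my-org-id", "eu-west-1")
print(example)
```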

Several paths are already configured:

| Path | Description | Access |
| --- | --- | --- |
| csv_outputs | Contains all the exports made by Musimotion | ro |
| delivery | Root delivery area | ro |
| delivery/audio | Root delivery for all audio files | rw |
| delivery/ddex | Root delivery for all DDEX files | rw |
| delivery/mapping | Root repository for all mapping files | rw |
| delivery/metadata | Root delivery for all CSV files | rw |
| INFO.json | Technical configuration (JSON format) | ro |
| live_analysis | Temporary storage for the Live analysis | - |
| logs | Storage for all the logs produced by our services | - |
| processed | Root storage area for processed files | - |
| processed/audio | Main storage for processed audio files | - |
| processed/features | Main storage for extracted features | - |
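The access column can be turned into a quick local check before uploading. This is only a sketch: the helper names are hypothetical, it assumes that "ro" means read-only and "rw" means read-write, that "-" means no customer access is listed, and that subfolders inherit the access of their closest configured parent.

```python
# Access flags copied from the table of configured paths.
PATH_ACCESS = {
    "csv_outputs": "ro",
    "delivery": "ro",
    "delivery/audio": "rw",
    "delivery/ddex": "rw",
    "delivery/mapping": "rw",
    "delivery/metadata": "rw",
    "INFO.json": "ro",
    "live_analysis": "-",
    "logs": "-",
    "processed": "-",
    "processed/audio": "-",
    "processed/features": "-",
}


def access_for(key: str) -> str:
    """Return the access flag of the longest configured path that contains key."""
    parts = key.strip("/").split("/")
    # Try the longest prefix first so that delivery/audio wins over delivery.
    for length in range(len(parts), 0, -1):
        prefix = "/".join(parts[:length])
        if prefix in PATH_ACCESS:
            return PATH_ACCESS[prefix]
    return "-"  # Unknown paths are treated as not accessible (assumption).


def is_writable(key: str) -> bool:
    """True when the key falls under a path flagged rw."""
    return access_for(key) == "rw"
```

With these assumptions, `is_writable("delivery/audio/album/track.mp3")` is true, while a key placed directly under `delivery/` is not writable.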

Access through CLI

Once you have configured your environment to access the bucket, the following commands should work:

To get the region in which your bucket is located:

aws s3api get-bucket-location --bucket {YOUR_BUCKET_NAME}
{
    "LocationConstraint": "eu-west-1"
}

To list the content of your bucket:

aws s3 ls s3://{YOUR_BUCKET_NAME}/
                           PRE csv_outputs/
                           PRE delivery/
                           PRE live_analysis/
                           PRE logs/
                           PRE processed/
                           831 INFO.json
                            36 {YOUR_ORGANIZATION_NAME}

To recursively list all the files that are in the delivery area:

aws s3 ls s3://{YOUR_BUCKET_NAME}/delivery --recursive

Access through Interface

If you prefer to avoid the command line, you can connect to your dedicated bucket with any software that supports the S3 protocol.

We recommend Cyberduck, in which you will have to configure the key and the path. Once connected, you can navigate the directories and upload your audio files to the right folder.


Audio files only, "Free naming"

This is the simplest way to ingest content in one of your catalogs. Note that the outer_id will correspond to the filename of the related audio file.

Use-cases

  • You have a set of audio files whose names do not follow a consistent format.
  • You will not match our results with another database, or you will match the data based on the filenames.
  • You want to have a quick look at our results through our portal.

Preparation

Simply upload all your audio files into the right path delivery/audio/:

# one file at a time
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/ --recursive

Note that you can create as many subfolders as you want, as long as their main path is delivery/audio/.

Read more about the S3 cp commands

Request

curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/audio_only?delivery_dir_path={THE_PARENT_DELIVERY_PATH}&catalog_id={CATALOG_ID}&delete_delivered_file={TRUE|FALSE}' \
--header 'Authorization: Bearer {VALID_ACCESS_TOKEN}'

| Parameter | Value Type | Description |
| --- | --- | --- |
| delivery_dir_path | string | Path on S3, relative to ./delivery/audio/ |
| catalog_id | string (UUID) | Musimap Internal Unique Identifier for the catalog |
| delete_delivered_file | boolean (default = "True") | Whether the file is deleted from the delivery once processed |
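The query string of this request can also be assembled programmatically so that the path is URL-encoded correctly. The following is a minimal sketch using only the Python standard library; the function name and the sample path and catalog identifier are hypothetical, and the "True"/"False" spelling of the boolean is an assumption based on the default shown in the table.

```python
from urllib.parse import urlencode

API_BASE = "https://api-v2.musimap.io"


def audio_only_url(delivery_dir_path: str, catalog_id: str,
                   delete_delivered_file: bool = True) -> str:
    """Build the URL of the audio_only ingestion request (nothing is sent)."""
    query = urlencode({
        "delivery_dir_path": delivery_dir_path,
        "catalog_id": catalog_id,
        # Assumption: the API accepts the same spelling as the documented default.
        "delete_delivered_file": str(delete_delivered_file),
    })
    return f"{API_BASE}/ingestion/audio_only?{query}"


# Hypothetical sample values, for illustration only.
url = audio_only_url("2024/week-12", "00000000-0000-0000-0000-000000000000")
print(url)
```

The resulting URL is then sent as a POST request with the Authorization header, as in the curl command.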

Process

Once triggered, the ingestion starts scanning the delivery_dir_path and creates one entry for each file found. Please note that the filename becomes the outer_id, which must be unique. This means that a file can overwrite any previously uploaded file that has the same name.
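Because identical filenames lead to the same outer_id, it can help to look for duplicates before uploading. The sketch below is only an illustration: the function name is hypothetical, and it assumes that the outer_id is the bare filename (extension included, subfolders ignored), which this documentation does not state explicitly.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_outer_id_collisions(keys):
    """Group delivery keys by filename and keep the names used more than once."""
    by_name = defaultdict(list)
    for key in keys:
        # Assumption: only the last path component becomes the outer_id.
        by_name[PurePosixPath(key).name].append(key)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


collisions = find_outer_id_collisions([
    "album_a/track.mp3",
    "album_b/track.mp3",
    "album_a/other.mp3",
])
print(collisions)
```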

Note that the audio files are moved during the process. Depending on the delete_delivered_file parameter, the delivery_dir_path may be empty at the end of this first step.

Once all the tracks are saved in our database, the audio analysis starts sending every audio file to our audio analyzers. This system is based on message queues, so the speed of the audio analysis depends on the number of tracks being processed.

Several times a day, another process computes the similarities for Musimatch or the tags for Musimotion and Musime. If a track has not been analysed yet, it will be computed later on in another thread. This third step determines the availability of the entry for your licensed product.

Audio files only, "Structured naming"

This is the simplest way to ingest formatted content in one of your catalogues.

Use-cases

  • All your audio files have the same naming convention.
  • You want to match our results with your infrastructure, based on a specific reference.
  • You want to benefit from our service by quickly adding simple metadata to your audio files.

Preparation

Simply upload all your audio files into the right path delivery/audio/:

# one file at a time
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/ --recursive

Note that you can create as many subfolders as you want, as long as their main path is delivery/audio/.

Read more about the S3 cp commands

Mapping

This file describes your naming convention and allows us to parse each of your filenames to extract the right information.

As an example, if your filename is composed as "outer_id"-"isrc".ext, your mapping should look like:

mapping:
  - separator: "-"
  - column_0: "outer_id"
  - column_1: "isrc"
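To make the effect of such a mapping concrete, the sketch below splits a filename with the separator and pairs each part with its column name. It is an illustration only: the function name and the sample filename are hypothetical, and the way the service itself handles extensions or extra separators is not described here.

```python
from pathlib import PurePosixPath


def parse_filename(filename: str, separator: str, columns: list) -> dict:
    """Split the filename (without folder and extension) into named columns."""
    stem = PurePosixPath(filename).stem
    parts = stem.split(separator)
    if len(parts) != len(columns):
        raise ValueError(
            f"{filename!r} has {len(parts)} part(s), expected {len(columns)}"
        )
    return dict(zip(columns, parts))


# Hypothetical filename following the "outer_id"-"isrc".ext convention.
parsed = parse_filename("ABC123-USRC17607839.mp3", "-", ["outer_id", "isrc"])
print(parsed)
```

A separator that can also appear inside a value (for example a hyphen in a title) would make the split ambiguous, so it is worth choosing a separator that never occurs in your metadata.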

Available values:

  • outer_id
  • title
  • isrc
  • release_date
  • artist_name
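A mapping can be checked locally against this list before it is stored. The sketch below works on the structure a YAML parser would typically return for the example mapping (a list of single-key entries). The function name is hypothetical, and the rules it enforces (contiguous column_N keys, outer_id present) are assumptions rather than documented validation rules.

```python
ALLOWED_VALUES = {"outer_id", "title", "isrc", "release_date", "artist_name"}


def mapping_columns(document: dict):
    """Return (separator, ordered column names) or raise ValueError."""
    settings = {}
    for item in document["mapping"]:
        settings.update(item)
    separator = settings.pop("separator", None)
    if not separator:
        raise ValueError("the mapping needs a separator")
    columns = []
    for index in range(len(settings)):
        key = f"column_{index}"
        if key not in settings:
            raise ValueError(f"missing {key}")
        if settings[key] not in ALLOWED_VALUES:
            raise ValueError(f"unknown value {settings[key]!r} for {key}")
        columns.append(settings[key])
    # Assumption: outer_id is the only mandatory reference, so it must be mapped.
    if "outer_id" not in columns:
        raise ValueError("the mapping must contain outer_id")
    return separator, columns


# Parsed form of the example mapping shown above.
example_mapping = {
    "mapping": [{"separator": "-"}, {"column_0": "outer_id"}, {"column_1": "isrc"}]
}
print(mapping_columns(example_mapping))
```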

Once this file is written (and validated), you may want to store it inside your bucket:

aws s3 cp local_mapping.yaml s3://{YOUR_BUCKET_NAME}/delivery/mapping/default_mapping.yaml

Note that our Support Team will usually configure your very first ingestion and provide a working example for validation.

Request

curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/audio_mapping?mapping_filename=default_mapping.yaml&delivery_dir_path={THE_PARENT_DELIVERY_PATH}&catalog_id={CATALOG_ID}&delete_delivered_file={TRUE|FALSE}' \
--header 'Authorization: Bearer {VALID_ACCESS_TOKEN}'

| Parameter | Value Type | Description |
| --- | --- | --- |
| delivery_dir_path | string | Path on S3, relative to ./delivery/audio/ |
| catalog_id | string (UUID) | Musimap Internal Unique Identifier for the catalog |
| delete_delivered_file | boolean (default = "True") | Whether the file is deleted from the delivery once processed |
| mapping_filename | string | The filename of your mapping stored in ./delivery/mapping |
| mapping_content | string | The YAML file containing your mapping, if not stored in S3 |

Process

Once triggered, the ingestion starts scanning the delivery_dir_path and creates one entry for each file found. The extracted data are stored alongside the audio file in our databases. Please note that the outer_id must be unique and will overwrite any previous entry that has the same reference.

Note that the audio files are moved during the process. Depending on the delete_delivered_file parameter, the delivery_dir_path may be empty at the end of this first step.

Once all the tracks are saved in our database, the audio analysis starts sending every audio file to our audio analyzers. This system is based on message queues, so the speed of the audio analysis depends on the number of tracks being processed.

Several times a day, another process computes the similarities for Musimatch or the tags for Musimotion and Musime. If a track has not been analysed yet, it will be computed later on in another thread. This third step determines the availability of the entry for your licensed product.

Audio Files & Metadata (JSON)

This is the most complete way to ingest audio files with metadata.

Use-cases

  • Your audio files don't have any structured filenames.
  • You want us to store all the metadata related to an audio file in order to retrieve all the information immediately.
  • You want to benefit from our service through our Portal.

Preparation

S3 Storage

In order to fetch the media on S3, simply upload all your audio files into the right path delivery/audio/:

# one file at a time
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/ --recursive

Note that you can create as many subfolders as you want, as long as their main path is delivery/audio/.

Read more about the S3 cp commands

Remote Storage

If your audio files are publicly available, you can set up the ingestion to download them. In that case, you will need to fill the primary_media of each track with the complete URL from which the file can be downloaded.
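Since primary_media holds either a path on S3 or a complete URL, the matching media_fetch_type query parameter (s3 or download) can be derived from its value. This is a minimal sketch with a hypothetical function name; it assumes that remote files are always served over http or https.

```python
from urllib.parse import urlparse


def media_fetch_type(primary_media: str) -> str:
    """Return "download" for http(s) URLs and "s3" for anything else."""
    scheme = urlparse(primary_media).scheme.lower()
    # Assumption: only http and https count as remote storage.
    return "download" if scheme in ("http", "https") else "s3"


print(media_fetch_type("https://example.com/audio/track.mp3"))
print(media_fetch_type("album/track.mp3"))
```

As media_fetch_type is a query parameter of the whole request, tracks stored on S3 and tracks to be downloaded are best sent in separate requests.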

JSON Body

Once your audio files have been uploaded, you will need to query our Web-API to ingest them with the right information. You may use one single query for up to 25 tracks. The query will then contain all the information for us to retrieve the right audio file and to save it with all the information you want us to store. For each of those tracks, the following structure needs to be respected:

{
  "outer_id": "string",
  "references": [
    {
      "id": "string",
      "source": "string"
    }
  ],
  "title": "string",
  "lyrics": "string",
  "isrc": "string",
  "release_date": 0,
  "albums": [
    {
      "upc": "string",
      "title": "string",
      "release_date": "string",
      "type": "Compilation",
      "references": [
        {
          "id": "string",
          "source": "string"
        }
      ],
      "track_position": "string",
      "disk_number": "string"
    }
  ],
  "artists": [
    {
      "id": "string",
      "name": "string",
      "role": "string"
    }
  ],
  "primary_media": "string",
  "customer_tags": [
    {
      "tag": "type of Rock",
      "category": "Rock"
    }
  ]
}

Note that only the fields outer_id and primary_media are required; all the others are optional.

| Parameter | Value Type | Description |
| --- | --- | --- |
| outer_id | string | Your unique identifier for this track |
| references | Nested object | A list of official references for this track, sorted by source |
| title | string | The official title for this track |
| lyrics | string | The complete lyrics for this track |
| isrc | string | The unique ISRC reference for this track |
| release_date | integer | The date of release (YYYYMMDD) |
| albums.upc | string | The unique UPC reference for this album |
| albums.title | string | The title for the album |
| albums.release_date | string | The date of release (YYYYMMDD) |
| albums.type | string | The type of album (Official, Compilation, Single,...) |
| albums.references | Nested object | A list of official references for this album, sorted by source |
| albums.track_position | string | The position of the track on the related disk_number |
| albums.disk_number | string | The disk on which the track could be listened |
| artists.id | string | A specific identifier for the related artist |
| artists.name | string | The name of a specific artist related to this track |
| artists.role | string | A specific role you would like to attach to the artist |
| primary_media | string | Remote URL or path on S3, relative to ./delivery/audio/ |
| customer_tags | Nested object | A list of tags, sorted by categories |

Note that you will be able to retrieve all this information by using the enriched response of several of our endpoints.
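Before sending a track, it can be useful to check the two required fields and to format the release date as the integer YYYYMMDD expected for the track. The sketch below is an illustration with hypothetical helper names and sample values; it does not reproduce any validation performed by the API.

```python
from datetime import date

REQUIRED_FIELDS = ("outer_id", "primary_media")


def release_date_as_int(day: date) -> int:
    """Format a date as the integer YYYYMMDD used by the track release_date."""
    return int(day.strftime("%Y%m%d"))


def missing_required(track: dict) -> list:
    """List the required fields that are absent or empty in a track entry."""
    return [field for field in REQUIRED_FIELDS if not track.get(field)]


# Hypothetical track entry, for illustration only.
track = {
    "outer_id": "ABC123",
    "primary_media": "album/track.mp3",
    "title": "Example title",
    "release_date": release_date_as_int(date(2021, 3, 5)),
}
print(missing_required(track))
```

Note that albums.release_date uses the same YYYYMMDD layout but is typed as a string in the table.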

Request

In addition to this BODY, several QUERY parameters allow you to configure the request:

| Parameter | Value Type | Description |
| --- | --- | --- |
| media_fetch_type | s3 or download (default: s3) | Whether the file is stored on S3 or remotely |
| catalog_id | any string | Musimap Internal Unique Identifier for the catalog |
| ingestion_id | any string | A unique reference for this ingestion |
| overwrite_audio_file | BOOLEAN (default: TRUE) | In case of an already existing outer_id, whether the audio file needs to be overwritten |
| delete_delivered_file | BOOLEAN (default: TRUE) | Whether the file is deleted from the delivery once processed |

Note that the ingestion_id is any string that will create a sub-collection of your entries. We advise you to generate one unique ingestion_id per day or week. If omitted, a timestamp will be used.
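One possible way to follow this advice is to derive the ingestion_id from the calendar. The formats below (an ISO date for a daily id, an ISO year and week number for a weekly id) are only a suggestion, since any string is accepted; the function names are hypothetical.

```python
from datetime import date


def daily_ingestion_id(day: date) -> str:
    """One identifier per calendar day, e.g. 2024-01-01."""
    return day.isoformat()


def weekly_ingestion_id(day: date) -> str:
    """One identifier per ISO week, e.g. 2024-W01."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


print(daily_ingestion_id(date(2024, 1, 1)))
print(weekly_ingestion_id(date(2024, 1, 1)))
```

Using the ISO year rather than the calendar year keeps the first days of January in the same weekly sub-collection as the last days of December when they belong to the same week.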

curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/json?media_fetch_type={s3|download}&catalog_id={CATALOG_ID}&ingestion_id={INGESTION_ID}&overwrite_audio_file={TRUE|FALSE}&delete_delivered_file={TRUE|FALSE}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {VALID_ACCESS_TOKEN}' \
--data-raw '[
  {
    "outer_id": "string",
    "references": [
      {
        "id": "string",
        "source": "string"
      }
    ],
    "title": "string",
    "lyrics": "string",
    "isrc": "string",
    "release_date": 0,
    "albums": [
      {
        "upc": "string",
        "title": "string",
        "release_date": "string",
        "type": "Compilation",
        "references": [
          {
            "id": "string",
            "source": "string"
          }
        ],
        "track_position": "string",
        "disk_number": "string"
      }
    ],
    "artists": [
      {
        "id": "string",
        "name": "string",
        "role": "string"
      }
    ],
    "primary_media": "string",
    "customer_tags": [
      {
        "tag": "type of Rock",
        "category": "Rock"
      }
    ]
  }
]'
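Because a single query accepts up to 25 tracks, larger deliveries have to be split into several requests. The sketch below prepares one POST request per batch with the Python standard library and does not send anything. The function names and the sample arguments are hypothetical, and only a subset of the query parameters is set; the others keep their defaults.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

MAX_TRACKS_PER_QUERY = 25


def batches(tracks, size=MAX_TRACKS_PER_QUERY):
    """Yield consecutive slices of at most `size` tracks."""
    for start in range(0, len(tracks), size):
        yield tracks[start:start + size]


def build_requests(tracks, catalog_id, ingestion_id, access_token):
    """Prepare one JSON ingestion request per batch (nothing is sent here)."""
    query = urlencode({
        "media_fetch_type": "s3",
        "catalog_id": catalog_id,
        "ingestion_id": ingestion_id,
    })
    url = f"https://api-v2.musimap.io/ingestion/json?{query}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
    return [
        Request(url, data=json.dumps(batch).encode("utf-8"),
                headers=headers, method="POST")
        for batch in batches(tracks)
    ]


# Hypothetical delivery of 60 minimal track entries.
tracks = [
    {"outer_id": f"track-{i}", "primary_media": f"album/track-{i}.mp3"}
    for i in range(60)
]
requests_to_send = build_requests(tracks, "my-catalog-id", "2024-W01", "my-token")
print(len(requests_to_send))
```

Each prepared request could then be passed to `urllib.request.urlopen`, ideally with some delay or error handling between batches.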

Process

Once triggered, the ingestion starts parsing the BODY of your request and creates one entry for each track found. The extracted data are stored alongside the audio file in our databases. Please note that the outer_id must be unique and will overwrite any previous entry that has the same reference.

Note that the audio files are moved during the process. Depending on the delete_delivered_file parameter, the delivered files may no longer be present in the delivery area at the end of this first step.

Once all the tracks are saved in our database, the audio analysis starts sending every audio file to our audio analyzers. This system is based on message queues, so the speed of the audio analysis depends on the number of tracks being processed.

Several times a day, another process computes the similarities for Musimatch or the tags for Musimotion and Musime. If a track has not been analysed yet, it will be computed later on in another thread. This third step determines the availability of the entry for your licensed product.