Ingestion

The following documentation describes our ingestion which includes several automated steps based on queues. While the data are instantly stored in our databases, the audio analysis and the computation of the products could take up to several hours to be available.

Once configured, the ingestion could easily take place daily to ensure a permanent update of your catalogues. In consideration of fair usage of our services, please notify us in prevision of large ingestions.

The ingestion is the process allowing you to add content to one of your catalogues, including all the related steps to make your track available for your licenced product.

Depending on your assets and/or your usage, you may use one of another way to ingest your content. Note that even though we only require an audio-file and its reference in your system (called outer_id), it may be interesting for you to ingest more metadata linked to your audio files. To gain a better ergonomy on our Portal, we usually recommend ingesting your outer_id, the title, the related artist and the ISRC amongst all your audio files. These metadata will need a specific mapping for us to parse all your content during the ingestion; this will be described here below.

Note that you can create several catalogs inside your organization, and use a different process of ingestion for each of them at the same time.

Summary

AWS S3 Bucket

Included to your account, you have access to one single dedicated Amazon S3 Bucket in which you will have to deliver your assets to be analyzed:

{
	"bucket_name": "msm-s3-{YOUR_ORGANIZATIO_ID}-{AWS_REGION}-prd",
	"bucket_region": "{AWS_REGION}",
	"aws_access_key_id": "{YOUR_PRIVATE_ACCESS_KEY_ID}",
	"aws_secret_access_key": "{YOUR_PRIVATE_ACCESS_KEY_SECRET}"
}

This S3 Bucket is dedicated and secured for your organization only. All the assets that you are delivering plus all the assets we are producing are stored in this unique location.

Several paths already are configured:

Path Description Access
csv_outputs Contains all the export made by Musimotion ro
delivery Root delivery area ro
delivery/audio Root delivery for all audio files rw
delivery/ddex Root delivery for all DDEX files rw
delivery/mapping Root repository for all mapping files rw
delivery/metadata Root delivery for all CSV files rw
INFO.json Technical configuration (JSON format) ro
live_analysis Temporary storage for the Live analysis -
logs Storage for all the logs produced by our services -
processed Root storage area for processed files -
processed/audio Main storage for processed audio files -
processed/features Main storage for extracted features -

Access through CLI

Once you have configured your environment to access the bucket, the following commands should work:

For you to get the region in which your bucket is located:

aws s3api get-bucket-location --bucket {YOUR_BUCKET_NAME}
{
    "LocationConstraint": "eu-west-1"
}

For you to list the content of your bucket:

aws s3 ls s3://{YOUR_BUCKET_NAME}/                                                                                                                                               
                           PRE csv_outputs/
                           PRE delivery/
                           PRE live_analysis/
                           PRE logs/
                           PRE processed/
                           831 INFO.json
                            36 {YOUR_ORGANIZATION_NAME}

For you to recursively list all the files that are in the delivery area:

aws s3 ls s3://{YOUR_BUCKET_NAME}/delivery --recursively

Access through Interface

If you prefer to avoid the usage of the command line, you may connect to your dedicated bucket by using any software supporting the S3-protocol.

We recommend you to use Cyberduck for which you will have to configure the key and the path. Once connected, you will be able to navigate in the directories in order to upload your audio files in the right folder:

Screenshot

Audio files only, "free naming"

This is the simplest way to ingest content in one of your catalogs. Note that the outer_id will correspond to the filename of the related audio file.

Use-cases

  • You have a bunch of audio files that are not formatted to follow a proper format.
  • You will not match our results with another database or match the data based on the filenames.
  • You want to have a quick look at our results through our portal.

Preparation

Simply upload all your audio files into the right path delivery/audio/:

# one file at once
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/

Note that you can create as many subfolders you want, as long as their main path is delivery/audio/.

Read more about the S3 cp commands

Request

curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/audio_only?delivery_dir_path={THE_PARENT_DELIVERY_PATH}&catalog_id={CATALOG_ID}&delete_delivered_file={TRUE|FALSE}' \
--header 'Authorization: Bearer {VALIDE_ACCESS_TOKEN}'
Parameter Value Type Description
delivery_dir_path string Path on S3, relative to ./delivery/audio/
catalog_id string (UUID) Musimap Internal Unique Identifier for the catalog
delete_delivered_file boolean (default = "True") Whether the file is deleted from the delivery once processed

Process

Once triggered, the ingestion will start scanning the delivery_dir_path and create one single entry for each found files. Please note that the filename will become the outer_id which is supposed to be unique. This means that a file could overwrite any previously uploaded file having the same name.

Note that the audio files are moved during the process. Depending on the parameter delete_delivered_file, the delivery_dir_path will be empty at the end of this first step.

Once all the tracks saved in our database, the audio-analysis will start sending every audio file to our audio analyzers. This system is based on messaging queues; the audio analysis speed depends on the number of tracks being processed.

Several times a day, another process will compute the similarities for Musimatch or the tags for Musimotion and Musime. If a track hasn't been analysed yet, it will be computed in another thread later on. This third step is defining the availability of this entry for your licenced product.

Audio files only, "Structured naming"

This is the simplest way to ingest formatted content in one of your catalogues.

Use-cases

  • All your audio files have the same naming convention.
  • You want to match our results with your infrastructure, based on a specific reference.
  • You want to get benefits of our service by quickly adding simple metadata to your audio files.

Preparation

Simply upload all your audio files into the right path delivery/audio/:

# one file at once
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/

Note that you can create as many subfolders you want, as long as their main path is delivery/audio/.

Read more about the S3 cp commands

Mapping

This file should tell us about your naming convention and will allow us to parse each of your filenames to extract the right information.

As an example, if your filename is composed as "outer_id"-"isrc".ext, your mapping should look like:

mapping:
  - separator: "-"
  - column_0: "outer_id"
  - column_1: "isrc"

Available values:

  • outer_id
  • title
  • isrc
  • release_date
  • artist_name

Once this file is written (and validated), you may want to store it inside your bucket:

aws s3 cp local_mapping.yaml s3://{YOUR_BUCKET_NAME}/delivery/mapping/default_mapping.yaml

Note that our Support Team will usually configure your very first ingestion, and a working example is provided as a validation.

Request

curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/audio_mapping?mapping_filename=default_mapping.yaml&delivery_dir_path={THE_PARENT_DELIVERY_PATH}&catalog_id={CATALOG_ID}&delete_delivered_file={TRUE|FALSE}' \
--header 'Authorization: Bearer {VALIDE_ACCESS_TOKEN}'
Parameter Value Type Description
delivery_dir_path string Path on S3, relative to ./delivery/audio/
catalog_id string (UUID) Musimap Internal Unique Identifier for the catalog.
delete_delivered_file boolean (default = "True") Whether the file is deleted from the delivery once processed
mapping_filename string The filename of your mapping stored into ./delivery/mapping
mapping_content string The YAML file containing your mapping, if not stored in S3

Process

Once triggered, the ingestion will start scanning the delivery_dir_path and create one entry for each found files. The extracted data will be stored amongst the audio file in our databases. Please note that the outer_id is supposed to be unique and will overwrite any previous entry having the same reference.

Note that the audio files are moved during the process. Depending on the parameter delete_delivered_file, the delivery_dir_path will be empty at the end of this first step.

Once all the tracks saved in our database, the audio-analysis will start sending every audio file to our audio analyzers. This system is based on messaging queues; the audio analysis speed depends on the number of tracks being processed.

Several times a day, another process will compute the similarities for Musimatch or the tags for Musimotion and Musime. If a track hasn't been analysed yet, it will be computed in another thread later on. This third step is defining the availability of this entry for your licenced product.

Audio Files & Metadata (JSON)

This is the most complete way to ingest audio files with metadata.

Use-cases

  • Your audio files don't have any structured filenames.
  • You want us to store all the metadata related to an audio file in order to retrieve all the information immediatly.
  • You want to get benefits of our service through our Portal.

Preparation

S3 Storage

In order to fetch the media on S3, simply upload all your audio files into the right path delivery/audio/:

# one file at once
aws s3 cp LOCAL_AUDIO_FILE.mp3 s3://{YOUR_BUCKET_NAME}/delivery/audio/REMOTE_AUDIO_FILE.mp3
# all the content of a specific folder
aws s3 cp LOCAL_FOLDER/ s3://{YOUR_BUCKET_NAME}/delivery/audio/

Note that you can create as many subfolders you want, as long as their main path is delivery/audio/.

Read more about the S3 cp commands

Remote Storage

If your audio files are publicly available, you can setup the ingestion to download them. In such a case, you will need to fill the primary_media of each track with the complete URL wherefrom the file could be downloaded.

JSON Body

Once your audio files have been uploaded, you will need to query our Web-API to ingest them with the right information. You may use one single query for up to 25 tracks. The query will then contain all the information for us to retrieve the right audio file and to save it with all the information you want us to store. For each of those tracks, the following structure needs to be respected:

{
    "outer_id": "string",
    "references": [
      {
        "id": "string",
        "source": "string"
      }
    ],
    "title": "string",
    "lyrics": "string",
    "isrc": "string",
    "release_date": 0,
    "albums": [
      {
        "upc": "string",
        "title": "string",
        "release_date": "string",
        "type": "Compilation",
        "references": [
          {
            "id": "string",
            "source": "string"
          }
        ],
        "track_position": "string",
        "disk_number": "string"
      }
    ],
    "artists": [
      {
        "id": "string",
        "name": "string",
        "role": "string"
      }
    ],
    "primary_media": "string",
    "customer_tags": [
      {
        "tag": "type of Rock",
        "category": "Rock"
      }
    ]
  }

Note that only the fields outer_id and primary_media are required, all the others are optional.

Parameter Value Type Description
outer_id string Your unique identifier for this track
references Nested object A list of official references for this track, sorted by source
title string The official title for this track
lyrics string The complete lyrics for this track
isrc string The unique ISRC reference for this track
release_date integer The date of release (YYYYMMDD)
albums.upc string The unique UPC reference for this album
albums.title string The title for the album
albums.release_date string The date of release (YYYYMMDD)
albums.type string The type of album (Official, Compilation, Single,...)
albums.references Nested object A list of official references for this album, sorted by source
albums.track_position string The position of the track on the related disk_number
albums.disk_number string The disk on which the track could be listened
artists.id string A specific identifier for the related artist
artists.name string The name of a specific artist related to this track
artists.role string A specific role you would like to attach to the artist
primary_media string Remote URL or path on S3, relative to ./delivery/audio/
customer_tags Nested object A list of tags, sorted by categories

Note that you will be able to retrieve all those information by using the enriched response for several of our endpoints.

Request

In addition to this BODY, several QUERY parameters allow you to configure the request:

Parameter Value Type Description
media_fetch_type s3 or download (default: s3) Whether the file is stored on S3 or remotely
catalog_id any string Musimap Internal Unique Identifier for the catalog.
ingestion_id any string A unique reference for this ingestion
overwrite_audio_file BOOLEAN (default: TRUE) In case of an already existing outer_id, whether the audio file needs to be overwritten
delete_delivered_file BOOLEAN (default: TRUE) Whether the file is deleted from the delivery once processed

Note that the ingestion_id is any string that will create a sub-collection of your entries. We advise you to generate one unique ingestion_id per day or week. If omitted, a timestamp will be used.

curl --location -g --request POST 'https://api-v2.musimap.io/ingestion/json?media_fetch_type={s3|download}&catalog_id={CATALOG_ID}&ingestion_id={INGESTION_ID}&overwrite_audio_file={TRUE|FALSE}&delete_delivered_file={TRUE|FALSE}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {VALIDE_ACCESS_TOKEN}' \
--data-raw '[
  {
    "outer_id": "string",
    "references": [
      {
        "id": "string",
        "source": "string"
      }
    ],
    "title": "string",
    "lyrics": "string",
    "isrc": "string",
    "release_date": 0,
    "albums": [
      {
        "upc": "string",
        "title": "string",
        "release_date": "string",
        "type": "Compilation",
        "references": [
          {
            "id": "string",
            "source": "string"
          }
        ],
        "track_position": "string",
        "disk_number": "string"
      }
    ],
    "artists": [
      {
        "id": "string",
        "name": "string",
        "role": "string"
      }
    ],
    "primary_media": "string",
    "customer_tags": [
      {
        "tag": "type of Rock",
        "category": "Rock"
      }
    ]
  }
]'

Process

Once triggered, the ingestion will start parsing the BODY of your request and create one entry for each found track. The extracted data will be stored amongst the audio file in our databases. Please note that the outer_id is supposed to be unique and will overwrite any previous entry having the same reference.

Note that the audio files are moved during the process. Depending on the parameter delete_delivered_file, the delivery_dir_path will be empty at the end of this first step.

Once all the tracks saved in our database, the audio-analysis will start sending every audio file to our audio analyzers. This system is based on messaging queues; the audio analysis speed depends on the number of tracks being processed.

Several times a day, another process will compute the similarities for Musimatch or the tags for Musimotion and Musime. If a track hasn't been analysed yet, it will be computed in another thread later on. This third step is defining the availability of this entry for your licenced product.