Migrating data

Whether you have existing data from a legacy system or just want to automate the import of many tracks into Shampoo, you can use the provided import_raw_tracks command-line tool. At the moment, only tracks can be mass-imported.

Track metadata, tags, and cover art are read from each file's own metadata container (e.g. ID3 or APE) and converted, and the audio files themselves are automatically copied to the configured Datastore. A track is imported only if at least its artist and title tags are present.

Usage

The import_raw_tracks tool uses the following syntax:

import_raw_tracks <admin_username> <admin_password> <csv_import_file>

Where admin_username and admin_password are the login and password of an Administrator account, and csv_import_file is a CSV-formatted file containing the location and other details of each track you want to import. Each line in this file describes one track, as a list of comma-separated (,) fields that define how the track is imported, using this syntax:

<file_location>,<type>[(,<programme_label>)...]

Where file_location is either the file URI or the full local path of the track file to import; type is song, jingle, or advert, depending on the nature of the track; and programme_label is the label of a programme you want to link the track to. A track can be linked to several programmes, with one programme_label field each, or to none.

Enclose a whole field in double quotes (") if it contains a comma (,).
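Before running the import, it can help to check each row against this syntax. The Java sketch below is a hypothetical helper, not part of Shampoo: ImportRowParser, splitFields and validate are illustrative names. It splits a row on commas that sit outside double quotes and checks that a file location and one of the three types are present. It assumes quotes only ever wrap whole fields, as described above, and it treats backslashes literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of Shampoo: checks one row of an import file
// against the <file_location>,<type>[,<programme_label>...] syntax.
final class ImportRowParser {

    // Track types accepted by import_raw_tracks, as listed on this page.
    static final Set<String> TYPES = Set.of("song", "jingle", "advert");

    // Splits a row on commas that are not inside double quotes.
    // The quotes themselves are dropped; backslashes are kept literally.
    static List<String> splitFields(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Returns a description of the first problem found, or empty if the row looks valid.
    static Optional<String> validate(String row) {
        List<String> fields = splitFields(row);
        if (fields.size() < 2 || fields.get(0).isEmpty()) {
            return Optional.of("missing file location or type");
        }
        if (!TYPES.contains(fields.get(1))) {
            return Optional.of("unknown type: " + fields.get(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String row = "/tmp/2871.mp3,song,\"The Club, French and Disco House Session\",Main Programme";
        List<String> fields = splitFields(row);
        System.out.println("location: " + fields.get(0));
        System.out.println("type: " + fields.get(1));
        System.out.println("programmes: " + fields.subList(2, fields.size()));
        System.out.println(validate("/tmp/2872.mp3,podcast").orElse("ok"));
    }
}
```

A row with an unknown type such as podcast is reported, while the quoted programme label stays a single field.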

  1. Open a shell prompt and change to the /Shampoo folder of the decompressed distribution archive:

    cd Shampoo

  2. Run the import_raw_tracks script with your own parameters. The script does not have to run under the same user as your servlet container. Windows users can type:

    WEB-INF\scripts\import_raw_tracks admin password "H:\backup\import.csv"

    and Unix/Linux users:

    sh ./WEB-INF/scripts/import_raw_tracks.sh admin password /tmp/import.csv
    
  • If your servlet container is a full Java application server that supplies the JDBC driver for your database itself, you must first copy the relevant JDBC driver JAR into the lib folder for the script to work.
  • Running the migration while Shampoo is offline is advised but not mandatory, unless you use the embedded Derby engine, in which case Shampoo must be offline.

CSV import file example

H:\\backup\\resources\\pool\\2860.mp3,song,The Late Night Session
H:\\backup\\resources\\pool\\2861.mp3,song,The Late Night Session
H:\\backup\\resources\\pool\\2862.mp3,song,The Late Night Session
H:\\backup\\resources\\pool\\2863.mp3,song,The Late Night Session
H:\\backup\\resources\\pool\\2864.mp3,song,The Late Night Session
H:\\backup\\resources\\pool\\2865.mp3,song,The Archives,The 70s Disco Session,Main Programme,Latest Entries
H:\\backup\\resources\\pool\\2866.mp3,song,Latest Entries,The Archives,The 70s Disco Session,Main Programme
H:\\backup\\resources\\pool\\2867.mp3,song,Main Programme,Latest Entries,The Archives,The 70s Disco Session
H:\\backup\\resources\\pool\\2868.mp3,song,The 70s Disco Session,Main Programme,Latest Entries,The Archives
H:\\backup\\resources\\pool\\2869.mp3,song,The Neu Disco Session
H:\\backup\\resources\\pool\\2870.mp3,song,Main Programme,Latest Entries,The Archives,The 70s Disco Session
H:\\backup\\resources\\pool\\2871.mp3,song,"The Club, French and Disco House Session"
H:\\backup\\resources\\pool\\2872.mp3,song,"The Club, French and Disco House Session"
H:\\backup\\resources\\pool\\2873.mp3,song,The Archives,The 70s Disco Session,Main Programme,Latest Entries
H:\\backup\\resources\\pool\\2874.mp3,song,Latest Entries,The Archives,The 70s Disco Session,Main Programme
H:\\backup\\resources\\pool\\2875.mp3,song,Main Programme,The Archives,Latest Entries
H:\\backup\\resources\\pool\\2876.mp3,song,The Archives,The 70s Disco Session,Main Programme,Latest Entries
H:\\backup\\resources\\pool\\2877.mp3,song,Main Programme,Latest Entries,The Archives,The Italo Disco Session
H:\\backup\\resources\\pool\\2878.mp3,song,Main Programme,Latest Entries,The Archives,The Italo Disco Session
H:\\backup\\resources\\pool\\2879.mp3,song,The Late Night Session
H:\\backup\\resources\\pool\\2880.mp3,song,Main Programme,Latest Entries,The Archives,The 70s Disco Session
H:\\backup\\resources\\pool\\2881.mp3,song,Main Programme,The Archives,Latest Entries
H:\\backup\\resources\\pool\\2882.mp3,song,Main Programme,Latest Entries,The Archives,The Italo Disco Session
H:\\backup\\resources\\pool\\2883.mp3,song,Main Programme,Latest Entries,The Archives,The 70s Disco Session
H:\\backup\\resources\\pool\\2884.mp3,song,The Archives,Latest Entries,Main Programme
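Writing such a file by hand for thousands of tracks is impractical. The hypothetical ImportCsvWriter below (not shipped with Shampoo) sketches one way to generate it: it walks a folder, keeps the .mp3 files, and writes one row per file with a fixed type and programme list. Two assumptions are built in: every backslash is doubled, mirroring the Windows paths in the example above (which suggests the importer's CSV reader treats \ as an escape character), and the output is written as UTF-8. Fields containing double quotes are not handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical generator for an import_raw_tracks CSV file; not shipped with Shampoo.
final class ImportCsvWriter {

    // Doubles backslashes (assumption based on the example file) and quotes
    // the field when it contains a comma, as the row syntax requires.
    static String field(String value) {
        String escaped = value.replace("\\", "\\\\");
        return escaped.contains(",") ? "\"" + escaped + "\"" : escaped;
    }

    // Builds one <file_location>,<type>[,<programme_label>...] row.
    static String row(String location, String type, List<String> programmes) {
        List<String> fields = new ArrayList<>();
        fields.add(field(location));
        fields.add(field(type));
        for (String programme : programmes) {
            fields.add(field(programme));
        }
        return String.join(",", fields);
    }

    // One row per .mp3 file found under the folder, in a stable (sorted) order.
    static List<String> rowsFor(Path folder, String type, List<String> programmes) throws IOException {
        try (Stream<Path> files = Files.walk(folder)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".mp3"))
                    .sorted()
                    .map(p -> row(p.toAbsolutePath().toString(), type, programmes))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ImportCsvWriter.java <music_folder> <output_csv> [programme_label...]");
            System.exit(1);
        }
        List<String> programmes = Arrays.asList(args).subList(2, args.length);
        List<String> rows = rowsFor(Paths.get(args[0]), "song", programmes);
        Files.write(Paths.get(args[1]), rows, StandardCharsets.UTF_8);
    }
}
```

For instance, java ImportCsvWriter.java H:\music H:\backup\import.csv "Main Programme" would list every MP3 under H:\music as a song linked to Main Programme. Launching a single source file with java like this requires JDK 11 or later.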

Batch updating track metadata with Discogs info

Correcting track metadata by hand can be a daunting and tedious task. It can, however, be automated by retrieving the corresponding data from Discogs.

Tags such as genre, album, and release date can be filled in from Discogs, and album cover art downloaded, when they were not previously set. The author and title tags of each track in the database are used to look up matches in the Discogs database.

Usage

  1. Download a monthly Discogs releases database dump from the Discogs data dumps page. Do not decompress it.

  2. Open a shell prompt and change to the /Shampoo/WEB-INF folder of the decompressed distribution archive:

    cd Shampoo
    cd WEB-INF

  3. Run the update_discogs_metadata_tracks script, passing an Administrator login and password and the path to the Discogs releases file as parameters. You must run the script as the same user as your servlet container. Windows users can type:

    runas /u:user "update_discogs_metadata_tracks admin password H:\backup\discogs_20111001_releases.xml.gz"

    and Unix/Linux users:

    su -m user -c "sh ./update_discogs_metadata_tracks.sh admin password /tmp/discogs_20111001_releases.xml.gz"
    

    Where user is the account used to run your servlet container.

  • If your servlet container is a full Java application server that supplies the JDBC driver for your database itself, you must first copy the relevant JDBC driver JAR into the lib folder for the script to work.
  • Running the update while Shampoo is offline is advised but not mandatory, unless you use the embedded Derby engine, in which case Shampoo must be offline.
  • According to the Discogs API v2, third-party applications can download only around 1,000 cover images per day. Update (2014): Discogs has once again overhauled all of its APIs, and cover art can no longer be downloaded.
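The script takes the dump in its gzipped form. As a rough pre-flight check, the Java sketch below streams a gzipped dump through the JDK's GZIPInputStream and StAX parser, without unpacking it to disk, and counts its release elements. It is a hypothetical helper, not part of Shampoo, and it assumes each release in the dump is a `<release>` element. A malformed dump makes it fail with an XMLStreamException that reports the offending position, which is the kind of problem covered in the next section.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical pre-flight check, not part of Shampoo: streams a gzipped
// Discogs dump without unpacking it and counts its <release> elements.
final class DiscogsDumpCheck {

    static long countReleases(InputStream gzipped) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The dump needs no external entities; refuse to resolve any.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try (InputStream xml = new GZIPInputStream(gzipped, 64 * 1024)) {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            long count = 0;
            try {
                while (reader.hasNext()) {
                    // next() throws XMLStreamException at the first malformed spot
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "release".equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        if (args.length != 1) {
            System.err.println("usage: java DiscogsDumpCheck.java <releases.xml.gz>");
            System.exit(1);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(countReleases(in) + " release elements read");
        }
    }
}
```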

Cleaning XML

Some Discogs dumps contain invalid XML entities; if the update stops prematurely with an error, you must clean the file first.

  1. Open a shell prompt and change to the /Shampoo/WEB-INF/classes folder of the decompressed distribution archive:

    cd Shampoo/WEB-INF/classes

  2. Type:

    java biz.ddcr.shampoo.cmdline.XMLUnicode2ASCII /tmp/discogs_20111001_releases.xml.gz

    Replace /tmp/discogs_20111001_releases.xml.gz with the actual location of your gzipped Discogs dump.

The tool strips all invalid characters and converts every non-ASCII Unicode character into a valid XML entity.
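The idea behind this cleanup can be sketched in a few lines of Java. This is a hypothetical re-implementation, not the source of XMLUnicode2ASCII: it drops code points outside the XML 1.0 Char production and rewrites every non-ASCII code point as a decimal character reference such as &#233;. It only handles literal characters, not invalid references already written as entities, and rewriting whole lines is only safe on the assumption that element and attribute names in the dump are plain ASCII. The main method assumes UTF-8 input and writes a separate gzipped copy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the idea behind XMLUnicode2ASCII; not its actual source.
final class XmlAsciiCleaner {

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops characters XML forbids and turns every non-ASCII one into &#NNN;.
    static String clean(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isXmlChar(cp)) {
                continue;                                 // invalid, including lone surrogates
            }
            if (cp < 0x80) {
                out.append((char) cp);                    // ASCII is kept as-is
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+00E9 becomes &#233;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java XmlAsciiCleaner.java <in.xml.gz> <out.xml.gz>");
            System.exit(1);
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(Files.newInputStream(Paths.get(args[0]))), StandardCharsets.UTF_8));
             Writer out = new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(Paths.get(args[1]))), StandardCharsets.US_ASCII)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(clean(line));
                out.write('\n');                          // readLine() strips line terminators
            }
        }
    }
}
```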
