Issue #7 invalid

Character encoding issues when viewing ID3 tags

Karl Ove Hufthammer
created an issue

eyeD3 seems to have trouble correctly displaying ID3 tags that are encoded in a certain way when run on the command line. Basically, some strings come out double-encoded, so that an ‘å’ looks like ‘Ã¥’ (which is what the UTF-8 bytes for ‘å’ look like when interpreted as latin1).
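
To illustrate the symptom, here is a minimal Python sketch of how that kind of garbling arises. It only reproduces the effect with plain string operations; it is not taken from eyeD3, and the variable names are made up for the example.

```python
# Reproduce the double-encoding effect with plain Python strings.
original = "å"
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC3 0xA5
garbled = utf8_bytes.decode("latin-1")   # each byte is read as one latin1 character
print(garbled)

# The misreading is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The round trip at the end only works when every garbled character maps back to a single latin1 byte, so it is a diagnostic aid rather than a general repair.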

I’ll attach two MP3 files. (I just ran ‘head -50’ on a few longer files with and without this problem, to reduce the file sizes, so the files may not be 100% valid MP3 files, but they’re playable, and the ID3 info is intact.) The file ‘ok-small.mp3’ works fine:

$ eyeD3 ok-small.mp3
title: Damen i tåken – del 4 av 8
[…]
Kriminalserie i åtte episoder

But the file ‘not-ok-small.mp3’ shows wrongly encoded characters:

$ eyeD3 not-ok-small.mp3
title: Fatuhiva – tilbake til naturen – del 7 av 10
album: Verdt Ã¥ lese
[…]
Blir ogsÃ¥ sendt senere i kveld kl. 19.00.

Note that in both cases the title (encoded in UTF-8, since it contains an en-dash (–), which is not available in latin1) is correctly displayed. But for the first file the ‘å’ (in ‘åtte’ in the description tag) is correctly displayed, while for the second file it is incorrectly displayed as ‘Ã¥’ (in the album and description tags, ‘Verdt å lese’ and ‘Blir også’).

The tags are correctly displayed if I open the files in either Amarok or VLC.

(For the first file, I added the tags using an old version of Amarok, which I believe stores everything as UTF-16. For the second file I used a more recent version of Amarok, which I believe is smarter when encoding the tags and only uses a Unicode encoding if it is needed, i.e., it uses latin1 if all characters are available in latin1, and UTF-8 if not.)
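
The per-tag encoding choice I’m guessing at could be sketched like this in Python. This is only my assumption about the behaviour, not Amarok’s actual code; `pick_text_encoding` is a hypothetical helper, and the byte values 0x00 (latin1) and 0x03 (UTF-8) are the encoding markers used by ID3v2.4 text frames.

```python
def pick_text_encoding(text):
    """Hypothetical helper: return (ID3 encoding byte, encoded text).

    Prefers latin1 and falls back to UTF-8 only when a character
    cannot be represented in latin1.
    """
    try:
        return 0x00, text.encode("latin-1")
    except UnicodeEncodeError:
        return 0x03, text.encode("utf-8")


print(pick_text_encoding("Verdt å lese"))
print(pick_text_encoding("Fatuhiva – tilbake til naturen"))
```

The first call stays in latin1 because ‘å’ is available there; the second falls back to UTF-8 because of the en-dash.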

Comments (3)

  1. Travis Shirk repo owner

    Hi Karl,

    This is a bug in the program that wrote these tags. The data is UTF-8 encoded, but the ID3 frame says it is latin1 encoded. Here's the bad data: '\x00Verdt \xc3\xa5 lese'. The leading null byte is the problem; if it were \x03 (UTF-8), all would be good, since that's how the string is actually encoded. Luckily you can use eyeD3 to fix this file:

    $ eyeD3 -A 'Verdt å lese' issue7_not-ok-small.mp3
    

    The frame remains latin1 encoded but the right bytes go in: '\x00Verdt \xe5 lese'.
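
For anyone who wants to see the mismatch directly, here is a minimal Python sketch that decodes the three payloads discussed above according to their leading encoding byte. It is an illustration only and not eyeD3's implementation: `decode_text_payload` and `ENCODINGS` are hypothetical names, and the sketch ignores details such as null terminators and frame headers.

```python
# Encoding markers for ID3v2.4 text frames, mapped to Python codec names.
ENCODINGS = {0x00: "latin-1", 0x01: "utf-16", 0x02: "utf-16-be", 0x03: "utf-8"}


def decode_text_payload(payload):
    """Hypothetical helper: decode a text payload using its first byte as the encoding marker."""
    codec = ENCODINGS[payload[0]]
    return payload[1:].decode(codec)


bad = b"\x00Verdt \xc3\xa5 lese"         # UTF-8 bytes labelled as latin1
relabelled = b"\x03Verdt \xc3\xa5 lese"  # same bytes, labelled as UTF-8
fixed = b"\x00Verdt \xe5 lese"           # latin1 bytes labelled as latin1

print(decode_text_payload(bad))          # shows the garbled form
print(decode_text_payload(relabelled))
print(decode_text_payload(fixed))
```

Both the relabelled and the rewritten payload decode to the intended text; only the payload whose marker disagrees with its bytes comes out garbled.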
