doesn't work with unicode symbols in path

Issue #37 open
René Dudfield created an issue

== Vasiliy, 2009-09-01 11:32:23 -0700

{{{ This method doesn't work with Unicode characters in the file path. For example:

audiofile = u'D:/Work/Projects/Programming/iTutor/src/iTutor/dictionaries/Казахо - Русский/3.mp3'

returns the error:

pygame.error: Couldn't read from 'D:/Work/Projects/Programming/iTutor/src/iTutor/dictionaries/Казахо - Русский/3.mp3'

If I delete 'Казахо - Русский/' from the audiofile variable, the method works normally. }}}

Comments (10)

  1. René Dudfield reporter
    • changed status to open
    • changed milestone to 1.9.2

    This works with oggs and wavs on OSX.

    I assume this is another smpeg error, or perhaps a Windows-related error.

    Needs testing on Windows.

  2. Lenard Lindstrom

    According to the Python 2.7 sys.getfilesystemencoding() docs, no encoding is done to the file path, so the UTF-16 Python string is just passed through. I don't know what encoding Windows uses, though.

  3. Former user Account Deleted

    I think this might be a bytes/unicode misunderstanding: I've tested this only on Linux, on a UTF-8 filesystem, but '<utf8 encoded string>') worked, whereas '<utf8 encoded string>') didn't, but '<utf8 encoded string>'.encode('utf8')) again did.

  4. Jason Marshall

    My plan is to wrap and with a function that uses Python's built-in open function to open Unicode paths.

  5. Jason Marshall

    Actually, the <unicode string>.encode('utf8') trick from Anonymous works on Windows for music.

  6. Thomas Kluyver

    I've found the relevant source in music.c, and it's using RWopsEncodeFilePath to turn a unicode path into a bytes path. This is OK on Unix systems where paths are bytes, but Windows paths are unicode (handled as UTF-16). The 'filesystem encoding' on Windows is 'mbcs', which refers to the current system codepage. However, the typical default codepages (e.g. cp1252 for Western European languages) cannot represent all characters that might be in a path, so encoding a path is lossy. The right fix would be to open files using the unicode/wchar path on Windows.

    I haven't tested this, but it looks like it should be possible to work around this by opening the file object in Python first:

    with open(path, 'rb') as f:

    I think Python's open function uses unicode paths on Windows.
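
The two workarounds discussed in the comments above (encoding the path to UTF-8 bytes, and letting Python open the file itself) could be sketched roughly as follows. Both helper names, `encode_path_utf8` and `read_into_memory_file`, are hypothetical and not part of pygame. The sketch reads the whole file into an `io.BytesIO`, so no open file handle has to outlive the `with` block. Whether a given pygame loader accepts the resulting file-like object on a particular platform is an assumption that has not been verified here.

```python
import io


def encode_path_utf8(path):
    """Return `path` as UTF-8 bytes (the .encode('utf8') trick from the comments).

    Hypothetical helper: byte strings are passed through unchanged.
    """
    if isinstance(path, bytes):
        return path
    return path.encode("utf-8")


def read_into_memory_file(path):
    """Open `path` with Python's built-in open() and return its contents
    wrapped in an in-memory binary file object.

    Hypothetical helper: Python resolves the (possibly non-ASCII) path,
    so the consumer only ever sees a file-like object, never the path.
    """
    with open(path, "rb") as f:  # binary mode: audio data must not be decoded
        return io.BytesIO(f.read())
```

The idea would then be to hand the returned object, rather than the path string, to whichever loader fails on the Unicode path. Reading everything into memory trades RAM for simplicity, which may not suit very large audio files.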
