doesn't work with unicode symbols in path

Issue #37 open
René Dudfield
created an issue

== Vasiliy, 2009-09-01 11:32:23 -0700

{{{ This method doesn't work with unicode characters in the file path. For example: audiofile = u'D:/Work/Projects/Programming/iTutor/src/iTutor/dictionaries/Казахо - Русский/3.mp3'

returns the error: pygame.error: Couldn't read from 'D:/Work/Projects/Programming/iTutor/src/iTutor/dictionaries/Казахо - Русский/3.mp3'

If I delete 'Казахо - Русский/' from the audiofile variable, the method works normally. }}}

Comments (10)

  1. Lenard Lindstrom

    According to the Python 2.7 sys.getfilesystemencoding() docs, no encoding is done to the file path, so the UTF-16 Python string is just passed through. I don't know what encoding Windows uses, though.

  2. Anonymous

    I think this might be a bytes/unicode misunderstanding. I've only tested this on Linux, on a UTF-8 filesystem, but'<utf8 encoded string>') worked, whereas'<utf8 encoded string>') didn't, but'<utf8 encoded string>'.encode('utf8')) again did.

  3. Thomas Kluyver

    I've found the relevant source in music.c, and it's using RWopsEncodeFilePath to turn a unicode path into a bytes path. This is OK on Unix systems where paths are bytes, but Windows paths are unicode (handled as UTF-16). The 'filesystem encoding' on Windows is 'mbcs', which refers to the current system codepage. However, the typical default codepages (e.g. cp1252 for Western European languages) cannot represent all characters that might be in a path, so encoding a path is lossy. The right fix would be to open files using the unicode/wchar path on Windows.

    I haven't tested this, but it looks like it should be possible to work around this by opening the file object in Python first:

    with open(path, 'rb') as f:
        ...  # pass f, not path, to the loading call

    I think Python's open function uses unicode paths on Windows.
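
The two points in comment 3 (encoding a path to a code page is lossy, and opening the file in Python first may sidestep the problem) can be illustrated with a minimal sketch. It assumes Python 3, where `str` is already unicode, and uses cp1252 only as an example of a typical code page. `load_via_file_object` and its `loader` parameter are hypothetical names introduced for illustration; they are not part of pygame, and the workaround itself remains untested against the reported bug.

```python
import os
import tempfile

# The directory name from the report, as a text (unicode) string.
dirname = 'Казахо - Русский'

# cp1252 is only an example of a Western European code page. It has no
# Cyrillic letters, so a strict encode of this name fails.
try:
    dirname.encode('cp1252')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors='replace' each unencodable character becomes '?', so the
# original name cannot be recovered from the encoded bytes.
lossy = dirname.encode('cp1252', errors='replace')
round_trip = lossy.decode('cp1252')


def load_via_file_object(path, loader):
    """Open path in Python and pass the binary file object to loader.

    loader is a hypothetical stand-in for any loading call that accepts
    a file object instead of a path.
    """
    with open(path, 'rb') as f:
        return loader(f)


# Dummy data in a temporary directory, so the sketch runs without pygame
# or a real audio file.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, dirname)
    os.mkdir(folder)
    audiofile = os.path.join(folder, '3.mp3')
    with open(audiofile, 'wb') as out:
        out.write(b'not real audio')
    data = load_via_file_object(audiofile, lambda f: f.read())
```

In a real program, `loader` would presumably be the music-loading call that failed in the report, provided that call accepts a file object.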
