os functions return '??' for Unicode characters in paths on Windows

Issue #2937 new
Creation Elemental
created an issue

I have a few files with emojis in their names, and also a folder with one. Functions like os.getcwd, os.listdir, os.path.realpath, etc. return '??' in place of those characters. This is also a problem with pure Windows installations of Python 2 ( https://bugs.python.org/issue35670?@ok_message=msg%20333100%20created%0Aissue%2035670%20created&@template=item ).

I have not yet been able to test if this happens on non-windows installations of pypy2.

For example, say you have a folder simply called '🔭'. If you run python inside of it and run os.getcwd() you will simply get '??' as the result. This breaks MANY of my programs that depend on knowing exactly where they are, and knowing the contents of a directory to pass to other functions.

Comments (2)

  1. mattip

    Since it fails on CPython as well, we are unlikely to fix this until they do.

    On the other hand, pypy3 on win32 crashes when started in a non-ASCII directory:

    cd d:\pypy_stuff\abשלוםasd                           
    Python 3.5.3 (f638e10d6074, Jan 06 2019, 02:03:06)                      
    [PyPy 6.1.0-alpha0 with MSC v.1910 32 bit] on win32                     
    Type "help", "copyright", "credits" or "license" for more information.  
    debug: OperationError:                                                  
    debug:  operror-type: TypeError                                         
    debug:  operror-value: listdir: illegal type for path argument          

    pypy2 runs in the same directory without crashing.

  2. Armin Rigo

    Note that the CPython issue says it can't be fixed in 2.7. Unless I'm confusing things, that's correct, because, say, os.listdir('path') is supposed to return byte strings and not unicode strings. But that's only half of the story: there are variants of all functions that explicitly return unicode strings. For example, call os.getcwdu() instead of os.getcwd(), and call os.listdir(u'path') instead of os.listdir('path').

    Now if these variants return unicode strings with '??' in them, then that's a bug and it should be fixed, imho.
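
A minimal sketch of the unicode variants mentioned above, written so that it runs on both Python 2 and Python 3. The helper names getcwd_text and listdir_text are hypothetical and exist only for this illustration. The fallback to "utf-8" when no filesystem encoding is reported is also an assumption. This shows how to ask for unicode results; whether those results still contain '??' on Windows is exactly the open question in this issue.

```python
import os
import sys


def getcwd_text():
    """Return the current directory as a text (unicode) string."""
    # Python 2 provides os.getcwdu(); Python 3 does not, and there
    # os.getcwd() already returns text.
    getcwdu = getattr(os, "getcwdu", None)
    return getcwdu() if getcwdu is not None else os.getcwd()


def listdir_text(path=u"."):
    """List a directory, asking for text (unicode) names."""
    # On Python 2, passing a unicode path makes os.listdir() return
    # unicode names, so a byte-string path is decoded first.
    if isinstance(path, bytes):
        # Assumption: fall back to UTF-8 if no encoding is reported.
        encoding = sys.getfilesystemencoding() or "utf-8"
        path = path.decode(encoding)
    return os.listdir(path)


if __name__ == "__main__":
    print(repr(getcwd_text()))
    print(repr(listdir_text(u".")))
```

If the printed values still show '??' where the emoji should be, that would point at the bug described in the second paragraph of this comment rather than at the byte-string limitation.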
