venv non ascii support - Windows

Issue #3147 new
Bernat Gabor created an issue

Creating a venv in a path that contains a non-ASCII character just does not work (è is mbcs-encodable, so it is valid in file paths even when one does not use UTF-8 file paths):

PS C:\Users\traveler\git\virtualenv> pypy3.exe -m venv 'envè' --without-pip
Error: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\traveler\\git\\virtualenv\\envè\\Scripts'

Furthermore, if one creates a venv and then moves it to a folder whose name contains such a character, the venv site-packages is no longer added to sys.path (the error is silently ignored; oddly enough, pypy2 does not suffer from this issue):

PS C:\Users\traveler\git\virtualenv> pypy3.exe -m venv env --without-pip                    
PS C:\Users\traveler\git\virtualenv> mkdir env/site-packages

PS C:\Users\traveler\git\virtualenv> ./env/Scripts/pypy3.exe -c 'import sys; print(\"\n\".join(sys.path))'
C:\pypy\pypy3.6-v7.3.0-win32\lib_pypy\__extensions__
C:\pypy\pypy3.6-v7.3.0-win32\lib_pypy
C:\pypy\pypy3.6-v7.3.0-win32\lib-python\3
C:\pypy\pypy3.6-v7.3.0-win32\lib-python\3\lib-tk    
C:\Users\traveler\git\virtualenv\env\site-packages  

PS C:\Users\traveler\git\virtualenv> mv env envè

PS C:\Users\traveler\git\virtualenv> ./envè/Scripts/pypy3.exe -c 'import sys; print(\"\n\".join(sys.path))'
C:\pypy\pypy3.6-v7.3.0-win32\lib_pypy\__extensions__
C:\pypy\pypy3.6-v7.3.0-win32\lib_pypy
C:\pypy\pypy3.6-v7.3.0-win32\lib-python\3
C:\pypy\pypy3.6-v7.3.0-win32\lib-python\3\lib-tk    
C:\pypy\pypy3.6-v7.3.0-win32\site-packages

Slightly related: for 3.6 one should support UTF-8 encoding on Windows (see https://www.python.org/dev/peps/pep-0529/) instead of mbcs, so here I’d expect the following folder name to pass: "$ èрт🚒♞中片"
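As a rough illustration of the difference between a legacy code page and UTF-8, the sketch below checks whether a folder name can be encoded at all. The `mbcs` codec only exists on Windows, so `cp1252` is used here as a stand-in assumption for the ANSI code page; the helper name `encodable` is hypothetical.

```python
def encodable(name, encoding):
    """Return True if `name` can be encoded in `encoding` without errors."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


simple = "envè"
mixed = "$ èрт🚒♞中片"

# cp1252 stands in for the Windows ANSI code page (an assumption);
# utf-8 is what PEP 529 uses for bytes paths on Windows.
for encoding in ("cp1252", "utf-8"):
    print(encoding, encodable(simple, encoding), encodable(mixed, encoding))
```

A name such as `envè` fits in a single-byte code page, while the mixed-script name is only representable once paths are handled as UTF-8 or as native wide strings.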

Discovered this while doing https://github.com/pypa/virtualenv/pull/1482

Comments (17)

  1. mattip

    Could you refactor this into a small test that fails? It probably has to do with os and not venv at all.

  2. Bernat Gabor reporter

    Here you go:

    import sys
    import os
    import shutil
    import subprocess
    
    print(sys.version)
    
    base = "a"
    if os.path.exists(base):
        shutil.rmtree(base)
    
    test_folder = os.path.join("a", "è")
    print(repr(test_folder))
    os.makedirs(test_folder)
    
    subprocess.call(["powershell.exe", "-c", "ls {}".format(base)])
    
    for d in os.listdir(base):
        print(repr(d))
    

    And a correction: this seems to be fixed with pypy3 7.3.0 (it was behaving similarly in 7.2.0). However, now pypy2.7 7.3.0 does the following:

    On the disk, the folder name becomes (viewed in explorer.exe) è … which is an encoding of è
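One way a single è can end up displayed as a different, longer name is when its UTF-8 bytes are written to disk and then interpreted in a single-byte code page. The sketch below shows that effect; that this is what happened here is an assumption, not a confirmed diagnosis, and `cp1252` is only an example code page.

```python
original = "è"

# UTF-8 encodes è as two bytes.
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as cp1252 (assumed code page) yields two characters.
misread = utf8_bytes.decode("cp1252")

print(repr(utf8_bytes), repr(misread))
```

Reversing the two steps (`misread.encode("cp1252").decode("utf-8")`) recovers the original character, which is a quick way to check whether a cryptic name on disk is this kind of mis-decoding.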

  3. mattip

    I am not seeing the problem. When I run the test on pypy2.7-v7.3 it all works fine. Perhaps it is a codepage issue? Mine is set to 850 (chcp returns “Active code page: 850”)

    > ..\pypy2.7-v7.3.0-win32\pypy.exe test_1111.py      
    2.7.13 (724f1a7d62e8, Dec 23 2019, 20:20:22)         
    [PyPy 7.3.0 with MSC v.1500 32 bit]                  
    'a\\\xe8'                                            
    
    
        Directory: D:\pypy_stuff\pypy\a                  
    
    
    Mode                LastWriteTime         Length Name
    ----                -------------         ------ ----
    d-----       29/01/2020  12:29 PM                è   
    
    
    '\xe8'                                               
    

    Note that os.listdir is returning bytes since listdir is not being given a unicode object. It is [documented](https://docs.python.org/2.7/library/os.html#os.listdir) as

    “if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.”
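The same "type in, type out" rule can be sketched on Python 3, where the split is between `str` and `bytes` paths rather than `unicode` and `str`. This is an analogue of the 2.7 behaviour quoted above, not the 2.7 code itself; the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "è"))

    # A str path returns str names; a bytes path returns bytes names.
    text_names = os.listdir(base)
    byte_names = os.listdir(os.fsencode(base))

print(repr(text_names), repr(byte_names))
```

Passing a text path to `os.listdir` in the reproducer would therefore make the printed names directly comparable with the name that was created.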

  4. Bernat Gabor reporter

    The thing is you can’t really test this under PyPy… as the subprocess call is also encoded… a better test is to run the create command; open up explorer and see that there you have the cryptic name.

  5. mattip

    I would really like to be able to reproduce this, since I believe there is an issue, just not on my machine

  6. Bernat Gabor reporter

    Hmm, interesting. 😥 On mine it’s different, damn. Any ideas how else this could be reproduced?

  7. Bernat Gabor reporter

    pypy3 -m venv 'env3è' --without-pip --clear
    

    This does fail for now, with:

    Error: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\traveler\\env3è\\Scripts'
    

    This is my reproducer for now.

    chcp
    Active code page: 437
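Since the two machines in this thread report different active code pages (850 and 437), it may help to see how è is represented in each. The sketch below simply encodes the character in a few code pages; the choice of `cp1252` and `utf-8` as extra comparison points is an assumption for illustration.

```python
codepages = ("cp437", "cp850", "cp1252", "utf-8")

# Map each code page to the bytes it uses for è.
encoded = {cp: "è".encode(cp) for cp in codepages}

for cp, raw in encoded.items():
    print(cp, repr(raw))
```

The two OEM code pages agree on this character, whereas the ANSI and UTF-8 encodings differ from them and from each other, so any layer that mixes them up would produce a different file name.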

  8. mattip

    I can reproduce the failure with pypy3 v7.3. The problem is that the directory is created by os.mkdir(path), but then fails to be detected by os.stat(path). I fail to reproduce this in a standalone test. Perhaps path is created by two different mechanisms, so it has different semantics? Printing path I do not see a difference. Still investigating, help is welcome.
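A standalone check of the mkdir-then-stat sequence described above could look like the following sketch. The helper name `mkdir_then_stat` and the temporary directory are hypothetical; the directory names mirror the ones from the error message. On an interpreter without the bug this is expected to report that the directory is visible.

```python
import os
import stat
import tempfile


def mkdir_then_stat(path):
    """Create `path`, then report whether os.stat() sees it as a directory."""
    os.mkdir(path)
    try:
        info = os.stat(path)
    except OSError:
        # The directory was created but cannot be found again.
        return False
    return stat.S_ISDIR(info.st_mode)


with tempfile.TemporaryDirectory() as base:
    env_dir = os.path.join(base, "env3è")
    os.mkdir(env_dir)
    result = mkdir_then_stat(os.path.join(env_dir, "Scripts"))

print(result)
```

If the stat step failed to see an existing directory, a later attempt to create it again would raise FileExistsError, which Windows reports as WinError 183; whether that is the exact path taken inside venv is an assumption.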
