pathlib chokes on international characters on Python 2 on Windows

Issue #25 wontfix
Remi Rampin created an issue

Perfectly valid filenames that work fine with os/os.path make pathlib raise an OSError. This seems to be because pathlib stores pathnames as bytes (which they aren't) in a single-byte encoding (iso-8859-1?).

Example:

>>> import os, pathlib
>>> os.listdir('.')
['?.txt']
>>> os.listdir(u'.')
[u'\u203d.txt']
>>> p = pathlib.Path('.')
>>> f = list(p.iterdir())[-1]
>>> f
WindowsPath('?.txt')
>>> # this path is broken: pathlib can't represent this character
>>> f.exists()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python2.7.6\lib\site-packages\pathlib.py", line 1176, in exists
    self.stat()
  File "C:\Python2.7.6\lib\site-packages\pathlib.py", line 1051, in stat
    return self._accessor.stat(self)
  File "C:\Python2.7.6\lib\site-packages\pathlib.py", line 346, in wrapped
    return strfunc(str(pathobj), *args)
WindowsError: [Error 123] La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte: '?.txt'
>>> # of course, os has no problem here
>>> os.listdir(u'.')[-1]
u'\u203d.txt'
>>> os.path.exists(os.listdir(u'.')[-1])
True

I do not really expect you to be able to fix this. Representing all paths as unicode on Python 3 is wrong, just as is representing all paths as bytes on Python 2. You got two out of four possible combinations wrong, and the other two work by chance; I'll be using something else.
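
For anyone trying to follow the session above without a Windows box, here is a rough sketch of what seems to be going on. It is not pathlib's code: `lossy_ansi_name` and `unicode_listing_roundtrip` are hypothetical helpers, and cp1252 is only an assumed stand-in for whatever ANSI code page the system actually uses. The first helper imitates a byte-string listing, where a character the code page cannot represent comes back as `?`; the second shows that the same name survives when text paths are used throughout, as in the `os.listdir(u'.')` call.

```python
from __future__ import unicode_literals

import os
import shutil
import sys
import tempfile

# The file name from the report: U+203D (interrobang) followed by ".txt".
NAME = "\u203d.txt"


def lossy_ansi_name(name, codepage="cp1252"):
    """Imitate a byte-string listing: unmappable characters turn into '?'.

    cp1252 is an assumption; the real code page depends on the machine.
    """
    return name.encode(codepage, "replace")


def unicode_listing_roundtrip(name):
    """Create `name` in a scratch directory and find it again via text paths."""
    scratch = tempfile.mkdtemp()
    if isinstance(scratch, bytes):
        # Python 2 hands back a byte string here; switch to text before use.
        scratch = scratch.decode(sys.getfilesystemencoding())
    try:
        open(os.path.join(scratch, name), "wb").close()
        # A text argument makes os.listdir return text names.
        listed = os.listdir(scratch)
        found = os.path.exists(os.path.join(scratch, listed[0]))
        return listed, found
    finally:
        shutil.rmtree(scratch)


if __name__ == "__main__":
    print(repr(lossy_ansi_name(NAME)))
    print(unicode_listing_roundtrip(NAME))
```

Once the name has been flattened to `?.txt` the original character is gone, and `?` is not allowed in a Windows filename anyway, which would explain the Error 123 in the traceback. The round trip assumes the filesystem encoding can represent U+203D (UTF-8 locales can).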

Comments (1)

  1. Antoine Pitrou repo owner

    Indeed, pathlib for 2.7 is provided as a convenience, but it won't handle non-ASCII paths properly under Windows. Python 2 makes that too cumbersome to achieve.
