[Patch] Metadata is UTF-8 on all platforms

Issue #311 resolved
Stefan H. Holek created an issue

python3 setup.py --long-description does not output UTF-8 on every platform.

To fix this (strictly speaking, it is a distutils bug), the proposed patch overrides 'Distribution.handle_display_options()' and ensures that sys.stdout encodes metadata as UTF-8.
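A minimal Python 3 sketch of that idea follows. It is not the code from the linked changeset: utf8_stdout is a hypothetical helper, written as a context manager instead of a handle_display_options() override, and it assumes sys.stdout is an io.TextIOWrapper (anything else, such as an io.StringIO used in tests, is passed through untouched). It detaches the underlying byte buffer, re-wraps it with a UTF-8 text layer for the duration of the block, and restores the original encoding afterwards.

```python
import codecs
import io
import sys
from contextlib import contextmanager


@contextmanager
def utf8_stdout():
    """Temporarily make sys.stdout encode text as UTF-8 (hypothetical helper)."""
    stream = sys.stdout
    # Only a real TextIOWrapper has a byte buffer we can detach; also skip
    # the work if the stream is already UTF-8.
    if (not isinstance(stream, io.TextIOWrapper)
            or codecs.lookup(stream.encoding).name == "utf-8"):
        yield stream
        return
    encoding, errors = stream.encoding, stream.errors
    line_buffering = stream.line_buffering
    stream.flush()
    # detach() returns the binary buffer and leaves the old wrapper unusable,
    # so a new text layer is built on top of the same buffer.
    buffer = stream.detach()
    sys.stdout = io.TextIOWrapper(buffer, "utf-8", errors,
                                  line_buffering=line_buffering)
    try:
        yield sys.stdout
    finally:
        # Put the original encoding back for whatever is printed next.
        sys.stdout.flush()
        buffer = sys.stdout.detach()
        sys.stdout = io.TextIOWrapper(buffer, encoding, errors,
                                      line_buffering=line_buffering)
```

In a Distribution subclass, the call to the base class's handle_display_options() could be wrapped in a "with utf8_stdout():" block. Restoring the original encoding matters because anything printed after the metadata should still use the console's own encoding.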

Patch 'Distribution': https://bitbucket.org/stefanholek/distribute/changeset/c3774f4e5633

Comments (8)

  1. Lennart Regebro

    I'm not entirely sure why it should be UTF-8 on all platforms. If the sys.stdout encoding is something else, it should reasonably use that encoding, whatever it is.

  2. Stefan H. Holek reporter

    UTF-8 can represent any character, whereas encodings like Latin-1 and Windows-1252 cannot. A simple mdash in your README can produce a UnicodeEncodeError if the stdout encoding is not "Unicode compatible". Printing a funny character or two is less of an issue than exploding in the user's face, IMNSHO.

    I also think that no package would be portable if metadata were allowed to use any random encoding. Also see http://docs.python.org/devguide/documenting.html#source-encoding. PyPI appears to require UTF-8 for non-ASCII characters as well.

  3. Stefan H. Holek reporter

    I want this to work on any platform, be it Mac, an old Latin-1 Linux, or Windows:

    python setup.py --long-description | rst2html -i utf-8

    (I assume that's pretty much what PyPI does as well)

    Without the patch(es), any character not representable in the local encoding causes a UnicodeEncodeError. Imagine a Japanese README: how would this work anywhere that is not UTF-8 native?

  4. Daniel Holth

    IMO all encodings that are not utf-8 are broken.

    Metadata 1.3 finally defines UTF-8 as the only encoding allowed inside PKG-INFO.
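For code that writes a PKG-INFO-style file itself, the practical consequence is to pass the encoding explicitly instead of relying on the locale default. The sketch below uses hypothetical helper names (write_pkg_info, read_pkg_info) and a made-up two-field example; it only illustrates the UTF-8 round trip, not the full metadata format.

```python
import email.parser
import tempfile
from pathlib import Path


def write_pkg_info(path: Path, fields) -> None:
    """Write (name, value) pairs as 'Name: value' lines, always in UTF-8."""
    text = "".join(f"{name}: {value}\n" for name, value in fields)
    # Explicit encoding: never fall back to the locale's preferred encoding.
    path.write_text(text, encoding="utf-8")


def read_pkg_info(path: Path):
    """Decode the file as UTF-8, then parse its RFC 822-style headers."""
    return email.parser.Parser().parsestr(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg_info = Path(tmp) / "PKG-INFO"
        write_pkg_info(pkg_info, [("Name", "example"),
                                  ("Summary", "Caf\u00e9 \u2014 \u65e5\u672c\u8a9e")])
        print(read_pkg_info(pkg_info)["Summary"])
```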

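The encoding failures described in comments 2 and 3 can be reproduced without setup.py at all. This sketch (the sample strings are made up) checks whether two README snippets survive encoding with Latin-1, Windows-1252 and UTF-8. Windows-1252 does have an em dash (byte 0x97), so the mdash case fails under Latin-1 specifically, while the Japanese text fails under both single-byte encodings; only UTF-8 handles every case.

```python
"""Which encodings can represent a given README snippet?"""

# Hypothetical sample text: one em dash, one Japanese phrase.
README_SNIPPETS = {
    "em dash": "Fast \u2014 and small",
    "Japanese": "\u65e5\u672c\u8a9e\u306e README",
}
ENCODINGS = ("latin-1", "cp1252", "utf-8")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)  # strict errors: raises on an unmappable char
    except UnicodeEncodeError:
        return False
    return True


def report() -> dict:
    """Map (snippet name, encoding) to whether encoding succeeds."""
    return {
        (name, enc): can_encode(text, enc)
        for name, text in README_SNIPPETS.items()
        for enc in ENCODINGS
    }


if __name__ == "__main__":
    for (name, enc), ok in report().items():
        print(f"{name:<9} {enc:<8} {'ok' if ok else 'UnicodeEncodeError'}")
```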