Distutils produces metadata in unknown encoding

Jason R. Coombs
In Pull Request 45, Jurko observed that on Python 3.1, Python generates the metadata files in an encoding that depends on the build user's environment. On Python 2.6 and 2.7, and again starting with Python 3.2, the content is encoded as UTF-8.

pkg_resources currently assumes the metadata is UTF-8, so if non-ASCII characters are present and egg_info is run on Python 3.1 or earlier, the resulting metadata will fail to load on Python 3.2+.
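The failure mode can be reproduced without any packaging tools. The sketch below is only an illustration: the metadata line is made up, and ISO-8859-2 stands in for whatever encoding the build user's environment happened to select.

```python
# Hypothetical metadata line; the content is only an illustration.
metadata = "Author: Jurko Gospodnetić\n"

# Simulate a build whose environment selected a non-UTF-8 encoding.
# ISO-8859-2 is an assumption standing in for the build user's locale.
written = metadata.encode("iso-8859-2")

# A reader that assumes UTF-8 fails on the non-ASCII byte.
try:
    written.decode("utf-8")
    load_error = None
except UnicodeDecodeError as exc:
    load_error = exc

# ASCII-only metadata is unaffected, which is why the problem
# only shows up once non-ASCII characters are present.
ascii_ok = "Author: Jason R. Coombs\n".encode("iso-8859-2").decode("utf-8")
```

Here `load_error` ends up holding a `UnicodeDecodeError`, while the ASCII-only line round-trips unchanged.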

  1. Jurko Gospodnetić

    Yup, just checked 2.6.6, 2.7.6 & 3.4.4 and they all do correct UTF-8 encoding (although the Python 2 versions require that the data be given as unicode and not str in order for the encoding to be applied).

  2. Jurko Gospodnetić

    I added pull request #52, generalizing the solution for this issue to more Python versions.

    Some more background information on this issue:

    • Python 2.x supports writing package metadata given as UTF-8 encoded byte strings, and since Python 2.6 it also supports writing package metadata given as a unicode string (CPython commit 4c683ec4415b3c4bfbc7fe7a836b949cb7beea03)

    • Python 3.x only supports writing package metadata given as a unicode string

    • Python [3.0 - 3.2.2> (3.0 up to, but not including, 3.2.2) does not support writing package metadata containing non-ASCII characters due to a distutils bug

    • Python 3.2.2 fixes the distutils bug (CPython commit fb4d2e6d393e96baac13c4efc216e361bf12c293)

    setuptools commit 1cd816bb7c933eecd9d8464e054b21c7d5daf2df works around the non-ASCII character issue for Python version 3.1.

    Pull request #52 applies the same workaround to the Python version range [3.0 - 3.2.2>.

    Hope this helps.

    Best regards, Jurko Gospodnetić
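For readers who want to see what writing metadata independently of the environment could look like, here is a minimal sketch. It is not taken from the setuptools commit or from pull request #52; the helper names `write_pkg_info` and `read_pkg_info` are hypothetical, and the sketch only shows the general idea of naming the encoding explicitly instead of relying on the locale default.

```python
import io
import os
import tempfile


def write_pkg_info(path, fields):
    """Write (name, value) pairs as UTF-8, regardless of the locale.

    Hypothetical helper for illustration; not part of distutils or setuptools.
    """
    with io.open(path, "w", encoding="utf-8") as f:
        for name, value in fields:
            f.write(u"%s: %s\n" % (name, value))


def read_pkg_info(path):
    """Read the file back with the same explicit encoding."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
pkg_info = os.path.join(tmp_dir, "PKG-INFO")
write_pkg_info(
    pkg_info,
    [(u"Metadata-Version", u"1.0"), (u"Author", u"Jurko Gospodnetić")],
)
round_tripped = read_pkg_info(pkg_info)
```

Because both the writer and the reader name UTF-8 explicitly, the result does not depend on the environment in which the build runs.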

