Tox >= 2 fails with UnicodeDecodeError trying to read project's readme

Issue #254 on hold
saaj created an issue

After a minor documentation update, a CI build failed in the py3 envs with the new Tox (>= 2) installed. Tox < 2 is fine. Here's the stack trace:

Processing ./.tox/dist/HermesCache-0.5.2.zip
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/tmp/pip-um4zde-build/setup.py", line 23, in <module>
        long_description = open('README.txt').read(),
      File "/home/ubuntu/src/bitbucket.org/saaj/hermes/.tox/py33-pylibmc/lib/python3.3/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2413: ordinal not in range(128)

Here's the CI build link. It's kind of strange, because I can start python3 in a terminal and open('README.txt').read() works fine.

Pip version is 7.0.3, and 1.4.1 on the CI server. The result seems to be the same with both. Judging by those versions, it's also likely a bug that the download cache deprecation warning is shown regardless of the installed version of pip.

DEPRECATION: --download-cache has been deprecated and will be removed in the future. Pip now automatically uses and configures its cache.

Comments (9)

  1. Ulrich Petri

    This is a result of the new env isolation that tox >= 2.0 performs. Python 3 is essentially unusable without a correct LANG setting.

    To fix this, either add passenv = LANG or setenv = LANG=xx_XX.UTF-8 to the [testenv] section of tox.ini.
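
A minimal sketch of why the traceback above ends in the ascii codec, assuming the README contains a multi-byte UTF-8 character. The sample text is hypothetical (the thread does not say which character sits at position 2413); a right single quotation mark is used here only because its UTF-8 encoding starts with byte 0xe2. On Python 3, open() in text mode without an explicit encoding falls back to locale.getpreferredencoding(False), which is why an isolated environment without LANG can behave differently from an interactive terminal.

```python
import locale

# Hypothetical stand-in for the README contents: U+2019 encodes to three
# UTF-8 bytes, the first of which is 0xe2.
readme_bytes = u"Hermes\u2019 readme".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# The encoding Python 3's open() uses in text mode when none is passed.
default_encoding = locale.getpreferredencoding(False)

ascii_result = try_decode(readme_bytes, "ascii")
utf8_result = try_decode(readme_bytes, "utf-8")

print("default encoding:", default_encoding)
print("ascii decode failed:", ascii_result is None)
print("utf-8 decode succeeded:", utf8_result is not None)
```

The value printed for the default encoding depends on the machine and Python version it runs on, so no particular output is claimed for it.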

  2. Florian Bruhin

    Note that LANG is in passenv by default with 2.0.2.

    I actually prefer not passing LANG, as it can make issues show up which are otherwise hard to trigger. Someone might have a system with C locales, or maybe README.txt includes UTF-8 which would make this fail on Windows (which would use latin1 by default).

    Here, the proper fix would be to use open('README.txt', encoding='utf-8') in your setup.py instead.
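
A rough sketch of that suggestion on Python 3, assuming the README is stored as UTF-8. The helper name read_readme and the temporary file are illustrative only and are not taken from the project's setup.py.

```python
import os
import tempfile


def read_readme(path):
    """Read a text file as UTF-8 regardless of LANG or the platform default."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Throwaway file standing in for README.txt (hypothetical contents).
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.txt")
with open(readme_path, "wb") as handle:
    handle.write(u"Hermes\u2019 caf\u00e9".encode("utf-8"))

# In a setup.py this value would be passed as long_description.
long_description = read_readme(readme_path)
print(len(long_description))
```

Because the encoding is named explicitly, the call does not consult the locale at all, which is the point of the suggestion.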

  3. Ulrich Petri

    Unfortunately it's not that simple. Many scripts and tools will fail in interesting and horrible ways under python3 if there is no LANG defined. You can read more about the pain of dealing with encodings on the command line in this section of the click docs by Armin Ronacher.

  4. saaj reporter

    Thanks for the info, guys. I confirm that the CI build passes with Tox 2.0.2 without changes.

    @The-Compiler The encoding keyword of the built-in open() is py3-only. open('README.txt', 'rb').read().decode('utf-8') should work on both.
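
The binary-read variant can be sketched like this, again assuming a UTF-8 README. The helper name read_readme_bytes and the temporary file are hypothetical and only serve the demonstration.

```python
import os
import tempfile


def read_readme_bytes(path):
    """Read raw bytes and decode them explicitly; no text-mode default is involved."""
    with open(path, "rb") as handle:
        return handle.read().decode("utf-8")


# Throwaway file standing in for README.txt (hypothetical contents).
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.txt")
with open(readme_path, "wb") as handle:
    handle.write(u"Hermes\u2019 readme".encode("utf-8"))

long_description = read_readme_bytes(readme_path)
print(len(long_description))
```

Another option that behaves the same on Python 2.6+ and Python 3 is io.open(path, encoding='utf-8'), since io.open accepts the encoding keyword on both.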

    @ulope Since LANG is now in passenv by default, what is not that simple? Yes, I read Armin's blog and his py3 rants. There's a lot of useful information, but it should be taken with a hysteria-filter ;-)

  5. Ulrich Petri

    @saaj My "not that simple" comment was directed at Florian regarding the comment that the proper fix would be not to use LANG but the encoding parameter. And I totally agree that Armin's Py3 rants should be taken with a large bucket of salt ;)

  6. saaj reporter

    @ulope Oh, surely backward compatibility is important, and dropping LANG would otherwise break a lot of builds (Tox was actually on course to do that, see the original #247). But @The-Compiler's remark is reasonable, and it's a compromise for everyone. For instance, I couldn't pip3 install my package in a Docker container, because LANG is empty there (C locale). So I decided to put setenv = LANG= in my tox.ini to anticipate such things in the future.
