1. PyPA
  2. Python Packaging Authority Projects
  3. setuptools

Pull requests

#20 Declined
Repository
saschpe
Branch
default-wininst-encoding
Repository
pypa
Branch
default

Always decode distfile as UTF-8 in extract_wininst_cfg. UTF-8 is a strict

Author
  1. Sascha Peilicke
Reviewers
Description
No description
  • Learn about pull requests

Comments (5)

  1. Jason R. Coombs

    Hi Sascha. Thanks for the pull request.

    The original code obviously indicates doubt about the proper encoding of the content. However, this pull request indicates changing the encoding without any justification. I want to be careful to make sure if changing the implementation that it is as correct as possible.

    It looks like your title starts to provide one with "UTF-8 is a strict...". Do you have a reference that indicates that bdist_wininst will always generate UTF-8-encoded content? Do you have any other reason to believe UTF-8 should be preferred? Do you have an example package where you encountered this error?

    I believe this failure warrants a bug report and unit test. Would you be willing to draft one of each for this issue and attempt to answer my questions in the ticket?

  2. Sascha Peilicke author

    "The original code obviously indicates doubt about the proper encoding of the content. However, this pull request indicates changing the encoding without any justification. I want to be careful to make sure if changing the implementation that it is as correct as possible."

    The current code simply fails when the config contains Unicode, as expressed in the commit message. Isn't that justification enough? ;-) I didn't bother writing a test before because the current set of tests is rather useless. But sure, I'll add a test-case.

    "It looks like your title starts to provide one with "UTF-8 is a strict...". Do you have a reference that indicates that bdist_wininst will always generate UTF-8-encoded content?

    Of course not, it could be in any encoding. But without knowing the encoding of a text file (through vim/emacs modelines or whatever), you can only guess. The only real solution is to use https://pypi.python.org/pypi/chardet to detect it. But detection may still fail. Either way, it's better to assume UTF-8 since that will catch 99% of all cases.

    "Do you have any other reason to believe UTF-8 should be preferred?

    "ascii" is a strict subset of "utf-8", thus it's an improvement (besides fixing the mentioned issue). https://en.wikipedia.org/wiki/Utf-8 has "The first 128 characters of Unicode, which correspond one-to-one with ASCII..."

    "Do you have an example package where you encountered this error?"

    It's been a while, gonna have to dig it up again...

  3. Sascha Peilicke author

    I finally recalled it's been one of the published pyOpenSSL egg files causing trouble when installed on a Windows machine. I'm not much of a Windows guy, but I'll try to recreate the issue and do a test case soon.

    I don't think your last comment is related, I never had any issues with install_exe. Also, this rather straightforward patch did fix the issue, so declining doesn't seem appropriate to me. Please reopen.