Issue #11 new

Re-add hashsum check to ez_setup.py

Christian Heimes
created an issue

For setuptools 0.6, ez_setup.py used to validate the downloaded file against a table of known hashsums. I wonder why this feature was removed. It adds a simple layer of security and catches broken downloads, too.

Python 2.4+ supports SHA-1 (via the sha module), and Python 2.5+ adds the SHA-2 family (sha224 to sha512) via hashlib. Since setuptools still supports Python 2.4, we have to use SHA-1. It's broken, but not as broken as MD5. Including and verifying the file size as well limits the possibility of a collision attack.
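
A minimal sketch of what such a check could look like, assuming a hypothetical `KNOWN_DOWNLOADS` table embedded in ez_setup.py that maps each archive filename to its size in bytes and its SHA-1 hex digest. The entry below holds placeholder values, not a real setuptools hash. The sketch is written for modern Python; a Python 2.4-compatible version would need try/finally instead of `with` and the `sha` module instead of `hashlib`.

```python
import hashlib
import os

# Hypothetical table: filename -> (size in bytes, SHA-1 hex digest).
# Placeholder values only, not a real setuptools release hash.
KNOWN_DOWNLOADS = {
    "setuptools-0.0.tar.gz": (5, "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"),
}


def sha1_of_file(path, chunk_size=8192):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, table=KNOWN_DOWNLOADS):
    """Raise ValueError unless the file's size and SHA-1 match the table."""
    name = os.path.basename(path)
    if name not in table:
        raise ValueError("no known hash for %s" % name)
    expected_size, expected_sha1 = table[name]
    # Check the size first: it is cheap, catches truncated downloads, and
    # restricts a collision attempt to inputs of exactly this length.
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise ValueError("%s: expected %d bytes, got %d"
                         % (name, expected_size, actual_size))
    if sha1_of_file(path) != expected_sha1:
        raise ValueError("%s: SHA-1 mismatch" % name)
```

ez_setup.py would call `verify_download(path)` right after saving the archive and refuse to continue if it raises.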

Comments (9)

  1. Jason R. Coombs

    I'm not sure why the feature was removed. I believe it was removed from Distribute early on, perhaps because the process for releasing and generating those hashes wasn't clear.

    Let's put together a doc or script that captures what that process looks like. Currently, release.py is meant to be that script. I think the script will have to be refactored to generate the builds, then generate the hashes, then update ez_setup, then tag.
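
    The "generate the hashes, then update ez_setup" step could be a small helper along these lines. This is a hypothetical sketch, not part of the existing release.py: it assumes the builds land in a `dist/` directory and emits a dict literal (named, hypothetically, `KNOWN_DOWNLOADS`) that could be pasted into ez_setup.py.

```python
import hashlib
import os


def hash_table(dist_dir="dist"):
    """Map each file in dist_dir to (size in bytes, SHA-1 hex digest)."""
    table = {}
    for name in sorted(os.listdir(dist_dir)):
        path = os.path.join(dist_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, "rb") as f:
            data = f.read()
        table[name] = (len(data), hashlib.sha1(data).hexdigest())
    return table


def render_table(table):
    """Render the table as Python source suitable for ez_setup.py."""
    lines = ["KNOWN_DOWNLOADS = {"]
    for name in sorted(table):
        size, digest = table[name]
        lines.append("    %r: (%d, %r)," % (name, size, digest))
    lines.append("}")
    return "\n".join(lines)
```

    After building, `print(render_table(hash_table()))` would produce the block to paste in before tagging.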

  2. Donald Stufft

    Another option is signing the builds with a known key and verifying that in ez_setup.py.

    Then you wouldn't need to update the script with new hashes for each release. There is ed25519, which has a public-domain, pure-Python implementation written by DJB (a well-known cryptographer); ed25519 also has small keys and deterministic signatures.

  3. Jason R. Coombs

    Am I incorrect in thinking that this issue is no longer important (or substantially less important) once we have ez_setup.py hosted on an SSL link behind a proper PKI with certificate validation (such as is in draft in the 1.0b releases)?
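
    For reference, this is roughly what "SSL with certificate validation" means on the client side in modern Python 3. It is a sketch, not ez_setup's actual download code (which also had to run on much older Pythons): the default context requires a valid certificate chain and a matching hostname, and plain-HTTP URLs are refused outright.

```python
import ssl
import urllib.request


def make_verifying_context():
    """TLS context that requires a valid certificate chain and hostname."""
    # create_default_context() loads the system CA store and enables
    # CERT_REQUIRED plus hostname checking.
    return ssl.create_default_context()


def fetch_over_tls(url, timeout=30):
    """Download url over HTTPS with certificate validation; refuse HTTP."""
    if not url.startswith("https://"):
        raise ValueError("refusing non-HTTPS URL: %s" % url)
    context = make_verifying_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read()
```

    This only authenticates the server it talks to; it says nothing about whether the file on that server is the one that was released.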

  4. sureshvv

    Does it make sense to implement the checksum via release.py for now until signing can be implemented? This will also serve as a validation that the download was successful.

  5. sureshvv

    How about:

    1. we have a file called sha1.txt
    2. Use a GitHub hook to put the hashsum there.
    3. Check after download to ensure the checksum matches.
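
    The check in step 3 could look something like this, assuming sha1.txt uses the same layout as `sha1sum` output (one "<hex digest>  <filename>" line per file, which also accommodates several versions and file types). The file name and layout are assumptions, not an existing convention of the project. If sha1.txt is fetched over the same channel as the archive, this mainly catches broken downloads rather than tampering.

```python
import hashlib


def parse_sha1_txt(text):
    """Parse sha1sum-style lines ("<hex digest>  <filename>") into a dict."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, name = line.split(None, 1)
        # sha1sum marks binary-mode entries with a leading "*" on the name.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def checksum_matches(name, data, sha1_txt):
    """Return True if the downloaded bytes match the digest listed for name."""
    expected = parse_sha1_txt(sha1_txt).get(name)
    return expected is not None and hashlib.sha1(data).hexdigest() == expected
```
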
  6. Jason R. Coombs

    It's not as simple as that. First, we're currently hosted on Bitbucket, so it would have to be a Bitbucket hook. Second, ez_setup will install different versions of setuptools: use_setuptools accepts a version parameter and downloads that version. Therefore, even for source distributions, there isn't a single relevant hash. Add support for eggs and wheels, and there are many hashes per release. Third, the suggestion doesn't address issues around workflow. Currently, the workflow goes like this:

    1. Update version numbers.
    2. Tag release.
    3. Create sdist and upload.

    Then, the tagged revision contains the ez_setup relevant for that release. However, at the time of tagging, the hash is not available (as the sdist hasn't been created). Perhaps the sdist and hash could be created in advance of tagging, between steps 1 and 2, but it would have to save the hash somewhere that's not included in the sdist, or the sdist included in step 3 would be different (and might still be for other reasons if the sdist isn't completely deterministic).

    Rather than solve all of these issues (and other related ones), I'd rather rely on a more universal technique that applies to more than just setuptools. How do other software distribution channels solve this challenge?

  7. Donald Stufft

    The solution typically involves package signing. Unfortunately we do not have that in Python yet (but there's some work in TUF to make it happen).

    In the meantime, you can implement your own ad hoc signing using something like ed25519.py (http://ed25519.cr.yp.to/software.html). You'd need to embed your public key in ez_setup.py and host the signatures of the files somewhere. Then ez_setup.py would download the file you want to install along with its signature, and use the key baked into ez_setup.py to verify that the signature matches the file.

    Doing this would remove the need for TLS on the download.
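
    A rough sketch of that flow, under several assumptions: signatures are hosted next to each file with a hypothetical `.sig` suffix, the embedded key below is a placeholder, and `checkvalid` is any function with the call shape `checkvalid(signature, message, public_key)` that raises on a bad signature, which is how `checkvalid` in the ed25519.py linked above behaves (that file was written for Python 2, so it may need porting). The download function is passed in too, so the sketch does not depend on a particular HTTP library.

```python
# Hypothetical public key baked into ez_setup.py (Ed25519 keys are 32 bytes);
# the zero bytes are a placeholder, not a real setuptools key.
EMBEDDED_PUBLIC_KEY = b"\x00" * 32


class SignatureError(Exception):
    """Raised when a downloaded file fails signature verification."""


def download_and_verify(url, fetch, checkvalid, public_key=EMBEDDED_PUBLIC_KEY):
    """Download url and its detached signature, and verify before returning.

    fetch(url) -> bytes downloads a resource; checkvalid(sig, msg, pk)
    must raise if sig is not a valid signature of msg under pk.
    """
    data = fetch(url)
    signature = fetch(url + ".sig")  # hypothetical hosting layout
    try:
        checkvalid(signature, data, public_key)
    except Exception as exc:
        # ed25519.py signals failure with a bare Exception, hence the
        # broad catch; re-raise as a specific error for callers.
        raise SignatureError("signature check failed for %s: %s" % (url, exc)) from exc
    return data
```

    With the real pieces plugged in, a call might look like `download_and_verify(url, lambda u: urllib.request.urlopen(u).read(), ed25519.checkvalid)`. Only the returned bytes should ever be written to disk and installed.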
