How much compatibility is enough? Alternatively, what steps does "suggest" take?

Issue #41 resolved
Donald Stufft
created an issue

So the original PEP 386 had the concept of "normalizing" a non compliant version into a compliant one. It did/does this through a series of regexes and string replacements.

The current rules (as I've implemented them, using the comments as what PEP 440 will contain as well) achieve a 95.26% compatibility rate when checked against every version on PyPI. While creating a "normalize" / "suggest" functionality I've been able to get that number up to 98.76%. However, my worry is that in normalizing we're creating versions that mean something different than what they actually mean. One such instance is pytz, where we have "2012a", which will be normalized into 2012a0. Is 95.26% compatibility "enough" that we can just say we're not going to do normalization? On the other hand, we're more or less just hoping that versions that happen to use a compatible scheme will be interpreted as meaning the same thing anyway, so maybe this is always a problem and normalization doesn't really affect it.

On the other hand, I think if the normalization process is kept, then we should encode the exact transformation rules which are used. I also think if we do it then we should mandate it as part of the spec and not make it an optional thing. I'm thinking of how HTML5 defined a spec, but also defined what browsers should do when pages don't follow that spec. If we have the capability to normalize (and it's reasonably implemented) I can't imagine any case where it wouldn't make more sense (or at least, just as much sense) to normalize, so if we're normalizing, always normalizing seems like a sensible thing to me.

Thoughts?

Comments (36)

  1. Nick Coghlan

    Normalisation/suggestion was always intended as a tool to be used with a human checking the answer. The idea is to give two numbers, one for direct compatibility with the old permissive model, and one where there's a reasonably clear migration path.

  2. Donald Stufft reporter

    It may be because I've had a real shitty day, but I don't understand what that means exactly.

    If there is normalization then most/all tools are going to use it and then I think it makes sense to standardize it and make it a core part of the spec. I don't know what "human checking the answer" is but I don't think that most people are going to do that, or even be able to do anything about it if it's wrong. If pip uses normalization and someone notices that an old one is wrong what are they supposed to do about it?

  3. Nick Coghlan

    If a version is not syntactically compatible with PEP 440, automated migration to metadata 2.0 is NOT possible, because it would involve guessing.

    The hinting function is an aid for manual migration where a human can say "yes, that is what I meant", and to help us check the level of semantic (rather than strict syntactic) compatibility for the new version scheme before declaring it final. If the normalisation step is needed, there MUST be a prompt involved before publishing the altered metadata.

  4. Donald Stufft reporter

    That should be encoded into the PEP then, that automated tools and the like should not use normalization/fuzzy-reading of the version unless the author has been prompted and allowed it. As of right now it's not obvious (and actually pip is using the normalization routine in distlib ATM).

  5. Nick Coghlan

    Oh, I think I see the confusion. Version suggestions are aimed at package authors, not end users. If a version doesn't meet the syntactic restrictions, it has to keep using metadata 1.x. The suggestions are to help change the versioning scheme to a syntactically compatible one that still preserves the previous structure as much as possible.

  6. Donald Stufft reporter

    We should make a standard way for the "I need to parse this version and I don't know anything but the version" use case so that people don't end up with slightly different versioning behavior across all the tools.

  7. Donald Stufft reporter

    Heh, an interesting thing: if we don't do any sort of normalization then all of the pre-releases on PyPI which use versions like 1.0-dev will be seen as 1.0 with a local version of dev instead of 1.0.dev0.

  8. Donald Stufft reporter

    However, the sorting that I currently have (which doesn't have the proper local version sorting) gets 99.36% compatibility with pkg_resources, so I'm not sure if it matters.

  9. Donald Stufft reporter

    Actually what you've said here doesn't follow what the PEP states.

    Software that automatically processes distribution metadata SHOULD attempt to normalize non-compliant version identifiers to the standard scheme, and ignore them if normalization fails. As any normalization scheme will be implementation specific, this means that projects using non-compliant version identifiers may not be handled consistently across different tools, even when correctly publishing the earlier metadata versions.

    I think that this is bad. We should strive to have consistent ordering even in the face of old versions.

  10. Nick Coghlan

    So, I think what we need to do here is:

    1. Drop the idea of using the metadata version as part of the ordering scheme
    2. Define a normalisation algorithm directly in PEP 440 ("-" -> ".", "missing numeric segment" -> "0", "alpha" -> "a", "beta" -> "b", what else?)
    3. Define any version which can't be normalised as being lower precedence than versions which can, and defer the ordering of versions which can't be normalised to setuptools.
  11. Donald Stufft reporter

    I think a key question is the one in the title of this issue. How much compatibility is enough? As of right now we have 93.xx% compatibility with the versions on PyPI (that is, we can parse strictly that percentage of versions).

    1. Absolutely, I don't think we can do this at all without significant changes to the installer API.
    2. If we're going to normalize, then yes we should define it in PEP 440 and IMO it should be non-optional. In other words we shouldn't have a ghetto-ized "suggest" functionality. If we have normalization (which we already have with rc -> c and 01 -> 1) then it should be applied always as part of the core parsing semantics of PEP 440.
    3. I'd rather just say that versions which can't be normalized SHOULD be ignored. While your proposed option here is technically doable, I'd rather not formally bless setuptools into perpetuity here. If we feel the need to define some sort of sorting algorithm for these versions, then we should define the one that setuptools happens to use but we shouldn't say "just use setuptools". In other words, it should be completely possible to interact with versions by implementing something with nothing but Python and this PEP, no other libraries.
  12. Nick Coghlan

    Yes, I think that's a good way to go. The key compatibility numbers we'd be after then would be: 1. percentage of packages with no compatible versions 2. percentage of packages where PEP 440 vs setuptools leads to a different package being installed

  13. Donald Stufft reporter

    So here are the compatibility numbers without any sort of normalization/suggestion attempts:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              223643/238504 (93.77%)
    Total Sorting Compatibility (Unfiltered): 40175/45328 (88.63%)
    Total Sorting Compatibility (Filtered):   45318/45328 (99.98%)
    Projects with No Compatible Versions:     1994/45328 (4.40%)
    Projects with Matching Latest Version:    42847/45328 (94.53%)
    

    So I guess the question is, what sort of numbers are we trying to aim for? Are these numbers "good enough"? Are there any other types of numbers we want to look for? The danger of any sort of normalization is that any sort of fuzziness makes it more difficult to implement a parser and increases the chances of incorrectly interpreting a version. The upside is obviously that we're less disruptive and will do something for more versions.

    Quick interpretation:

    • Total Version Compatibility = Make a giant list of all versions on PyPI (including duplicates) and see how many of them can be parsed using PEP 440.
    • Total Sorting Compatibility (Unfiltered) = Look at each project on PyPI, and see how many end up with the same list when sorted() with pkg_resources.parse_version as the key and Version as the key. It does not filter out any versions which cannot be parsed by PEP 440, so any project which contains a single invalid version will fail this.
    • Total Sorting Compatibility (Filtered) = Same as above, except filter out any versions (from both sides of the comparison) which cannot be parsed by PEP 440.
    • Projects with No Compatible Versions = Look for any projects which have at least one version but none of whose versions are parseable by PEP 440.
    • Projects with matching latest version = Sort the versions with PEP 440 and pkg_resources and see if the "latest" versions match. This will fail if the latest version cannot be parsed by PEP 440.
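
    A rough sketch of how three of these numbers could be computed on a toy data set. Everything here is an assumption made for illustration: strict_key is a deliberately tiny stand-in for a PEP 440 parser, toy_legacy_key is NOT pkg_resources.parse_version, and the project names and versions are made up.

```python
import re

# Hypothetical, deliberately tiny "strict" grammar: a dotted release plus an
# optional aN/bN/cN/rcN pre-release. The real PEP 440 grammar is much richer.
_STRICT = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|c|rc)(\d+))?$")
_PRE_ORDER = {"a": 0, "b": 1, "c": 2, "rc": 2}


def strict_key(version):
    """Return a sort key, or None if the version does not parse."""
    m = _STRICT.match(version)
    if m is None:
        return None
    release = tuple(int(part) for part in m.group(1).split("."))
    if m.group(2):
        pre = (_PRE_ORDER[m.group(2)], int(m.group(3)))
    else:
        pre = (3, 0)  # final releases sort after their pre-releases
    return (release, pre)


def toy_legacy_key(version):
    """Permissive stand-in for a legacy sort key (not pkg_resources)."""
    parts = []
    for chunk in re.findall(r"\d+|[a-z]+", version.lower()):
        parts.append((1, int(chunk)) if chunk.isdigit() else (0, chunk))
    parts.append((0, "~"))  # "~" sorts after any letters, so 1.0a1 < 1.0
    return parts


def compatibility_report(projects, strict, legacy):
    """Count parseable versions and per-project disagreements."""
    total = parsed = no_compatible = differing_latest = 0
    for versions in projects.values():
        total += len(versions)
        valid = [v for v in versions if strict(v) is not None]
        parsed += len(valid)
        if not valid:
            no_compatible += 1
        legacy_latest = max(versions, key=legacy)
        strict_latest = max(valid, key=strict) if valid else None
        if strict_latest != legacy_latest:
            differing_latest += 1
    return {
        "versions_parsed": (parsed, total),
        "no_compatible": (no_compatible, len(projects)),
        "differing_latest": (differing_latest, len(projects)),
    }


# Made-up projects; each is assumed to have at least one version.
projects = {
    "alpha": ["1.0", "1.1", "1.2a1"],
    "beta": ["0.9", "1.0-dev"],
    "gamma": ["2012a", "2012b"],
}
report = compatibility_report(projects, strict_key, toy_legacy_key)
```

    In this toy data the "beta" project counts as a differing latest version because its newest release under the legacy key (1.0-dev) is unparseable under the strict key, which is the failure mode described in the last bullet.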
  14. Nick Coghlan

    From my perspective, the key numbers are the last two. "No Compatible Versions" means that switching pip to strict PEP 440 compliance would mean it could no longer install that project until it published a compatible release. For consistency, it may also be worth inverting the last number (so "Different Latest Version" at 5.47%) to state the proportion of cases where a strict PEP 440 parser would get a different version than a pkg_resources based parser.

    So I suggest doing a very simple first pass normaliser and see how much that improves the numbers. Suggested transformations:

    • '-' -> '.'
    • 'alpha' -> 'a'
    • 'beta' -> 'b'
    • remove all whitespace

    The following rules would be more complicated to implement, but may be worth trying (depending on exactly how the remaining versions are non-compliant after the above simpler transformations):

    • extraneous separator between a, b, c, rc, dev, pre or post and the following numeric element -> remove the separator
    • missing numeric element after a, b, c, rc, dev, pre or post -> implied 0
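
    A minimal sketch of such a first pass normaliser, applying only the transformations listed above as plain text rewrites. The function name normalise and the exact regexes are assumptions for illustration, not what distlib or any other tool does, and the output is not guaranteed to be a compliant version.

```python
import re

# Tags named in the rules above; longer spellings first so they win.
_TAGS = r"(?:rc|dev|pre|post|a|b|c)"


def normalise(version):
    """Apply the simple text transformations, in order."""
    v = "".join(version.split())  # remove all whitespace
    v = v.replace("-", ".")  # '-' -> '.'
    v = v.replace("alpha", "a").replace("beta", "b")
    # Extraneous separator between a tag and its number: "rc.1" -> "rc1".
    v = re.sub(r"(?<![a-z])(%s)\.(\d)" % _TAGS, r"\1\2", v)
    # Missing numeric element after a tag: "dev" -> "dev0".
    v = re.sub(r"(?<![a-z])(%s)(?![a-z\d])" % _TAGS, r"\g<1>0", v)
    return v


examples = ["1.0 alpha", "1.0-rc.1", "1.0-dev", "2012a", "1.0-beta2"]
for text in examples:
    print(text, "->", normalise(text))
```

    Two things stand out when running this: "2012a" becomes "2012a0", which is the pytz concern from the original post, and "1.0-beta2" becomes "1.0.b2", so the listed rules alone still leave a separator in front of the pre-release tag.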
  15. Nick Coghlan

    Note we also have other backwards compatibility options at the pip tooling level. For example, if pip finds no compatible releases at all, it could print a warning and fall back to pkg_resources sorting. If a version specifier includes a PEP 440 incompatible version, that could also trigger a warning message and falling back to pkg_resources sorting.

    That kind of technique wouldn't be part of PEP 440 itself, it would be a practical matter of pip managing its own pkg_resources -> PEP 440 migration.

  16. Nick Coghlan

    Oh, and I think pip should explicitly special case pytz to cover the Olson database numbering. That kind of project specific hack is annoying, but pytz's use of Olson numbering is relatively unique, and I think the special casing would be less of a hassle than contorting the overall spec to cope with it.

  17. Donald Stufft reporter

    Yes pip could do something like that for falling back to pkg_resources. Let me try some of the simpler normalizations and see where that leaves us as well as invert the numbers for the one comparison and see where that leaves us.

    I'm not sure what you think we should do for the Olson database? They've already adopted a version number that fits within the spec: they use YYYY.NN instead of YYYY.AA (e.g. 2014.4 instead of 2014d).

  18. Donald Stufft reporter

    Just for comparison's sake, here are the inverted numbers:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              223643/238504 (93.77%)
    Total Sorting Compatibility (Unfiltered): 40175/45328 (88.63%)
    Total Sorting Compatibility (Filtered):   45318/45328 (99.98%)
    Projects with No Compatible Versions:     1994/45328 (4.40%)
    Projects with Differing Latest Version:   2481/45328 (5.47%)
    
  19. Nick Coghlan

    Regarding pytz: I knew they were considering switching to PEP 440 compatible version numbers, but I didn't realise they had already made the change. Given that, no need for special casing. I only brought it up because I went back and read your first post above :)

  20. Donald Stufft reporter

    So I decided, instead of focusing on "easy" transforms to make, to look at it from the viewpoint of relaxing the syntax (e.g. instead of a series of steps to take to transform the document, just allow more syntax variations).

    What I have so far is below (these are in order, as some of the numbers depend on having previous steps already added). The numbers are (Change in "No Compatible", Change in "Differing Latest"):

    1. (0.12%, 0.14%) Assume an implicit 0 if there is a dev without a trailing numeral
    2. (1.15%, 1.22%) Assume an implicit . if there is a devN without a preceding .
    3. (0.60%, 0.66%) Assume an implicit 0 if there is a a|b|c|rc without a trailing numeral
    4. (0.03%, 0.05%) Allow an extraneous . if there is a a|b|c|rc with a preceding .
    5. (0.10%, 0.08%) Ignore a preceding v if there is one (e.g. v1.0).

    This gets us to:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              228843/238504 (95.95%)
    Total Sorting Compatibility (Unfiltered): 42200/45328 (93.10%)
    Total Sorting Compatibility (Filtered):   45300/45328 (99.94%)
    Projects with No Compatible Versions:     1090/45328 (2.40%)
    Projects with Differing Latest Version:   1506/45328 (3.32%)
    

    I think I can get more by allowing alpha and beta in addition to a and b, I'll be testing that next.

  21. Donald Stufft reporter

    So here's where I'm at now:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              231249/238504 (96.96%)
    Total Sorting Compatibility (Unfiltered): 42987/45328 (94.84%)
    Total Sorting Compatibility (Filtered):   45279/45328 (99.89%)
    Projects with No Compatible Versions:     734/45328 (1.62%)
    Projects with Differing Latest Version:   1100/45328 (2.43%)
    

    This includes:

    • (0.03%, 0.03%) Case insensitive matching (so 1.0RC1 works)
    • (0.48%, 0.56%) Allow - in place of . as a separator for pre-releases (so 1.0-dev4 or 1.0-beta2 works)
    • (0.24%, 0.26%) Allow alpha and beta as alternate spellings of a and b
    • (0.03%, 0.04%) Assume an implicit leading 0 if there is a bare float (so .1 works)

    Looking at the current set of invalid versions, the biggest common patterns I can see are things like 1.0dev-r123, 1.0dev-123. It's kind of iffy but if we translate these to local versions so that 1.0dev-r123 becomes 1.0.dev0+r123 then we pick up an additional 0.42% and 0.47%. The danger in this one is that it's the only one that can't be implemented by adjusting the regex and it's the one that has the highest chance of returning data that is semantically different than what the original is supposed to mean. I'm kind of iffy on including it but I figured I'd mention it.
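
    To make the transformation concrete, here is a sketch of the kind of rewrite being described. The function name and the regex are hypothetical and only cover the 1.0dev-r123 / 1.0dev-123 shapes mentioned above; anything else is left alone by returning None.

```python
import re

# Hypothetical pattern: release, optional ".", "dev", optional number,
# then "-" and an alphanumeric suffix that gets moved into a local version.
_DEV_SUFFIX = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)\.?dev(?P<n>\d*)-(?P<local>[A-Za-z0-9]+)$"
)


def dev_suffix_to_local(version):
    """Rewrite e.g. '1.0dev-r123' as '1.0.dev0+r123', else return None."""
    m = _DEV_SUFFIX.match(version)
    if m is None:
        return None
    dev_number = int(m.group("n") or 0)  # implicit 0 when no numeral
    return "%s.dev%d+%s" % (m.group("release"), dev_number, m.group("local"))


for text in ["1.0dev-r123", "1.0dev-123", "1.0.dev4-r7", "1.0"]:
    print(text, "->", dev_suffix_to_local(text))
```

    Unlike the other rules, this one moves text into a different segment rather than accepting an alternate spelling of the same segment, which is why it cannot be expressed as a syntax relaxation alone.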

    With the local version change above we're at:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              232982/238504 (97.68%)
    Total Sorting Compatibility (Unfiltered): 43530/45328 (96.03%)
    Total Sorting Compatibility (Filtered):   45248/45328 (99.82%)
    Projects with No Compatible Versions:     546/45328 (1.20%)
    Projects with Differing Latest Version:   890/45328 (1.96%)
    
  22. Donald Stufft reporter

    With all of the changes I've gotten here, I feel like we're now in the long tail where there are other versions we can pick up by relaxing the parser, but each new "relaxation" only adds a minor fraction of a percent to our numbers and adds even more complication to the parsing. I'm not seeing any other major patterns (all invalid versions can be found here: https://gist.githubusercontent.com/dstufft/9f352332e5e076d9d507/raw/26ea13e7727e188edcca36485bc8a0bbfb414ab1/invalid.json).

    Looking at these which rules do we want to implement? I'm thinking:

    • Do not implement the local version, it requires actual transformation instead of relaxing the syntax and has a decent danger of changing semantics
    • Do not implement the implicit leading 0, it's easy to do and danger is pretty low, but I find a version of .1 really ugly and it doesn't get us much.
    • Maybe not implement the ignoring of a preceding v such as in v1.0, again danger is low and easy to implement, but I find v1.0 ugly as well.
    • Implement everything else. Since these are syntax changes we can easily allow them in specifiers as well.

    A key change in this is that these should normalize to the preferred syntax, so something like 1.0-ALPHA will be normalized to 1.0a0.

    What do you think? Do you see any additional patterns? Do you agree with the above suggestions?

  23. Donald Stufft reporter

    Oh, and since the per item numbers can be skewed because some of them depend on other ones, here's the result if doing the suggestions I had above:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              230880/238504 (96.80%)
    Total Sorting Compatibility (Unfiltered): 42923/45328 (94.69%)
    Total Sorting Compatibility (Filtered):   45304/45328 (99.95%)
    Projects with No Compatible Versions:     797/45328 (1.76%)
    Projects with Differing Latest Version:   1156/45328 (2.55%)
    
  24. Donald Stufft reporter

    Another thought, we could totally add a thing that says that you SHOULD accept anything in a == specifier, which will be matched exactly. This would provide an easy escape hatch so even if the version you want to install isn't PEP 440 compatible, you can still install it using pip install foo==my-invalid-version.
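
    A sketch of what that escape hatch could look like, assuming it is nothing more than raw string comparison. The helper name exact_match and the list of available versions are made up for illustration; this is not pip's behaviour.

```python
def exact_match(specifier, available):
    """Return the versions that equal the '==' operand, compared as strings.

    No parsing or normalisation happens here, so versions that are not
    PEP 440 compatible can still be selected.
    """
    if not specifier.startswith("=="):
        raise ValueError("only '==' specifiers are handled in this sketch")
    wanted = specifier[2:].strip()
    return [version for version in available if version == wanted]


available = ["1.0", "1.1", "my-invalid-version"]
print(exact_match("==my-invalid-version", available))
print(exact_match("==1.0.0", available))  # no normalisation, so no match
```

    The second call shows the trade-off: because nothing is parsed, 1.0.0 does not match 1.0 in this sketch.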

  25. Donald Stufft reporter

    Here are the final numbers for my suggestion, implemented by relaxing the regex and normalizing instead of just doing text transforms on the version before passing it into the regex (I did it that way for simplification on the first pass). They are basically the same as above with a few minor variations:

    $ invoke check.pep440 --cached
    Total Version Compatibility:              230880/238504 (96.80%)
    Total Sorting Compatibility (Unfiltered): 42927/45328 (94.70%)
    Total Sorting Compatibility (Filtered):   45307/45328 (99.95%)
    Projects with No Compatible Versions:     797/45328 (1.76%)
    Projects with Differing Latest Version:   1157/45328 (2.55%)
    
  26. Donald Stufft reporter

    So another question we need to answer is, should we relax the restrictions for post releases as well? I do not see anything that would be allowed by relaxing them, however it would make the rules more consistent and probably be less surprising? This would allow things like 1.0-post4, 1.0post6, and 1.0.post.

  27. Nick Coghlan

    Yeah, the consistency option sounds good. I also like your suggestions for which normalisations to apply.

    I haven't looked at your implementation yet - did you keep both regexes (strict vs relaxed), so it was easy to check if a version number had already been normalised?

  28. Donald Stufft reporter

    No I didn't, I just adjusted the regex to accept them all and it normalizes naturally when it gets reconstructed. You can test if something was normalized already by doing version == str(Version(version)). It should return the same thing unless it did some normalization.
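
    A minimal sketch of that idea, assuming a toy Version class written just for this example (it is not the implementation under discussion and only handles release, pre-release and dev segments). The relaxed regex accepts the alternate spellings, and the normal form falls out of rebuilding the string from the parsed pieces.

```python
import re

# Hypothetical relaxed grammar: case-insensitive, "-" or "." (or nothing)
# before a pre-release or dev tag, alpha/beta spellings, optional numerals.
_RELAXED = re.compile(
    r"""
    ^(?P<release>\d+(?:\.\d+)*)
    (?:[-.]?(?P<pre_tag>alpha|beta|a|b|c|rc)(?P<pre_n>\d*))?
    (?P<dev>[-.]?dev(?P<dev_n>\d*))?
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)
_SPELLING = {"alpha": "a", "beta": "b"}


class Version:
    def __init__(self, text):
        m = _RELAXED.match(text.strip())
        if m is None:
            raise ValueError("invalid version: %r" % text)
        self.release = tuple(int(p) for p in m.group("release").split("."))
        tag = m.group("pre_tag")
        if tag:
            tag = _SPELLING.get(tag.lower(), tag.lower())
            self.pre = (tag, int(m.group("pre_n") or 0))  # implicit 0
        else:
            self.pre = None
        self.dev = int(m.group("dev_n") or 0) if m.group("dev") else None

    def __str__(self):
        # Rebuilding from parsed pieces always yields the preferred syntax.
        parts = [".".join(str(n) for n in self.release)]
        if self.pre:
            parts.append("%s%d" % self.pre)
        if self.dev is not None:
            parts.append(".dev%d" % self.dev)
        return "".join(parts)


def is_normalized(text):
    """True if parsing and re-serialising leaves the string unchanged."""
    return text == str(Version(text))


for text in ["1.0a0", "1.0-ALPHA", "1.0-dev4", "1.0RC1"]:
    print(text, "->", Version(text), is_normalized(text))
```

    With this approach there is no separate strict regex to consult: a version was already in normal form exactly when the round trip is a no-op.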

  29. Nick Coghlan

    Fair enough. From a clarity perspective, we should keep both in the PEP.

    Another option we should consider: keep the client strict, add normalised aliases on the PyPI simple API. However, whether or not that is viable will depend on how pkg_resources handles it.

  30. Donald Stufft reporter

    It's preferable not to make the simple API "smart" because that smartness needs to be replicated for anyone using a non-"strict" PEP 440 version on a non-PyPI index. It'll also need to be reflected for installed versions and the like as well.

  31. Donald Stufft reporter

    I'm totally OK with putting both regexes in the PEP too. There's actually not any regex in the PEP at all right now, and my PR takes the route of letting the pseudo grammar define the "strict" version, and then defining a list of additional syntaxes and what their "normal form" is.
