Issue #29 new

Zero padded numbers ending in 8 or 9 dump incorrectly (or everything else does!)

jtor14
created an issue

When dumping zero padded numbers, yaml.dump quotes the strings:

yaml.dump(['01', '02', '0043'])
# "['01', '02', '0043']\n"

However, if your zero-padded number ends in an 8 or 9, the quotes are dropped:

yaml.dump(['08', '029', '0002418'])
# '[08, 029, 0002418]\n'

Yet the string versions of 8 and 9 behave nicely:

yaml.dump(['8', '9'])
# "['8', '9']\n"

As do strings of numbers ending in 8 or 9:

yaml.dump(['349', '2308'])
# "['349', '2308']\n"

Comments (4)

  1. Florent Xicluna

    I am afraid that it is by design.

    • Example 1: without quotes, each value would be read as an octal integer (starts with 0 and every digit is < 8), so it is quoted.
    • Examples 3 and 4: without quotes, each value would be read as a decimal integer (starts with a non-zero digit, digits only), so it is quoted.
    • Example 2: the values can be written without quotes because they cannot be confused with a decimal or an octal integer: they start with 0 and contain the digit 8 or 9.
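The three cases above can be checked with a small sketch. The two patterns are simplified stand-ins for the YAML 1.1 integer rules (an assumption: signs, binary, hex, and sexagesimal forms are left out), and `needs_quotes` is a hypothetical helper, not part of PyYAML.

```python
import re

# Simplified patterns modeled on the YAML 1.1 integer forms.
# Assumption: only the octal and decimal forms matter for this issue.
OCTAL_11 = re.compile(r'^0[0-7_]+$')               # leading 0, digits 0-7 only
DECIMAL_11 = re.compile(r'^(?:0|[1-9][0-9_]*)$')   # "0" or no leading zero


def needs_quotes(text):
    """Return True if the unquoted text would be read back as an integer."""
    return bool(OCTAL_11.match(text) or DECIMAL_11.match(text))


for text in ['01', '0043', '08', '029', '0002418', '8', '349']:
    print(text, 'quoted' if needs_quotes(text) else 'plain')
```

Strings such as '08' match neither pattern, so a dumper following these rules can emit them without quotes and still round-trip them as strings.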

    I had a similar annoyance in a project; I fixed it by subclassing yaml.Dumper with custom representers (it also fixes two other annoyances).

    import yaml
    
    
    def represent_unicode(dumper, data, style=None):
        data = unicode(data)
        if not dumper.default_style and data.isdigit():
            style = "'"
        return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style=style)
    
    
    class UDumper(yaml.Dumper):
        pass
    
    # Represent longs (42L) the same as ints
    UDumper.add_representer(long, yaml.Dumper.yaml_representers[int])
    # Always use quotes when the string contains only digits
    UDumper.add_representer(str, represent_unicode)
    # Represent <unicode> the same as <str>
    UDumper.add_representer(unicode, represent_unicode)
    
    
    # Usage:
    # yaml.dump(['01', '02', '0043'], Dumper=UDumper)
    # yaml.dump(['08', '029', '0002418'], Dumper=UDumper)
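The representer above relies on the Python 2 names `unicode` and `long`. A minimal Python 3 sketch of the same idea might look like the following; the names `QuotedDigitsDumper` and `represent_digit_str` are made up for illustration, and the code assumes PyYAML is installed.

```python
import yaml


def represent_digit_str(dumper, data):
    # Force single quotes when the string contains only digits,
    # unless the caller already asked for a global default style.
    style = None
    if not dumper.default_style and data.isdigit():
        style = "'"
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)


class QuotedDigitsDumper(yaml.Dumper):
    """Hypothetical Dumper subclass; registrations stay local to it."""


QuotedDigitsDumper.add_representer(str, represent_digit_str)

print(yaml.dump(['01', '02', '0043'], Dumper=QuotedDigitsDumper))
print(yaml.dump(['08', '029', '0002418'], Dumper=QuotedDigitsDumper))
```

Registering the representer on a subclass rather than on yaml.Dumper itself keeps the change from affecting other code that dumps YAML in the same process.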
    
  2. jtor14 reporter

    I understand the need for octal representation, however, I'm specifically interested in the string representation of those numbers. It is the apparent coercion to some type of number even though it is quoted that seems incorrect.

    It seems to me that the yaml.dump (and representer) should remain ignorant of the notion that the element contained within the quotes can be coerced into a number (of any kind).

  3. Jayson Vantuyl

    My team just ran into this exact problem. In our case, it was an eight-digit "user ID" that was a hexadecimal number. It ended up having the unfortunate luck to both start with a zero and contain no letters or digits greater than 7. The odds of this are approximately 1 in 2000, so it's just subtle enough that I bet it's probably wreaking havoc in all manner of places that we haven't noticed.
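The "1 in 2000" estimate can be reproduced with a short calculation, assuming (this is an assumption, not stated in the comment) that each of the eight hexadecimal digits is uniformly random and independent.

```python
from fractions import Fraction

# Assumption: eight independent, uniformly random hex digits.
leading_zero = Fraction(1, 16)        # first digit must be 0
rest_octal = Fraction(8, 16) ** 7     # remaining seven digits must be 0-7
odds = leading_zero * rest_octal

print(odds)  # 1/2048
```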

    YAML 1.2 addresses this and explicitly chooses not to do this with their regexp. I'd consider this to be an implementation bug and fix it to match the 1.2 regexp. While zero-prefixed numbers have a tradition in Unix / C of indicating octalness, I think that it's pretty reasonable to suggest that this is both of limited utility and rapidly declining familiarity.

    More formally, the rationale behind me recommending this change is:

    1. It's undocumented and people shouldn't be relying on it.
    2. Being incompatible with YAML 1.2 is not a forward-looking choice.
    3. It causes some seriously unexpected behavior.
    4. I bet we're highly unlikely to find anybody seriously impacted negatively by this.

    I'll be submitting a pull request shortly.

  4. Jayson Vantuyl

    Hmmm, well, looks like I may be lying. It seems to be documented, albeit poorly. This appears to have been a change from 1.1 to 1.2. I'm not sure what the right thing is here. It's worth noting that quotes won't save you here (because "heuristics"). This is all fairly non-obvious. I'd go so far as suggesting that it's a "pitfall".

    That said, this is being used. After doubting my earlier statement, I did a search and got this: https://github.com/search?q=0644+language%3AYAML&type=Code&ref=searchresults

    Tons of people using this for Unix permissions. :/

    As I see it, our options are:

    1. Break YAML 1.1 in a way that leaves the octal-loving Unixers out in the cold.
    2. Provide a "quirks" interface to tell the parser what you want.
    3. Offer an API to select the YAML version which effectively does #2 under the covers.
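To make option 3 concrete, here is a rough sketch of what version-dependent integer resolution could look like. `resolve_plain_int` is a hypothetical helper, not an existing PyYAML function, and the patterns are simplified from the YAML 1.1 integer rules and the YAML 1.2 core schema (signs, hex, and other forms are omitted).

```python
import re


def resolve_plain_int(text, version='1.1'):
    """Return the int a plain (unquoted) scalar would resolve to, or None.

    Hypothetical helper; patterns are simplified for illustration.
    """
    if version == '1.1':
        if re.fullmatch(r'0[0-7]+', text):        # leading zero means octal
            return int(text, 8)
        if re.fullmatch(r'0|[1-9][0-9]*', text):  # plain decimal
            return int(text)
        return None                               # e.g. '08' stays a string
    if version == '1.2':
        if re.fullmatch(r'0o[0-7]+', text):       # octal needs an explicit 0o
            return int(text[2:], 8)
        if re.fullmatch(r'[0-9]+', text):         # leading zeros are decimal
            return int(text, 10)
        return None
    raise ValueError('unsupported YAML version: %r' % (version,))


print(resolve_plain_int('0644', version='1.1'))  # 420
print(resolve_plain_int('0644', version='1.2'))  # 644
```

Under these simplified rules, a file mode written as 0644 changes value between the two versions, which is why silently switching the default would break the permission-style usage found in the search above.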

    I'm not really clear what the best option is here, but it looks non-trivial.
