Unquoted string "0x_" causes NumberFormatException

Issue #506 resolved
Michael Ziwisky created an issue

e.g. try parsing string_that_almost_looks_like_hex: 0x_

Comments (13)

  1. Michael Ziwisky reporter

    well, i just realized that the ruby and python parsers also blow up on this test case, so perhaps it isn’t unreasonable for snakeyaml to do the same. also found the spec shows the regex to use for integers, and that seems to match what’s on master.

    that said, i still think treating 0x_ as a string is a sensible decision. the spec is somewhat ambiguous about how it should be treated. but whatever y’all feel like is fine with me – i can work around it in my app.
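To make the failure concrete, here is a minimal sketch using only the Java standard library. The regex is the hexadecimal alternative of the int pattern published at http://yaml.org/type/int.html. The class name `HexScalarDemo` and the `toInt` helper are hypothetical; they only approximate what a parser does after the resolver has tagged a scalar as an int and are not SnakeYAML's actual code.

```java
import java.util.regex.Pattern;

class HexScalarDemo {
    // Hexadecimal alternative of the YAML 1.1 int regex (yaml.org/type/int.html).
    static final Pattern YAML11_HEX = Pattern.compile("[-+]?0x[0-9a-fA-F_]+");

    // True when the scalar would be tagged as an int by that pattern.
    static boolean resolvesAsHexInt(String scalar) {
        return YAML11_HEX.matcher(scalar).matches();
    }

    // Hypothetical conversion step: drop the "0x" prefix and the underscores,
    // then parse what is left. Sign handling is omitted to keep the sketch short.
    static int toInt(String scalar) {
        String digits = scalar.substring(2).replace("_", "");
        return Integer.parseInt(digits, 16); // throws NumberFormatException on ""
    }

    public static void main(String[] args) {
        System.out.println(resolvesAsHexInt("0x_"));  // true: '_' satisfies the '+' quantifier
        System.out.println(toInt("0x_0A_74_AE"));     // 685230, the example from the spec page
        try {
            toInt("0x_");                             // no digits remain after stripping '_'
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException for 0x_");
        }
    }
}
```

The point is that the pattern accepts `0x_` because the underscore alone satisfies the character class, while the conversion step has nothing left to parse.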

  2. Andrey Somov

    We have spent too much time trying to fix deviations between the parsers. I am not in favour of adding yet another one.

    By the way, you can specify your own Resolver at runtime to achieve the goal. Then you do not have to use quotes.

  3. Michael Ziwisky reporter

    yep, we’re currently working around it with a custom constructor, actually. anyway, it isn’t a goal of mine to be able to not use quotes in the yaml i produce, this is a matter of parsing user-provided yaml, i.e. that i haven’t constructed. since the spec is ambiguous, i guess it’s up to us to decide what 0x_ means (or 0b_ or 0_ – other cases that i’m just realizing also fail).

    the solution in my PR is a bad one – it causes 0x_0A_74_AE to be parsed as a string, but this is a case that is specifically called out as an example of a hexadecimal int in http://yaml.org/type/int.html. sounds like you don’t want to update the resolver, but nonetheless i’ll update that PR with an even more complicated regex to handle this issue, just so there’s a record of it for posterity.

  4. Michael Ziwisky reporter

    k, updated the PR on github to handle stuff like 0x_0A_74_AE as a number, and to handle stuff like 0b_ and 0_ and -_ as a string.
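For readers who want to see the idea without opening the PR, the following is a rough sketch of a stricter int pattern. It is not the regex from the PR, which is not reproduced in this thread. It is a hypothetical variant of the pattern at http://yaml.org/type/int.html in which every prefixed alternative must contain at least one real digit. The class name `StrictIntResolverSketch` and the method `isInt` are made up for illustration.

```java
import java.util.regex.Pattern;

class StrictIntResolverSketch {
    // Hypothetical stricter variant of the YAML 1.1 int regex: the base 2, 8 and 16
    // alternatives require at least one real digit, so "0x_", "0b_" and "0_" no longer match.
    // This is NOT the regex from the PR, only an illustration of the idea.
    static final Pattern STRICT_INT = Pattern.compile(
          "[-+]?0b_*[01][01_]*"                    // base 2
        + "|[-+]?0_*[0-7][0-7_]*"                  // base 8
        + "|[-+]?(?:0|[1-9][0-9_]*)"               // base 10
        + "|[-+]?0x_*[0-9a-fA-F][0-9a-fA-F_]*"     // base 16
        + "|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+");  // base 60

    // True when the whole scalar matches one of the alternatives above.
    static boolean isInt(String scalar) {
        return STRICT_INT.matcher(scalar).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0x_0A_74_AE", "0b1010_0111", "1_000", "0x_", "0b_", "0_", "-_"};
        for (String s : samples) {
            // Scalars that are not ints would fall through to the string tag.
            System.out.println(s + " -> " + (isInt(s) ? "int" : "str"));
        }
    }
}
```

Under these assumptions the first three samples print `int` and the last four print `str`. A real resolver would also have to keep its other implicit types (floats, timestamps and so on) consistent with the change, which this sketch does not attempt.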

  5. Michael Ziwisky reporter

    you probably saw my comment on https://bitbucket.org/asomov/snakeyaml/issues/449, but I think that’s debatable. for what it’s worth, ruby 2.6 and 2.7 consider it a string, as does python 3.9.5. if it is a number, what base is it, and what is the resulting decimal value?

    also, i know you said early on you don’t want to fix parser deviations, and i respect that decision. if you don’t want handling for these problematic strings to be merged in, i’ll drop the issue. if you do think we’re on track and just need a little more tweaking, i’m happy to stick around and work out the details with you, or hand it off for you to finish up, whatever you prefer.

  6. Andrey Somov

    I am not sure about Python 3.9, but Python 3.6 which I have raises an exception:

    ValueError: invalid literal for int() with base 16: ''

  7. Michael Ziwisky reporter

    are you talking about for 0x_? that’s what 3.9 does for that input, but for 0123456789 it loads it as a string.

  8. Andrey Somov

    I appreciate your clarification and contribution.

    I think it should be taken. The only question I have is that the tests fail now.

    8e-06 is parsed as a String instead of a Double.

  9. Michael Ziwisky reporter

    sorry, this fell off my radar for a bit, but thanks for accepting the fix and taking care of the affected tests!
