Alias names are too permissive compared to libyaml and future spec

Issue #485 resolved
Charles Nutter created an issue

Recently JRuby had a bug report that showed our SnakeYAML-based YAML library raised a different error for a peculiar alias compared to CRuby’s libyaml-based implementation.

https://github.com/jruby/jruby/issues/6365

The YAML in question:

Exclude: **/*_old.rb

SnakeYAML allows this peculiar alias name to parse, but then raises an alias error because the alias does not exist. libyaml rejects it as a syntax error, because it allows only alphanumeric, dash, and underscore characters in alias names.
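
The character rule described above can be sketched in a few lines of Ruby. This is only an illustration of the libyaml behavior as summarized in this report (alphanumeric, dash, and underscore). The constant and method names are hypothetical and are not part of Psych, libyaml, or SnakeYAML, and treating everything after the leading `*` as the candidate name is a simplification of how a real scanner tokenizes.

```ruby
# Hypothetical helper: approximates the alias-name rule attributed to
# libyaml in this report (alphanumeric, dash, underscore only).
LIBYAML_STYLE_ALIAS_NAME = /\A[A-Za-z0-9_-]+\z/

def libyaml_style_alias_name?(name)
  LIBYAML_STYLE_ALIAS_NAME.match?(name)
end

# The unquoted value from the report. The first "*" is the alias indicator;
# the remainder is what a permissive parser would treat as the alias name.
value = "**/*_old.rb"
candidate = value.delete_prefix("*")

puts libyaml_style_alias_name?(candidate)          # => false
puts libyaml_style_alias_name?("default_excludes") # => true
```

Under this rule the candidate name is rejected at the first character, which matches the syntax error libyaml reports rather than a missing-alias error.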

Because this is a spec question, I raised an issue with libyaml to clarify behavior.

https://github.com/yaml/libyaml/issues/205

The official answer I got back was that yes, the spec allows this to be considered valid, but it is too permissive and allows several characters that have no real value. The maintainers of libyaml acknowledge that it is more restrictive than perhaps it should be, but they do not want to change it to match the too-permissive spec.

The future behavior will apparently follow from YAML RFC-0003, which is more restrictive than YAML 1.2, but more permissive than current libyaml.

https://github.com/yaml/yaml-spec/blob/master/rfc/RFC-0003.md

Given this background and the planned direction of the spec, I think it’s safe to say that SnakeYAML should be modified to allow only the aliases permitted by RFC-0003, which would bring it closer to alignment with libyaml and resolve the JRuby YAML question.

Note: The original report came up only because the reporters expect a specific error for this case, and we raise a different one. I think they just wanted to test that the value would be rejected, since it’s probably a user mistake (i.e., it probably should have been a quoted string).
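
As a rough illustration of the quoting point, the following Ruby sketch loads the same value quoted and unquoted. It assumes the standard `yaml` library (Psych). The exact exception class for the unquoted case depends on the implementation and version, as this issue describes, so the sketch rescues the common `Psych::Exception` base class instead of a specific error.

```ruby
require "yaml"

# Quoted: the value is an ordinary string, so the leading "*" is not
# treated as an alias indicator.
quoted = YAML.load("Exclude: '**/*_old.rb'")
puts quoted["Exclude"]

# Unquoted: the leading "*" starts an alias, so loading is expected to fail.
# Which subclass is raised (syntax error vs. alias error) is the behavior
# under discussion here, so only the base class is rescued.
rejected =
  begin
    YAML.load("Exclude: **/*_old.rb")
    false
  rescue Psych::Exception => e
    puts "rejected with #{e.class}"
    true
  end
```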

Comments (9)

  1. Charles Nutter reporter

    That would be acceptable, and it seems to be the direction libyaml will be going. It would still be more permissive than current libyaml, but I now have a good paper trail to explain why.

  2. Charles Nutter reporter

    This looks good to me!

    $ ruby -ryaml -e "YAML.load('Exclude: **/*_old.rb')"
    Psych::SyntaxError: (<unknown>): unexpected character found *(42) while scanning an alias at line 1 column 11
             parse at org/jruby/ext/psych/PsychParser.java:246
      parse_stream at /Users/headius/projects/jruby/lib/ruby/stdlib/psych.rb:454
             parse at /Users/headius/projects/jruby/lib/ruby/stdlib/psych.rb:388
              load at /Users/headius/projects/jruby/lib/ruby/stdlib/psych.rb:277
            <main> at -e:1
    

  3. Andrey Somov

    @Charles Nutter It will be released in v1.28, which is scheduled for February 2021 (we release twice a year). Do you need SnakeYAML to be released earlier?

  4. Charles Nutter reporter

    @Andrey Somov I guess we just missed the last release, eh? February is indeed a long time to wait. This issue was reported in the context of a test suite for a popular tool; as far as I know it does not actually affect the tool, but it will prevent them from testing on JRuby without special exclusions. The YAML in question does throw an error, but it’s the wrong error. I’m not sure how to weigh that against asking you for a special release.

    I guess I’d say that yes, we would appreciate a mid-term release so we can resolve this issue. It’s not critical, but it helps us remain compliant with expected Ruby YAML behavior. And it’s possible it might affect an end user, though we have no such reports today.
