Clarifying differences from libyaml

Issue #498 open
Charles Nutter created an issue

Hey it’s me again!

I am trying to get the tests for Psych, the Ruby extension wrapping libyaml, to pass on JRuby’s version that wraps SnakeYAML. There are a few differences I wanted to bring up here in case there’s a reason for them or an easy workaround. The author of Psych is also open to making the tests less strict, but I want to understand these behavior differences first.

Some of these might be differences in the Psych code, but I think they are all actually differences in the YAML engines.

1. YAML version appears in the list of tag directives in SnakeYAML, but not in libyaml

The failure looks like this:

Failure:
Psych::TestTreeBuilder#test_documents [/Users/headius/projects/psych/test/psych/test_tree_builder.rb:33]:
--- expected
+++ actual
@@ -1 +1 @@
-[]
+[["!!", "tag:yaml.org,2002:"], ["!", "!"]]

And the code for this test:

    def setup
      super
      @parser = Psych::Parser.new TreeBuilder.new
      @parser.parse(<<-eoyml)
%YAML 1.1
---
- foo
- {
  bar : &A !!str baz,
  boo : *A
}
- *A
      eoyml
      @tree = @parser.handler.root
    end

    def test_documents
      assert_equal 1, @tree.children.length
      assert_instance_of Nodes::Document, @tree.children.first
      doc = @tree.children.first

      assert_equal [1,1], doc.version
      assert_equal [], doc.tag_directives
      assert_equal false, doc.implicit
      assert_location 0, 0, 8, 0, doc
    end

Is there a reason SnakeYAML returns the YAML version directive as a tag directive here?

There is a similar failure with slightly different source:

Failure:
Psych::Handlers::TestRecorder#test_replay [/Users/headius/projects/psych/test/psych/handlers/test_recorder.rb:22]:
--- expected
+++ actual
@@ -1,3 +1,5 @@
-"--- foo
+"%TAG ! !
+%TAG !! tag:yaml.org,2002:
+--- foo
 ...
 "

And the code for this test:

      def test_replay
        yaml   = "--- foo\n...\n"
        output = StringIO.new

        recorder = Psych::Handlers::Recorder.new
        parser   = Psych::Parser.new recorder
        parser.parse yaml

        assert_equal 5, recorder.events.length

        emitter = Psych::Emitter.new output
        recorder.events.each do |m, args|
          emitter.send m, *args
        end
        assert_equal yaml, output.string
      end
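
One possible workaround on the wrapper side, sketched below: the two pairs in the failure output are the default `!` and `!!` tag handles, so they could be filtered out before the directives reach the handler. The constant and the helper `explicit_tag_directives` are hypothetical names for illustration and are not part of Psych or SnakeYAML. The sketch would also drop an explicit `%TAG` line that merely restates a default, which may or may not be acceptable.

```ruby
# Default tag handles; SnakeYAML reports these even when the document
# contains no %TAG directive (see the failure output above).
DEFAULT_TAG_DIRECTIVES = [
  ['!', '!'],
  ['!!', 'tag:yaml.org,2002:']
].freeze

# Hypothetical helper: keep only directives that are not default handles.
def explicit_tag_directives(directives)
  directives.reject { |pair| DEFAULT_TAG_DIRECTIVES.include?(pair) }
end

# Values taken from the test_documents failure.
snakeyaml_result = [['!!', 'tag:yaml.org,2002:'], ['!', '!']]
libyaml_result   = []

puts explicit_tag_directives(snakeyaml_result) == libyaml_result
```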

2. Non-URI characters in tag names

libyaml appears to be more liberal here:

Error:
Psych::TestEncoding#test_start_mapping:
Psych::SyntaxError: (<unknown>): expected URI, but found (12496) while scanning a tag at line 1 column 8
    org/jruby/ext/psych/PsychParser.java:257:in `parse'
    org/jruby/ext/psych/PsychParser.java:115:in `parse'
    /Users/headius/projects/psych/test/psych/test_encoding.rb:155:in `test_start_mapping'

And the code for this test:

    def test_start_mapping
      foo = 'foo'
      bar = 'バー'

      @emitter.start_stream Psych::Parser::UTF8
      @emitter.start_document [], [], true
      @emitter.start_mapping(
        foo.encode('Shift_JIS'),
        bar.encode('UTF-16LE'),
        false, Nodes::Sequence::ANY)
      @emitter.end_mapping
      @emitter.end_document false
      @emitter.end_stream

      @parser.parse @buffer.string
      assert_encodings @utf8, @handler.strings
      assert_equal [foo, bar], @handler.strings
    end
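
For reference, the code point in the error message (12496) is U+30D0, the first character of the tag used in the test. If tags do have to be restricted to URI characters, a tag like this could be percent-escaped before emitting. The following is only a sketch under that assumption: `percent_escape_tag` is a hypothetical helper, it escapes every byte outside printable ASCII and leaves everything else (including `%` itself) untouched, and whether either engine decodes such a tag back on parse is not verified here.

```ruby
# Hypothetical helper: percent-escape the UTF-8 bytes of a tag that fall
# outside printable ASCII (0x21..0x7E). Printable ASCII passes through as-is.
def percent_escape_tag(tag)
  tag.encode(Encoding::UTF_8).each_byte.map do |byte|
    (0x21..0x7E).cover?(byte) ? byte.chr : format('%%%02X', byte)
  end.join
end

tag = "\u30D0\u30FC" # the tag string from test_start_mapping

puts tag[0].ord                                       # code point of the first character
puts percent_escape_tag(tag)                          # escaped form of the tag
puts percent_escape_tag(tag.encode('UTF-16LE'))       # same result from UTF-16LE input
```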

3. Line numbering differences

SnakeYAML and libyaml differ in some of the positions of the YAML elements:

Failure:
Psych::TestParser#test_event_location [/Users/headius/projects/psych/test/psych/test_parser.rb:362]:
--- expected
+++ actual
@@ -1 +1 @@
-[[:start_stream, [0, 0, 0, 0]], [:start_document, [0, 0, 0, 0]], [:start_mapping, [0, 0, 0, 0]], [:scalar, [0, 0, 0, 3]], [:start_mapping, [1, 2, 1, 2]], [:scalar, [1, 2, 1, 8]], [:start_sequence, [1, 10, 1, 11]], [:scalar, [1, 11, 1, 12]], [:scalar, [1, 14, 1, 15]], [:end_sequence, [1, 15, 1, 16]], [:end_mapping, [2, 0, 2, 0]], [:end_mapping, [2, 0, 2, 0]], [:end_document, [2, 0, 2, 0]], [:end_stream, [2, 0, 2, 0]]]
+[[:start_stream, [0, 0, 0, 0]], [:start_document, [0, 0, 0, 0]], [:start_mapping, [0, 0, 0, 0]], [:scalar, [0, 0, 0, 3]], [:start_mapping, [1, 2, 1, 2]], [:scalar, [1, 2, 1, 8]], [:start_sequence, [1, 10, 1, 11]], [:scalar, [1, 11, 1, 12]], [:scalar, [1, 14, 1, 15]], [:end_sequence, [1, 15, 1, 16]], [:end_mapping, [1, 16, 1, 16]], [:end_mapping, [1, 16, 1, 16]], [:end_document, [1, 16, 1, 16]], [:end_stream, [1, 16, 1, 16]]]

I can provide code but probably any multi-line YAML will show this effect.
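
To narrow down where the two engines disagree, the expected and actual lists from the failure can be compared entry by entry. The snippet below copies both lists from the output above (each location is `[start_line, start_column, end_line, end_column]`) and prints only the events whose positions differ. It is a small diagnostic sketch, not part of the Psych test suite.

```ruby
# Locations reported with libyaml (the expected side of the failure).
libyaml = [
  [:start_stream, [0, 0, 0, 0]], [:start_document, [0, 0, 0, 0]],
  [:start_mapping, [0, 0, 0, 0]], [:scalar, [0, 0, 0, 3]],
  [:start_mapping, [1, 2, 1, 2]], [:scalar, [1, 2, 1, 8]],
  [:start_sequence, [1, 10, 1, 11]], [:scalar, [1, 11, 1, 12]],
  [:scalar, [1, 14, 1, 15]], [:end_sequence, [1, 15, 1, 16]],
  [:end_mapping, [2, 0, 2, 0]], [:end_mapping, [2, 0, 2, 0]],
  [:end_document, [2, 0, 2, 0]], [:end_stream, [2, 0, 2, 0]]
]

# Locations reported with SnakeYAML (the actual side of the failure).
snakeyaml = [
  [:start_stream, [0, 0, 0, 0]], [:start_document, [0, 0, 0, 0]],
  [:start_mapping, [0, 0, 0, 0]], [:scalar, [0, 0, 0, 3]],
  [:start_mapping, [1, 2, 1, 2]], [:scalar, [1, 2, 1, 8]],
  [:start_sequence, [1, 10, 1, 11]], [:scalar, [1, 11, 1, 12]],
  [:scalar, [1, 14, 1, 15]], [:end_sequence, [1, 15, 1, 16]],
  [:end_mapping, [1, 16, 1, 16]], [:end_mapping, [1, 16, 1, 16]],
  [:end_document, [1, 16, 1, 16]], [:end_stream, [1, 16, 1, 16]]
]

# Keep only the pairs where the two engines disagree.
differing = libyaml.zip(snakeyaml).reject { |expected, actual| expected == actual }

differing.each do |(name, expected_loc), (_name, actual_loc)|
  puts "#{name}: libyaml #{expected_loc.inspect}, SnakeYAML #{actual_loc.inspect}"
end
```

Only the four closing events differ: libyaml places them at the start of the following line, while SnakeYAML places them at the end of the last token.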

There are many other failures, but I am still sorting out which ones are unique. The yaml-version-in-tags difference seems to be the cause of several failures, as does the odd position information. My reading of the spec suggests that SnakeYAML may be correct in rejecting unescaped non-ASCII characters in tags, but this would need clarification from the spec maintainers.

Comments (4)

  1. Charles Nutter reporter

    This is another minor one that seems likely to be a subtle YAML engine difference (or is it more likely to be a JSON difference?):

    Failure:
    Psych::JSON::TestStream#test_datetime [/Users/headius/projects/psych/test/psych/json/test_stream.rb:106]:
    Expected /\{"a":\ "2010\-10\-10\ 00:00:00\.000000000\ \-05:00"\}\n/ to match "--- {\"a\": \"2010-10-10 00:00:00.000000000 -05:00\"}".
    

    Note the missing newline in the string produced by the SnakeYAML dumper.

          def setup
            @io     = StringIO.new
            @stream = Psych::JSON::Stream.new(@io)
            @stream.start
          end
    
          def test_datetime
            time = Time.new(2010, 10, 10).to_datetime
            @stream.push({'a' => time })
            json = @io.string
            assert_match "{\"a\": \"#{time.strftime("%Y-%m-%d %H:%M:%S.%9N %:z")}\"}\n", json
          end
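
A quick check, using the pattern and the string exactly as they appear in the failure message, suggests that the leading `--- ` is not what breaks the match: the pattern is unanchored, so only the trailing newline matters. This is a sketch of the comparison only and does not call the dumper.

```ruby
# Pattern and actual output copied from the test_datetime failure above.
expected = /\{"a":\ "2010\-10\-10\ 00:00:00\.000000000\ \-05:00"\}\n/
actual   = "--- {\"a\": \"2010-10-10 00:00:00.000000000 -05:00\"}"

puts expected.match?(actual)          # the output as produced via SnakeYAML
puts expected.match?(actual + "\n")   # the same output with a trailing newline
```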
    

  2. Charles Nutter reporter

    It is possible that the URI error comes from something I am not doing in the Psych extension that wraps SnakeYAML; I am trying to determine that now. If I try to parse the given YAML directly in CRuby (w/ libyaml), it is also rejected, so something is different when running under the test.

  3. Andrey Somov

    Dear Charles, I did my best to look into each issue, but I am lost. There is not enough data to write any test.
    Can you please create a separate ticket for each issue (with enough info to write a test that reproduces the failure or the deviation from the expectation)?
