Clarifying differences from libyaml
Hey it’s me again!
I am trying to get the tests for Psych, the Ruby extension wrapping libyaml, to pass on JRuby’s version that wraps SnakeYAML. There are a few differences I wanted to bring up here in case there’s a reason for them or an easy workaround. The author of Psych is also open to making the tests less strict, but I want to understand these behavior differences first.
Some of these might be differences in the Psych code, but I think they are all actually differences in the YAML engines.
1. YAML version appears in the list of tag directives in SnakeYAML, but not in libyaml
The failure looks like this:
Failure:
Psych::TestTreeBuilder#test_documents [/Users/headius/projects/psych/test/psych/test_tree_builder.rb:33]:
--- expected
+++ actual
@@ -1 +1 @@
-[]
+[["!!", "tag:yaml.org,2002:"], ["!", "!"]]
And the code for this test:
def setup
  super
  @parser = Psych::Parser.new TreeBuilder.new
  @parser.parse(<<-eoyml)
%YAML 1.1
---
- foo
- {
  bar : &A !!str baz,
  boo : *A
}
- *A
  eoyml
  @tree = @parser.handler.root
end

def test_documents
  assert_equal 1, @tree.children.length
  assert_instance_of Nodes::Document, @tree.children.first
  doc = @tree.children.first
  assert_equal [1,1], doc.version
  assert_equal [], doc.tag_directives
  assert_equal false, doc.implicit
  assert_location 0, 0, 8, 0, doc
end
Is there a reason SnakeYAML returns the YAML version directive as a tag directive here?
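For anyone who wants to compare engines without running the whole Psych suite, here is a minimal sketch that prints what the current runtime reports for a document carrying only a %YAML directive. It uses only `Psych.parse_stream` and the `version` and `tag_directives` accessors on the document node; the input string is an assumption made up for illustration, not the test fixture above.

```ruby
require 'psych'

# Illustrative input (an assumption): a %YAML directive and no %TAG directives.
yaml = "%YAML 1.1\n--- foo\n...\n"

doc = Psych.parse_stream(yaml).children.first

version = doc.version               # taken from the %YAML directive
tag_directives = doc.tag_directives # %TAG directives as reported by the engine

p version
p tag_directives
```

Running this under both CRuby and JRuby should show directly whether the extra entries come from the engine or from the wrapper code.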
There is a similar failure with slightly different source:
Failure:
Psych::Handlers::TestRecorder#test_replay [/Users/headius/projects/psych/test/psych/handlers/test_recorder.rb:22]:
--- expected
+++ actual
@@ -1,3 +1,5 @@
-"--- foo
+"%TAG ! !
+%TAG !! tag:yaml.org,2002:
+--- foo
...
"
def test_replay
  yaml = "--- foo\n...\n"
  output = StringIO.new
  recorder = Psych::Handlers::Recorder.new
  parser = Psych::Parser.new recorder
  parser.parse yaml
  assert_equal 5, recorder.events.length
  emitter = Psych::Emitter.new output
  recorder.events.each do |m, args|
    emitter.send m, *args
  end
  assert_equal yaml, output.string
end
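Both failures show the same two extra entries, the `!` and `!!` handles. If it turns out SnakeYAML cannot be told to omit them, one possible workaround on the wrapper side would be to drop them before handing the list to the handler. The sketch below is only that: `explicit_tag_directives` and `DEFAULT_TAG_DIRECTIVES` are hypothetical names, not part of Psych or SnakeYAML.

```ruby
# Hypothetical helper, not part of Psych or SnakeYAML.
# The two pairs are copied from the "actual" side of the failure above.
DEFAULT_TAG_DIRECTIVES = [
  ['!', '!'],
  ['!!', 'tag:yaml.org,2002:']
].freeze

# Returns the directives with the two default handles removed.
def explicit_tag_directives(directives)
  directives.reject { |pair| DEFAULT_TAG_DIRECTIVES.include?(pair) }
end

p explicit_tag_directives([['!!', 'tag:yaml.org,2002:'], ['!', '!']])
p explicit_tag_directives([['!e!', 'tag:example.com,2000:'], ['!', '!']])
```

The obvious drawback is that a document which explicitly declares `%TAG !! tag:yaml.org,2002:` would lose that directive as well, so this cannot distinguish implicit defaults from explicit ones.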
2. Non-URI characters in tag names
libyaml appears to be more liberal here:
Error:
Psych::TestEncoding#test_start_mapping:
Psych::SyntaxError: (<unknown>): expected URI, but found バ(12496) while scanning a tag at line 1 column 8
org/jruby/ext/psych/PsychParser.java:257:in `parse'
org/jruby/ext/psych/PsychParser.java:115:in `parse'
/Users/headius/projects/psych/test/psych/test_encoding.rb:155:in `test_start_mapping'
def test_start_mapping
  foo = 'foo'
  bar = 'バー'
  @emitter.start_stream Psych::Parser::UTF8
  @emitter.start_document [], [], true
  @emitter.start_mapping(
    foo.encode('Shift_JIS'),
    bar.encode('UTF-16LE'),
    false, Nodes::Sequence::ANY)
  @emitter.end_mapping
  @emitter.end_document false
  @emitter.end_stream
  @parser.parse @buffer.string
  assert_encodings @utf8, @handler.strings
  assert_equal [foo, bar], @handler.strings
end
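If tags really are restricted to URI characters, a tag such as バー would have to be percent-encoded before a strict scanner accepts it. As a rough sketch of what that escaping looks like (`escape_tag` is a hypothetical helper, not something Psych or SnakeYAML provides, and whether the emitter should do this is exactly the open question):

```ruby
# Hypothetical helper, not part of Psych: percent-encode every non-ASCII
# character of a tag as its UTF-8 bytes, leaving ASCII characters untouched.
def escape_tag(tag)
  tag.encode('UTF-8').gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

puts escape_tag("\u30D0\u30FC")          # the tag used in the test
puts escape_tag('tag:yaml.org,2002:str') # plain ASCII passes through unchanged
```

The first call prints `%E3%83%90%E3%83%BC`, the UTF-8 bytes of U+30D0 and U+30FC; U+30D0 is the character 12496 named in the error message.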
3. Line numbering differences
SnakeYAML and libyaml differ in some of the positions of the YAML elements:
Failure:
Psych::TestParser#test_event_location [/Users/headius/projects/psych/test/psych/test_parser.rb:362]:
--- expected
+++ actual
@@ -1 +1 @@
-[[:start_stream, [0, 0, 0, 0]], [:start_document, [0, 0, 0, 0]], [:start_mapping, [0, 0, 0, 0]], [:scalar, [0, 0, 0, 3]], [:start_mapping, [1, 2, 1, 2]], [:scalar, [1, 2, 1, 8]], [:start_sequence, [1, 10, 1, 11]], [:scalar, [1, 11, 1, 12]], [:scalar, [1, 14, 1, 15]], [:end_sequence, [1, 15, 1, 16]], [:end_mapping, [2, 0, 2, 0]], [:end_mapping, [2, 0, 2, 0]], [:end_document, [2, 0, 2, 0]], [:end_stream, [2, 0, 2, 0]]]
+[[:start_stream, [0, 0, 0, 0]], [:start_document, [0, 0, 0, 0]], [:start_mapping, [0, 0, 0, 0]], [:scalar, [0, 0, 0, 3]], [:start_mapping, [1, 2, 1, 2]], [:scalar, [1, 2, 1, 8]], [:start_sequence, [1, 10, 1, 11]], [:scalar, [1, 11, 1, 12]], [:scalar, [1, 14, 1, 15]], [:end_sequence, [1, 15, 1, 16]], [:end_mapping, [1, 16, 1, 16]], [:end_mapping, [1, 16, 1, 16]], [:end_document, [1, 16, 1, 16]], [:end_stream, [1, 16, 1, 16]]]
I can provide code but probably any multi-line YAML will show this effect.
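In the meantime, here is a small sketch for dumping event positions on either runtime. It relies on `Psych::Handler#event_location`, which the parser calls before each event. The `LocationRecorder` class is made up for this example, and the input string is an assumption chosen to be consistent with the positions in the failure, not necessarily the exact string the test uses.

```ruby
require 'psych'

# Records each parser event together with the location Psych reported for it.
class LocationRecorder < Psych::Handler
  EVENT_NAMES = %i[
    start_stream start_document start_mapping start_sequence scalar alias
    end_sequence end_mapping end_document end_stream
  ].freeze

  attr_reader :events

  def initialize
    super
    @events = []
    @location = nil
  end

  # Called by the parser immediately before each event callback.
  def event_location(start_line, start_column, end_line, end_column)
    @location = [start_line, start_column, end_line, end_column]
  end

  # Every event callback stores its name and the most recent location.
  EVENT_NAMES.each do |name|
    define_method(name) { |*_args| @events << [name, @location] }
  end
end

# Assumed input: two lines, no trailing newline.
yaml = "foo:\n  barbaz: [1, 2]"

recorder = LocationRecorder.new
Psych::Parser.new(recorder).parse(yaml)
recorder.events.each { |name, location| p [name, location] }
```

Comparing the output line by line between CRuby and JRuby should make it easy to see that only the trailing end_* events differ.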
…
There are many other failures, but I am still sorting out which ones are unique. The YAML-version-in-tags issue seems to be the cause of several failures, as does the odd position information. My reading of the spec suggests that SnakeYAML may be correct in rejecting unescaped non-ASCII characters in tags, but this would need clarification from the spec maintainers.
Comments (4)
-
reporter -
reporter It is possible that the URI error comes from something I am not doing in the Psych extension that wraps SnakeYAML; I am trying to determine that now. If I parse the given YAML directly in CRuby (with libyaml), it is rejected there too, so something is different when testing.
-
Sorry, I do not understand your last message at all.
-
Dear Charles, I did my best to solve any issue. I am lost. Not enough data to make any test.
Can you please create separate tickets for each issue? (with enough info to make a test to reproduce a failure or deviation from the expectation)
This is another minor one that seems likely to be a subtle YAML engine difference (or is it more likely to be a json difference?):
Note the missing newline in the string produced by the SnakeYAML dumper.