Incorrect handling of tab character

Issue #404 wontfix
Sylvain Baudoin
created an issue

Hello,

As per http://yaml.org/spec/1.1/#tab/, the tab character should not be accepted in plain scalar values, but snakeyaml does not complain in such a situation. It seems to forbid tabs only as an indentation character.

Example: the following code should raise a syntax error exception, but it does not:

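        // Uses org.yaml.snakeyaml.reader.StreamReader and org.yaml.snakeyaml.scanner.ScannerImpl.
        // The tab sits inside a plain scalar, so a syntax error is expected here.
        String buffer = "data: should\tbreak but does not";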
        ScannerImpl scanner = new ScannerImpl(new StreamReader(buffer));
        while (scanner.checkToken()) {
            scanner.getToken();
        }

Conversely, if a tab is used for indentation, a syntax error is raised, as intended:

        // The tab is used as indentation, so the scanner rejects it.
        String buffer = "\twill: fail";
        ScannerImpl scanner = new ScannerImpl(new StreamReader(buffer));
        while (scanner.checkToken()) {
            scanner.getToken();
        }
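
The two cases above differ only in where the tab sits on the line. As a rough illustration (not SnakeYAML code, and written in Python like the tests later in this thread), the distinction can be sketched with a hypothetical helper `classify_tabs` that labels each tab as indentation or content. It assumes single-line input and ignores quoted and block scalars, where tabs are treated differently.

```python
def classify_tabs(line):
    """Return (column, kind) for each tab in a single YAML line.

    kind is "indentation" when only spaces or tabs precede the tab,
    and "content" otherwise. Columns are 1-based.
    Hypothetical helper for illustration; it does not parse YAML.
    """
    found = []
    in_indent = True
    for index, ch in enumerate(line):
        if ch == "\t":
            kind = "indentation" if in_indent else "content"
            found.append((index + 1, kind))
        elif ch != " ":
            # First non-blank character ends the indentation region.
            in_indent = False
    return found


print(classify_tabs("\twill: fail"))                       # [(1, 'indentation')]
print(classify_tabs("data: should\tbreak but does not"))   # [(13, 'content')]
```

The first line is the case snakeyaml already rejects; the second is the case this issue is about.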

Comments (12)

  1. Sylvain Baudoin reporter

    Hi,

    Yes, we can discuss that outside of snakeyaml. Honestly, I'm surprised that the YAML spec says that tabs are forbidden in scalar values, because the context seems clear to me (in my example at least): the tab is part of the scalar value and is not there for indentation purposes.

    Judging from a couple of other parsers I've tested (PyYAML and YAMLBeans), there is a deviation: both parsers raise an exception with the YAML string I gave as an example; only snakeyaml does not raise an error.

  2. Andrey Somov repo owner
    • changed status to open

    Well, by discussing it outside of SnakeYAML I meant discussing it with other teams.

    But it looks like SnakeYAML has a bug, or at least a deviation from PyYAML. I remember that tabs caused some trouble in the past, but I do not remember the whole context.

    I will check it.

  3. Sylvain Baudoin reporter

    I ran the following:

    import yaml
    print yaml.load('data: should\t3')
    print yaml.load('data: should   3')
    

    For every test I get a long stack trace like the following:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
        return loader.get_single_data()
      File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
        node = self.get_single_node()
      File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
        document = self.compose_document()
      File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
        node = self.compose_node(None, None)
      File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
        node = self.compose_mapping_node(anchor)
      File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
        while not self.check_event(MappingEndEvent):
      File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 98, in check_event
        self.current_event = self.state()
      File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
        if self.check_token(KeyToken):
      File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 116, in check_token
        self.fetch_more_tokens()
      File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 257, in fetch_more_tokens
        % ch.encode('utf-8'), self.get_mark())
    yaml.scanner.ScannerError: while scanning for the next token
    found character '\t' that cannot start any token
      in "<string>", line 1, column 13:
        data: should        3
                    ^
    

    I ran the test with Python 2.7.5 and PyYAML 3.12.
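
One plausible reading of the traceback above is that the scanner ends the plain scalar at the tab and then finds the tab where the next token would have to start. The sketch below is a toy model of that behaviour, not PyYAML's code: the function `scan_line` and its rules (only spaces are skipped between chunks, a chunk ends at a space or a tab) are assumptions made for illustration.

```python
def scan_line(line):
    """Toy scanner: split one line into chunks separated by spaces.

    Assumption of this model: only spaces are skipped between chunks,
    so a tab found where a chunk should start is reported as an error.
    """
    tokens = []
    pos = 0
    while pos < len(line):
        # Skip spaces (and only spaces) before the next chunk.
        while pos < len(line) and line[pos] == " ":
            pos += 1
        if pos == len(line):
            break
        if line[pos] == "\t":
            raise ValueError(
                "found character '\\t' that cannot start any token "
                "at column %d" % (pos + 1)
            )
        start = pos
        # A chunk ends at the first space or tab.
        while pos < len(line) and line[pos] not in " \t":
            pos += 1
        tokens.append(line[start:pos])
    return tokens


print(scan_line("data: should 3"))
```

Under these assumptions, `scan_line("data: should\t3")` raises at column 13, the same column reported in the traceback, while the space-separated line is split into three chunks.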

  4. Andrey Somov repo owner

    It looks like PyYAML is disappearing:

    $ pip search yaml|grep pyyaml
    simple-yaml (0.1.0)                                    - A simple version of pyyaml
    pyyamlconfig (0.2.3)                                   - Load configuration file in yaml format
    

    ruamel has become the standard YAML package for Python.

    Ruamel does accept the tab inside a plain scalar.

    Does this mean that the issue is solved?

    @Sylvain Baudoin: feel free to contact me via e-mail. I have a few questions and proposals.

  5. Sylvain Baudoin reporter

    Hello,

    I don’t know if replying to a notification is the right way to contact you. My personal opinion about this issue, as already stated, is that tabs should be supported as SnakeYAML currently does, as long as there is no ambiguity about the context.

    By the way, how is it that PyYAML is no longer leading the way for YAML in Python?

    Regards,

    Sylvain

    From: Andrey Somov. Sent: Sunday, 22 April 2018, 12:46. To: sylvain.baudoin@gmail.com. Subject: Re: [Bitbucket] Issue #404: Incorrect handling of tab character (asomov/snakeyaml)

