Document size limit should be applied to single document not the whole input stream

Issue from Mailing List

While the SnakeYAML issue tracker was down because of spam problems, Petr Gladkikh raised this issue on the SnakeYAML
mailing list:

Hello,

// As a side note, the links in the project README to Slack and the Jira did not work for me.

Link to Slack leads to an empty space, and bug-tracker link says "We can't let you see this page". Probably these should be updated. Google groups link is the only one that I could use.

After recent changes (between 1.26 and 1.33), loading long streams of documents hits the document size limit. It is not unusual to have very long or even indefinite input streams. As far as I can tell, the Yaml.loadAll method parses documents lazily and does not accumulate them inside the parser, so a limit on the whole input stream size does not make sense technically. Besides, the error message mentions "YAML document", which implies a per-document limit, not a per-stream one.

So I believe the limit should be applied to a single document and be reset every time a complete document is loaded.

An example of the stack trace where the limit is hit (SnakeYAML v1.33) is below. In my case each document in the input file is only a few kilobytes long (the whole input file is about 12 MB), so there should be no problem reading as many of them as necessary.

Thanks.

The incoming YAML document exceeds the limit: 3145728 code points.
at org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:342)
at org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:263)
at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingKey.produce(ParserImpl.java:662)
at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:185)
at org.yaml.snakeyaml.comments.CommentEventsCollector$1.peek(CommentEventsCollector.java:57)
at org.yaml.snakeyaml.comments.CommentEventsCollector$1.peek(CommentEventsCollector.java:43)
at org.yaml.snakeyaml.comments.CommentEventsCollector.collectEvents(CommentEventsCollector.java:136)
at org.yaml.snakeyaml.comments.CommentEventsCollector.collectEvents(CommentEventsCollector.java:116)
at org.yaml.snakeyaml.composer.Composer.composeScalarNode(Composer.java:239)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:208)
at org.yaml.snakeyaml.composer.Composer.composeValueNode(Composer.java:357)
at org.yaml.snakeyaml.composer.Composer.composeMappingChildren(Composer.java:336)
at org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:311)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:212)
at org.yaml.snakeyaml.composer.Composer.composeValueNode(Composer.java:357)
at org.yaml.snakeyaml.composer.Composer.composeMappingChildren(Composer.java:336)
at org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:311)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:212)
at org.yaml.snakeyaml.composer.Composer.getNode(Composer.java:134)
at org.yaml.snakeyaml.constructor.BaseConstructor.getData(BaseConstructor.java:168)
at org.yaml.snakeyaml.Yaml$1.next(Yaml.java:499)

I asked on the mailing list if an issue was raised for this one, and @Andrey Somov responded that he had fixed it. Since issues are now working again, he suggested that I raise an issue for it here.

Reproduction under SnakeYAML 2.0

I wrote a small program to illustrate the problem:

package org.example;

import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.Yaml;
import java.util.Iterator;

public class Main {
    private static void dumpAllDocs(String input, long codePointLimit) {
        System.out.println ("Loading all docs with codePointLimit of "+ codePointLimit);
        LoaderOptions loaderOpts = new LoaderOptions();
        loaderOpts.setCodePointLimit((int) codePointLimit);
        Yaml yaml = new Yaml(loaderOpts);

        Iterator<Object> docs = yaml.loadAll(input).iterator();

        for (int ndx = 1; ndx <= 3; ndx++) {
            try {
                Object doc = docs.next();
                System.out.println("doc " + ndx + " loaded: " + doc);
            } catch (Exception e) {
                System.out.println("doc " + ndx + " failed: " + e.getMessage());
                return;
            }
        }
    }

    public static void main(String[] args) {
        String doc1 = "document: this is document one\n";
        String doc2 = "document: this is document 2\n";
        String doc3 = "document: this is document three\n";
        String input = doc1 + "---\n" + doc2 + "---\n" + doc3;

        System.out.println ("doc1 size: " + doc1.codePoints().count());
        System.out.println ("doc2 size: " + doc2.codePoints().count());
        System.out.println ("doc3 size: " + doc3.codePoints().count());
        System.out.println ("input size:" + input.codePoints().count());

        System.out.println ("\nTest1. All should load, all docs are less than total input size.");
        dumpAllDocs(input, input.codePoints().count());

        System.out.println ("\nTest2. All should load, all docs are less than total input size - 1.");
        dumpAllDocs(input, input.codePoints().count() -1);

        System.out.println ("\nTest3. All should load, all docs are less or equal to doc3 size.");
        dumpAllDocs(input, doc3.codePoints().count());

        System.out.println ("\nTest4. Should fail to load at 3rd doc, it is longer than doc3 size -1.");
        dumpAllDocs(input, doc3.codePoints().count() - 1);
    }
}

When the code is run against SnakeYAML 2.0, it outputs:

doc1 size: 31
doc2 size: 29
doc3 size: 33
input size:101

Test1. All should load, all docs are less than total input size.
Loading all docs with codePointLimit of 101
doc 1 loaded: {document=this is document one}
doc 2 loaded: {document=this is document 2}
doc 3 loaded: {document=this is document three}

Test2. All should load, all docs are less than total input size - 1.
Loading all docs with codePointLimit of 100
doc 1 loaded: {document=this is document one}
doc 2 loaded: {document=this is document 2}
doc 3 failed: The incoming YAML document exceeds the limit: 100 code points.

Test3. All should load, all docs are less or equal to doc3 size.
Loading all docs with codePointLimit of 33
doc 1 loaded: {document=this is document one}
doc 2 failed: The incoming YAML document exceeds the limit: 33 code points.

Test4. Should fail to load at 3rd doc, it is longer than doc3 size -1.
Loading all docs with codePointLimit of 32
doc 1 loaded: {document=this is document one}
doc 2 failed: The incoming YAML document exceeds the limit: 32 code points.

Process finished with exit code 0

This reflects Petr’s report. Only Test1 delivered the expected results.
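The failure points above are what a limit counted cumulatively over the whole stream would produce. As a rough illustration only, here is a toy model of that behaviour. It is not SnakeYAML's scanner logic: the class and method names are hypothetical, and it assumes the "---\n" separators count toward the running total and that the check is a strict "greater than".

```java
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not SnakeYAML code): a code point limit applied
// to the whole input stream, never reset between documents.
final class PerStreamLimitSketch {

    /**
     * Returns the 1-based index of the first document at which the cumulative
     * code point count exceeds the limit, or 0 if every document fits.
     */
    static int firstFailingDoc(List<String> docs, String separator, long limit) {
        long consumed = 0; // running total over the whole stream
        for (int i = 0; i < docs.size(); i++) {
            if (i > 0) {
                // Assumption: the separator between documents is counted too.
                consumed += separator.codePoints().count();
            }
            consumed += docs.get(i).codePoints().count();
            if (consumed > limit) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "document: this is document one\n",
                "document: this is document 2\n",
                "document: this is document three\n");
        for (long limit : new long[] {101, 100, 33, 32}) {
            System.out.println("limit " + limit + " -> first failing doc: "
                    + firstFailingDoc(docs, "---\n", limit));
        }
    }
}
```

With the same three documents, this model reports no failure at a limit of 101, a failure at doc 3 for 100, and a failure at doc 2 for both 33 and 32, which lines up with the 2.0 output.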

Retrying under SnakeYAML 2.1 (unreleased in GIT)

I did a local install of SnakeYAML 2.1 as of e46ff5a and reran to get the following output:

doc1 size: 31
doc2 size: 29
doc3 size: 33
input size:101

Test1. All should load, all docs are less than total input size.
Loading all docs with codePointLimit of 101
doc 1 loaded: {document=this is document one}
doc 2 loaded: {document=this is document 2}
doc 3 loaded: {document=this is document three}

Test2. All should load, all docs are less than total input size - 1.
Loading all docs with codePointLimit of 100
doc 1 loaded: {document=this is document one}
doc 2 loaded: {document=this is document 2}
doc 3 loaded: {document=this is document three}

Test3. All should load, all docs are less or equal to doc3 size.
Loading all docs with codePointLimit of 33
doc 1 loaded: {document=this is document one}
doc 2 loaded: {document=this is document 2}
doc 3 failed: The incoming YAML document exceeds the limit: 33 code points.

Test4. Should fail to load at 3rd doc, it is longer than doc3 size -1.
Loading all docs with codePointLimit of 32
doc 1 loaded: {document=this is document one}
doc 2 loaded: {document=this is document 2}
doc 3 failed: The incoming YAML document exceeds the limit: 32 code points.

Only Test3 seems off to me. Should it have successfully loaded doc3 as well?

Or maybe my Test is a bit off?
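To make the expectation behind Test3 and Test4 explicit, here is a small sketch of the per-document check my tests assume. It is a hypothetical model, not SnakeYAML code: it counts only each document's own text (no "---" separator, no scanner look-ahead) and fails a document only when its count is strictly greater than the limit. If the real implementation counts the separator or reads ahead before resetting, that could explain the difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the expectation encoded in the tests:
// the limit is checked against each document on its own.
final class PerDocumentLimitSketch {

    /** Returns the 1-based indices of documents whose own size exceeds the limit. */
    static List<Integer> docsOverLimit(List<String> docs, long limit) {
        List<Integer> failing = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            // The count starts from zero for every document.
            long size = docs.get(i).codePoints().count();
            if (size > limit) {
                failing.add(i + 1);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "document: this is document one\n",
                "document: this is document 2\n",
                "document: this is document three\n");
        for (long limit : new long[] {33, 32}) {
            System.out.println("limit " + limit + " -> docs over limit: "
                    + docsOverLimit(docs, limit));
        }
    }
}
```

Under these assumptions a limit of 33 rejects nothing and a limit of 32 rejects only doc 3, which is what Test3 and Test4 were written to expect.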

