YAML parser is not thread-safe

Issue #658 resolved
Alex Spurling created an issue

I have encountered an issue when parsing multiple YAML files in parallel: double quotes are sometimes not stripped from values. For example, the file:

foo:
    "key1": "value1"
    "key2": "value2"
    "key3": "value3"

might be parsed as if the input file were:

foo:
    "key1": "value1"
    "key2": "\"value2\""
    "key3": "value3"

I have traced the error down to the following method in the class net.sf.okapi.filters.yaml.parser.Line:

    public static String decode(String encoded) {
        String decoded = encoded;
        try {
            // 'yaml' is a static SnakeYAML instance shared by all threads (not thread-safe)
            decoded = (String) yaml.load(encoded);
        } catch (Exception e) {
            // case where snakeyaml blows up on surrogates and other "non-printables":
            // just pass through the unicode value and hope for the best
            // FIXME: Dump snakeyaml and use our own decoder
        }
        return decoded;
    }

Because the 'yaml' object in the code above is a static instance, it is shared across all threads. SnakeYAML's Yaml class is not thread-safe, so concurrent calls to yaml.load can corrupt each other's results. I have created a pull request that wraps the yaml.load line in a synchronized block.
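The usual remedies are to serialize access to the shared instance, which is what the pull request does, or to give each thread its own instance with ThreadLocal. The dependency-free sketch below shows both. `StatefulParser` and `LineDecoders` are hypothetical names, and `StatefulParser` is not a YAML parser: it only mimics a parser that, like SnakeYAML's `Yaml`, must not be shared between threads.

```java
// Hypothetical stand-in for a parser that is not thread-safe: it reuses one
// instance-level buffer, so two threads calling load() on the same instance
// at the same time can see each other's partial output.
class StatefulParser {
    private final StringBuilder buffer = new StringBuilder();

    String load(String encoded) {
        buffer.setLength(0);
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '"') {
                buffer.append(c); // naive quote stripping, for illustration only
            }
        }
        return buffer.toString();
    }
}

final class LineDecoders {
    private LineDecoders() {}

    // Option 1: one shared instance; every call takes its lock first,
    // so only one thread at a time can touch the shared buffer.
    private static final StatefulParser SHARED = new StatefulParser();

    static String decodeSynchronized(String encoded) {
        synchronized (SHARED) {
            return SHARED.load(encoded);
        }
    }

    // Option 2: each thread lazily gets its own instance, so no lock is needed.
    private static final ThreadLocal<StatefulParser> PER_THREAD =
            ThreadLocal.withInitial(StatefulParser::new);

    static String decodePerThread(String encoded) {
        return PER_THREAD.get().load(encoded);
    }
}
```

With SnakeYAML itself, the per-thread variant would presumably be `ThreadLocal.withInitial(Yaml::new)`. The lock is the smaller change but makes all threads queue on one parser; the per-thread version avoids that contention at the cost of one parser per thread, which in a thread pool lives as long as the pooled thread.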

Comments (6)

  1. Chase Tingley

    I was looking at the YamlEncoder class for unrelated reasons, and based on its comments it just copies the encoding logic from SnakeYAML. There's no YamlDecoder class (or an IDecoder interface), but could we create one and do the same for decoding?
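A decoder along the lines suggested here would hold no state at all, which sidesteps the locking question. The sketch below is hypothetical: `YamlDecoder` is not an existing Okapi class, it handles only single-line quoted scalars (no line folding), and it keeps unknown or malformed escapes as literal text instead of throwing, in the spirit of the existing pass-through fallback. The escape table follows the double-quoted escapes listed in the YAML 1.2 specification.

```java
/**
 * Hypothetical stateless decoder for single-line YAML quoted scalars.
 * It keeps no fields, so any number of threads can call decode() at once.
 */
final class YamlDecoder {
    private YamlDecoder() {}

    static String decode(String encoded) {
        int n = encoded.length();
        if (n >= 2 && encoded.charAt(0) == '\'' && encoded.charAt(n - 1) == '\'') {
            // Single-quoted: the only escape is '' for a literal quote.
            return encoded.substring(1, n - 1).replace("''", "'");
        }
        if (n >= 2 && encoded.charAt(0) == '"' && encoded.charAt(n - 1) == '"') {
            return decodeDoubleQuoted(encoded.substring(1, n - 1));
        }
        return encoded; // plain scalar: pass through unchanged
    }

    private static String decodeDoubleQuoted(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character (or a trailing lone backslash)
                continue;
            }
            char e = s.charAt(++i);
            switch (e) {
                case '0': out.append('\0'); break;
                case 'a': out.append('\u0007'); break;
                case 'b': out.append('\b'); break;
                case 't': case '\t': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'v': out.append('\u000B'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                case 'e': out.append('\u001B'); break;
                case ' ': case '"': case '/': case '\\': out.append(e); break;
                case 'N': out.append('\u0085'); break;
                case '_': out.append('\u00A0'); break;
                case 'L': out.append('\u2028'); break;
                case 'P': out.append('\u2029'); break;
                case 'x': i = appendHex(s, i, 2, out); break;
                case 'u': i = appendHex(s, i, 4, out); break;
                case 'U': i = appendHex(s, i, 8, out); break;
                default: out.append('\\').append(e); // unknown escape: keep as-is
            }
        }
        return out.toString();
    }

    // s.charAt(i) is the escape letter; reads `digits` hex digits after it,
    // appends that code point and returns the index of the last digit consumed.
    private static int appendHex(String s, int i, int digits, StringBuilder out) {
        int start = i + 1;
        int end = start + digits;
        long cp = end <= s.length() ? 0 : -1;
        for (int k = start; cp >= 0 && k < end; k++) {
            int d = Character.digit(s.charAt(k), 16);
            cp = d < 0 ? -1 : cp * 16 + d;
        }
        if (cp >= 0 && cp <= Character.MAX_CODE_POINT) {
            out.appendCodePoint((int) cp); // lone surrogates are appended as single UTF-16 units
            return end - 1;
        }
        out.append('\\').append(s.charAt(i)); // malformed: keep the escape text as-is
        return i;
    }
}
```

Because decode() only reads its argument and writes to a local StringBuilder, concurrent calls cannot interfere. Surrogate escapes such as `\ud83d\ude00` are appended as their two UTF-16 code units, which together form the intended character.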
