Bug in SnakeYAML org.yaml.snakeyaml.emitter.Emitter.writeDoubleQuoted

Issue #413 invalid
Alex nx created an issue

Hello

Class org.yaml.snakeyaml.emitter.Emitter

private void writeDoubleQuoted(String text, boolean split) throws IOException {
    ...
    if (ch <= '\u00FF') {
        String s = "0" + Integer.toString(ch, 16);
        data = "\\x" + s.substring(s.length() - 2);
    } else if (ch >= '\uD800' && ch <= '\uDBFF') {
        //if (end + 1 < text.length()) { // Also need to check low part of surrogate pair
        if (end + 1 < text.length() && Character.isLowSurrogate(text.charAt(end + 1))) {
            Character ch2 = text.charAt(++end);
            String s = "000" + Long.toHexString(Character.toCodePoint(ch, ch2));
            data = "\\U" + s.substring(s.length() - 8);
        } else {
            String s = "000" + Integer.toString(ch, 16);
            data = "\\u" + s.substring(s.length() - 4);
        }
    } else {
        String s = "000" + Integer.toString(ch, 16);
        data = "\\u" + s.substring(s.length() - 4);
    }
}

You should also check the "low" part of the surrogate pair, because the source string may contain invalid surrogate pairs. Without that check, "\uD800\uFEFF", for example, is converted to \U000122ff. ScannerImpl.scanFlowScalarNonSpaces then converts it back to "\uD808\uDEFF" (new String(Character.toChars(0x122ff))), which is no longer equal to the source string =(
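The round trip described above can be reproduced with plain JDK calls. The sketch below assumes nothing from SnakeYAML. `combineUnchecked` stands in for the Emitter branch that does not check the low surrogate, and `decode` stands in for the scanner side. Both names are hypothetical. The sketch relies on the documented fact that `Character.toCodePoint` does not validate its arguments.

```java
class SurrogateRoundTrip {

    // Hypothetical stand-in for the unchecked Emitter branch: pairs a high
    // surrogate with the next char without checking that it is a low surrogate.
    // Character.toCodePoint does not validate its arguments.
    static int combineUnchecked(char high, char next) {
        return Character.toCodePoint(high, next);
    }

    // Hypothetical stand-in for the scanner side: turns the code point taken
    // from an 8-digit escape back into UTF-16 chars.
    static String decode(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String source = "\uD800\uFEFF";
        char high = source.charAt(0);
        char next = source.charAt(1);

        // The check the issue asks for: this is not a valid pair.
        System.out.println(Character.isSurrogatePair(high, next));  // false

        int cp = combineUnchecked(high, next);
        System.out.println(Integer.toHexString(cp));                // 122ff

        String restored = decode(cp);
        System.out.printf("%04x %04x%n", (int) restored.charAt(0), (int) restored.charAt(1));  // d808 deff
        System.out.println(restored.equals(source));                // false
    }
}
```

The arithmetic matches the report. For the low part, (0xFEFF - 0xDC00) = 0x22FF. Adding 0x10000 gives 0x122FF. Splitting that value back into a pair yields 0xD808 and 0xDEFF.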

The example pair comes from https://github.com/google/guava/blob/master/guava/src/com/google/common/base/CharMatcher.java (line 1460). I ran into it while trying to dump a class constant pool to a YAML file =)

I can suggest a shorter form of your code:

    if (ch <= '\u00FF') {
        String s = "0" + Integer.toString(ch, 16);
        data = "\\x" + s.substring(s.length() - 2);
    } else if (end + 1 < text.length() && Character.isHighSurrogate(ch)
            && Character.isLowSurrogate(text.charAt(end + 1))) {
        Character ch2 = text.charAt(++end);
        String s = "000" + Long.toHexString(Character.toCodePoint(ch, ch2));
        data = "\\U" + s.substring(s.length() - 8);
    } else {
        String s = "000" + Integer.toString(ch, 16);
        data = "\\u" + s.substring(s.length() - 4);
    }
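To see the shorter form in action, it can be wrapped in a loop. The sketch below is a hypothetical, self-contained `escape` helper and is not SnakeYAML's Emitter. For illustration it escapes every character outside printable ASCII, plus quotes and backslashes. The real emitter escapes only characters it considers non-printable. The branch structure is the one proposed above.

```java
class EscapeSketch {

    // Hypothetical helper: escapes every character outside printable ASCII
    // (and '"' and '\\'), using the branch order from the shorter form above.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int end = 0; end < text.length(); end++) {
            char ch = text.charAt(end);
            if (ch >= 0x20 && ch < 0x7F && ch != '"' && ch != '\\') {
                out.append(ch);  // printable ASCII passes through unchanged
            } else if (ch <= 0xFF) {
                // 2-digit x-escape for Latin-1 range
                String s = "0" + Integer.toString(ch, 16);
                out.append("\\x").append(s.substring(s.length() - 2));
            } else if (end + 1 < text.length() && Character.isHighSurrogate(ch)
                    && Character.isLowSurrogate(text.charAt(end + 1))) {
                // Only a valid high/low pair becomes one 8-digit U-escape
                char ch2 = text.charAt(++end);
                String s = "000" + Long.toHexString(Character.toCodePoint(ch, ch2));
                out.append("\\U").append(s.substring(s.length() - 8));
            } else {
                // Other BMP chars and unpaired surrogates get a 4-digit u-escape each
                String s = "000" + Integer.toString(ch, 16);
                out.append("\\u").append(s.substring(s.length() - 4));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\uD83D\uDE00"));  // valid pair: one 8-digit escape, 0001f600
        System.out.println(escape("\uD800\uFEFF"));  // unpaired high surrogate: two 4-digit escapes, d800 and feff
    }
}
```

With the low-surrogate check, "\uD800\uFEFF" is emitted as two separate 4-digit escapes. Reading those back gives the same two chars, so the round trip is exact.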

Thanks =)

Comments (2)

  1. Andrey Somov

    Thank you, Alex. If you can also quickly provide a test that fails now but passes with your patch, we can include your pull request in the coming release (we hope to release 1.22 this month).

  2. Alex nx reporter

    Hello
    I am sorry. I was wrong =(

    This bug does not occur. StreamReader.isPrintable(...) in RepresentString.representData reports such a string as non-printable, so it is treated as BINARY data and SnakeYAML tries to convert it to a base64 string.
    But there is another problem in RepresentString.representData:

        final byte[] bytes = value.getBytes("UTF-8");
        // sometimes above will just silently fail - it will return incomplete data
        // it happens when String has invalid code points
        // (for example half surrogate character without other half)
        final String checkValue = new String(bytes, "UTF-8");
        if (!checkValue.equals(value)) {
            throw new YAMLException("invalid string value has occurred");
        }

    This code throws YAMLException("invalid string value has occurred") because
    "\uD800\uFEFF" is not equal to new String("\uD800\uFEFF".getBytes("UTF-8"), "UTF-8"). The unpaired high surrogate cannot be encoded as UTF-8, so the round trip does not preserve it.
    But that is another story, and this issue can be closed.
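    The failing check can be reproduced without SnakeYAML. The sketch below repeats the encode/decode comparison from the excerpt, using `StandardCharsets.UTF_8` instead of the "UTF-8" name. Next to it is a hypothetical `findUnpairedSurrogate` helper, which is not part of SnakeYAML. That helper detects the same problem directly by scanning for surrogates that are not part of a valid high/low pair.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {

    // Same idea as the excerpt: encode to UTF-8, decode, compare.
    // An unpaired surrogate cannot be encoded, so it does not survive.
    static boolean survivesUtf8(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(value);
    }

    // Hypothetical helper: index of the first surrogate that is not part of a
    // valid high/low pair, or -1 if the string is well-formed UTF-16.
    static int findUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++;  // skip the low half of a valid pair
                } else {
                    return i;  // high half with no low half after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i;  // low half with no high half before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String value = "\uD800\uFEFF";
        System.out.println(survivesUtf8(value));           // false
        System.out.println(findUnpairedSurrogate(value));  // 0
        System.out.println(survivesUtf8("\uD83D\uDE00"));  // true
    }
}
```

    The explicit scan reports where the invalid surrogate is. The round-trip comparison only reports that something changed.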

    Thanks =)
