Heavy Allocation in Emitter.analyzeScalar(String) due to Regex Overhead

Issue #1075 resolved
Carter Kozak created an issue

Writing large YAML documents with SnakeYAML, I’ve found GC contention resulting from the integer array allocations made inside Pattern.matcher.
I believe we can dramatically improve performance by refactoring the leading-zero check away from regex and into a simple utility method along these lines:

private static boolean hasLeadingZero(String scalar) {
    // True when the scalar starts with '0' followed by a digit or '_',
    // replacing the Pattern.matcher call and its array allocations.
    if (scalar.length() > 1 && scalar.charAt(0) == '0') {
        int secondChar = scalar.charAt(1);
        return (secondChar >= '0' && secondChar <= '9') || secondChar == '_';
    }
    return false;
}
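
For illustration, here is a minimal, self-contained comparison of the two approaches. The exact pattern compiled by Emitter.analyzeScalar is not quoted in this issue, so the LEADING_ZERO regex below is only a stand-in; the point is that every Pattern.matcher call constructs a new Matcher, which allocates the int[] group and local arrays seen in the allocation profile, while the char-based check allocates nothing.

import java.util.regex.Pattern;

public class LeadingZeroCheckDemo {

    // Stand-in pattern; not the actual regex from Emitter.analyzeScalar.
    private static final Pattern LEADING_ZERO = Pattern.compile("^0[0-9_]");

    // Regex-based check: each call allocates a Matcher plus its int[] arrays.
    static boolean hasLeadingZeroRegex(String scalar) {
        return LEADING_ZERO.matcher(scalar).find();
    }

    // Allocation-free check doing the same work with two char comparisons.
    static boolean hasLeadingZero(String scalar) {
        if (scalar.length() > 1 && scalar.charAt(0) == '0') {
            int secondChar = scalar.charAt(1);
            return (secondChar >= '0' && secondChar <= '9') || secondChar == '_';
        }
        return false;
    }

    public static void main(String[] args) {
        for (String scalar : new String[] {"0755", "0_1", "0", "10", "abc"}) {
            System.out.println(scalar + ": regex=" + hasLeadingZeroRegex(scalar)
                + ", chars=" + hasLeadingZero(scalar));
        }
    }
}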

Comments (15)

  1. Carter Kozak reporter

    I’ve created a PR here with my proposed solution: https://github.com/snakeyaml/snakeyaml/pull/5

    There are a few similar regex patterns, but I don’t see them in profiling data, so I opted not to replace them. In a fairly large application, the integer arrays allocated by the analyzeScalar regex account for 83% of total allocation, exceeding 9 terabytes per minute.

  2. Carter Kozak reporter

    I’ve put together a test here: https://github.com/snakeyaml/snakeyaml/pull/6, since the original PR was merged while I was still writing the test (thank you for such a quick review!). This sort of test is tricky because garbage collectors are complex machines, and it can be challenging to test exactly what we intend to without making the build unnecessarily expensive or flaky.
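
    For reference, a minimal sketch of the general approach such a test can take (this is not the test from the PR): on HotSpot JVMs, the platform ThreadMXBean can be cast to com.sun.management.ThreadMXBean, which reports per-thread allocated bytes.

    import java.lang.management.ManagementFactory;

    import com.sun.management.ThreadMXBean;

    public class AllocationProbe {

        public static void main(String[] args) {
            // HotSpot-specific: the platform bean implements the com.sun interface.
            ThreadMXBean threads = (ThreadMXBean) ManagementFactory.getThreadMXBean();
            long threadId = Thread.currentThread().getId();

            // Warm up so JIT compilation does not pollute the measurement.
            for (int i = 0; i < 10_000; i++) {
                emitOnce();
            }

            long before = threads.getThreadAllocatedBytes(threadId);
            for (int i = 0; i < 10_000; i++) {
                emitOnce();
            }
            long after = threads.getThreadAllocatedBytes(threadId);
            System.out.println("approx. bytes per op: " + (after - before) / 10_000.0);
        }

        // Placeholder for the operation under test, e.g. dumping a scalar
        // such as "0123456789" through SnakeYAML's Yaml.dump.
        private static void emitOnce() {
        }
    }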

  3. Carter Kozak reporter

    Here’s the JMH benchmark I described earlier, which does a better job of illustrating the high rate of allocation: https://github.com/snakeyaml/snakeyaml/pull/7

    Benchmark results and analysis are copied here from the GitHub PR for posterity.

    Before 26f5d5f8 (benchmarks run atop 98ff897):

    Benchmark                                        Mode  Cnt     Score     Error   Units
    EmitterBenchmark.emitScalar                      avgt    3     0.311 ±   0.009   us/op
    EmitterBenchmark.emitScalar:·gc.alloc.rate       avgt    3  6697.479 ± 420.474  MB/sec
    EmitterBenchmark.emitScalar:·gc.alloc.rate.norm  avgt    3   256.000 ±   0.001    B/op
    EmitterBenchmark.emitScalar:·gc.count            avgt    3   108.000            counts
    EmitterBenchmark.emitScalar:·gc.time             avgt    3    62.000                ms
    

    After 26f5d5f8 (benchmarks run atop 8a3063b):

    Benchmark                                        Mode  Cnt     Score     Error   Units
    EmitterBenchmark.emitScalar                      avgt    3     0.220 ±   0.006   us/op
    EmitterBenchmark.emitScalar:·gc.alloc.rate       avgt    3  4713.283 ± 303.238  MB/sec
    EmitterBenchmark.emitScalar:·gc.alloc.rate.norm  avgt    3   128.000 ±   0.001    B/op
    EmitterBenchmark.emitScalar:·gc.count            avgt    3   116.000            counts
    EmitterBenchmark.emitScalar:·gc.time             avgt    3    63.000                ms
    

    The benchmarks were run on a MacBook, so CPU scaling makes the timing values a bit chaotic, and I wouldn't put too much trust in the ~30% speedup shown. The important piece is gc.alloc.rate.norm, the bytes allocated per emitScalar invocation, which was cut in half.
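
    For anyone reproducing this, the benchmark has roughly the following shape (a sketch, not the exact code in the PR); running it with JMH's GC profiler enabled via -prof gc is what produces the gc.alloc.rate.norm rows above.

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.yaml.snakeyaml.Yaml;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class EmitterBenchmarkSketch {

        // Yaml is not thread-safe, so each benchmark thread gets its own instance.
        private final Yaml yaml = new Yaml();

        @Benchmark
        public String emitScalar() {
            // A scalar with a leading zero exercises the analyzeScalar path in question.
            return yaml.dump("0123456789");
        }
    }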

  4. Andrey Somov

    How do you produce the result? Can you please provide the complete instructions to reproduce it?
