Heavy Allocation in Emitter.analyzeScalar(String) due to Regex Overhead

Issue #1075 resolved
Carter Kozak created an issue

Writing large YAML documents with SnakeYAML, I’ve found GC contention resulting from the integer array allocations made inside Pattern.matcher.
I believe we can dramatically improve performance by refactoring the leading-zero check away from regex and into a simple utility method along these lines:

private static boolean hasLeadingZero(String scalar) {
    // True when the scalar starts with '0' followed by a digit or '_',
    // replacing the Pattern.matcher call and its array allocations.
    if (scalar.length() > 1 && scalar.charAt(0) == '0') {
        int secondChar = scalar.charAt(1);
        return (secondChar >= '0' && secondChar <= '9') || secondChar == '_';
    }
    return false;
}
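
For illustration, here is a minimal, self-contained comparison of the two approaches. The exact pattern compiled by Emitter.analyzeScalar is not quoted in this issue, so the LEADING_ZERO regex below is only a stand-in; the point is that every Pattern.matcher call constructs a new Matcher, which allocates the int[] group and local arrays seen in the allocation profile, while the char-based check allocates nothing.

import java.util.regex.Pattern;

public class LeadingZeroCheckDemo {

    // Stand-in pattern; not the actual regex from Emitter.analyzeScalar.
    private static final Pattern LEADING_ZERO = Pattern.compile("^0[0-9_]");

    // Regex-based check: each call allocates a Matcher plus its int[] arrays.
    static boolean hasLeadingZeroRegex(String scalar) {
        return LEADING_ZERO.matcher(scalar).find();
    }

    // Allocation-free check doing the same work with two char comparisons.
    static boolean hasLeadingZero(String scalar) {
        if (scalar.length() > 1 && scalar.charAt(0) == '0') {
            int secondChar = scalar.charAt(1);
            return (secondChar >= '0' && secondChar <= '9') || secondChar == '_';
        }
        return false;
    }

    public static void main(String[] args) {
        for (String scalar : new String[] {"0755", "0_1", "0", "10", "abc"}) {
            System.out.println(scalar + ": regex=" + hasLeadingZeroRegex(scalar)
                + ", chars=" + hasLeadingZero(scalar));
        }
    }
}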

Comments (15)

  1. Carter Kozak reporter

    I’ve created a PR here with my proposed solution: https://github.com/snakeyaml/snakeyaml/pull/5

    There are a few similar regex patterns, but I don’t see them in profiling data, so I opted not to replace them. In a fairly large application, the integer arrays allocated by the analyzeScalar regex account for 83% of total allocation, exceeding 9 terabytes per minute.

  2. Carter Kozak reporter

    I’ve put together a test here: https://github.com/snakeyaml/snakeyaml/pull/6, since the original PR was merged while I was still writing the test (thank you for such a quick review!). This sort of test is tricky because garbage collectors are complex machines, and it can be challenging to test exactly what we intend to without making the build unnecessarily expensive or flaky.
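
    For reference, a minimal sketch of the general approach such a test can take (this is not the test from the PR): on HotSpot JVMs, the platform ThreadMXBean can be cast to com.sun.management.ThreadMXBean, which reports per-thread allocated bytes.

    import java.lang.management.ManagementFactory;

    import com.sun.management.ThreadMXBean;

    public class AllocationProbe {

        public static void main(String[] args) {
            // HotSpot-specific: the platform bean implements the com.sun interface.
            ThreadMXBean threads = (ThreadMXBean) ManagementFactory.getThreadMXBean();
            long threadId = Thread.currentThread().getId();

            // Warm up so JIT compilation does not pollute the measurement.
            for (int i = 0; i < 10_000; i++) {
                emitOnce();
            }

            long before = threads.getThreadAllocatedBytes(threadId);
            for (int i = 0; i < 10_000; i++) {
                emitOnce();
            }
            long after = threads.getThreadAllocatedBytes(threadId);
            System.out.println("approx. bytes per op: " + (after - before) / 10_000.0);
        }

        // Placeholder for the operation under test, e.g. dumping a scalar
        // such as "0123456789" through SnakeYAML's Yaml.dump.
        private static void emitOnce() {
        }
    }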

  3. Carter Kozak reporter

    Here’s the JMH benchmark I described earlier, which does a better job of illustrating the high rate of allocation: https://github.com/snakeyaml/snakeyaml/pull/7

    Benchmark results and analysis are copied here from the GitHub PR for posterity.

    Before 26f5d5f8 (benchmarks run atop 98ff897):

    Benchmark                                        Mode  Cnt     Score     Error   Units
    EmitterBenchmark.emitScalar                      avgt    3     0.311 ±   0.009   us/op
    EmitterBenchmark.emitScalar:·gc.alloc.rate       avgt    3  6697.479 ± 420.474  MB/sec
    EmitterBenchmark.emitScalar:·gc.alloc.rate.norm  avgt    3   256.000 ±   0.001    B/op
    EmitterBenchmark.emitScalar:·gc.count            avgt    3   108.000            counts
    EmitterBenchmark.emitScalar:·gc.time             avgt    3    62.000                ms
    

    After 26f5d5f8 (benchmarks run atop 8a3063b):

    Benchmark                                        Mode  Cnt     Score     Error   Units
    EmitterBenchmark.emitScalar                      avgt    3     0.220 ±   0.006   us/op
    EmitterBenchmark.emitScalar:·gc.alloc.rate       avgt    3  4713.283 ± 303.238  MB/sec
    EmitterBenchmark.emitScalar:·gc.alloc.rate.norm  avgt    3   128.000 ±   0.001    B/op
    EmitterBenchmark.emitScalar:·gc.count            avgt    3   116.000            counts
    EmitterBenchmark.emitScalar:·gc.time             avgt    3    63.000                ms
    

    The benchmarks were run on a MacBook, so CPU scaling makes the timing values a bit chaotic, and I wouldn't put too much trust in the ~30% speedup shown. The important piece is gc.alloc.rate.norm, the bytes allocated per emitScalar invocation, which was cut in half.
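
    For anyone reproducing this, the benchmark has roughly the following shape (a sketch, not the exact code in the PR); running it with JMH's GC profiler enabled via -prof gc is what produces the gc.alloc.rate.norm rows above.

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.yaml.snakeyaml.Yaml;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class EmitterBenchmarkSketch {

        // Yaml is not thread-safe, so each benchmark thread gets its own instance.
        private final Yaml yaml = new Yaml();

        @Benchmark
        public String emitScalar() {
            // A scalar with a leading zero exercises the analyzeScalar path in question.
            return yaml.dump("0123456789");
        }
    }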

  4. Andrey Somov

    How do you produce the result? Can you please provide the complete instructions to reproduce it?
