Optimize string / byte (Base64) validation
The OpenAPI type string / byte is a Base64 string representing an byte array.
The SRV checks the validity of those strings with a long regex in Base64Attribute
: "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
It does its job.
Unfortunately Regex matches in Java are awfully slow. It’s actually eleven - 22 times faster to decode a Base64 String and check if an exception was thrown in case of invalidity.
But decoding is quite heavy on memory usage.
There is still an alternative. Guava offers an Base64 decoding stream. If the stream reads successfully the string is valid.
Below is some benchmarking code showing the time and memory usage for both validations.
Base64 String length | Base64 RegEx match | Base64 decoding stream |
---|---|---|
1.000 | 0.518s / 9313K | 0.411s / 10849K |
10.000 | 1.656s / 10090K | 0.734s / 10849K |
100.000 | 13.458s / 10578K | 3.648s / 12268K |
1.000.000 | 114.275s / 19271K | 31.516s / 20936K |
10.000.000 | 1302.552s / 105472k | 233.884s / 87851K |
Tested with
- Java 11
- Intel 8770k
- the very useful Epsilon GC
the decoding stream validation is permanently faster with only marginal higher memory usage.
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.Base64;
import java.util.Random;
import java.util.regex.Pattern;
import org.junit.Assert;
import org.junit.Test;
import com.google.common.io.BaseEncoding;
public class Base64AttributeTest {
private static final String BASE64_STRING = createTestBase64();
private static final String NO_BASE64_STRING = BASE64_STRING.substring(0, 5_000_000) + "@" + BASE64_STRING.substring(5_000_001);
private static final Pattern BASE64_PATTERN = Pattern.compile("^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");
private static final int TEST_RUNS = 2_000;
private static String createTestBase64() {
final byte[] b = new byte[10_000_000];
new Random().nextBytes(b);
return Base64.getEncoder().encodeToString(b);
}
@Test
public void testRegex() {
for (int i = TEST_RUNS; i > 0; --i) {
if (!BASE64_PATTERN.matcher(BASE64_STRING).matches()) {
Assert.fail("This String is a Base64 String.");
}
if (BASE64_PATTERN.matcher(NO_BASE64_STRING).matches()) {
Assert.fail("This String is not a Base64 String.");
}
}
}
@Test
public void testDecode() {
for (int i = TEST_RUNS; i > 0; --i) {
if (!isBase64(BASE64_STRING)) {
Assert.fail("This String is a Base64 String.");
}
if (isBase64(NO_BASE64_STRING)) {
Assert.fail("This String is not a Base64 String.");
}
}
}
private static boolean isBase64(final String string) {
try {
final InputStream in = BaseEncoding.base64().decodingStream(new StringReader(string));
int read;
while ((read = in.read()) != -1) {
}
return true;
} catch (final IOException e) {
return false;
}
}
}
Comments (4)
-
-
- changed status to resolved
Available in v2.4.6
-
reporter @James Navin We are sending multiple high resolution images as part of a JSON in the request body.
These images are part of a bigger object containing additional Integer and String fields.
-
reporter - edited description
- Log in to comment
Thanks for raising this, and for the super-thorough perf testing.
Out of curiosity - what is your use case that has such large Base64 encoded strings?