Replace validation of strings with type base64 to optimize memory usage
Currently, strings of type base64 are regex-checked within java-json-tools. It uses Nashorn as its JavaScript engine, so the Base64 validity check runs inside the script engine.
In my tests, validating two 1MB strings 600 times took 12GB of heap (measured with the Epsilon GC). All of that heap has to be garbage collected.
I would like to ask whether, only for this case of strings of type Base64, the JS regex validation can be replaced with a memory-friendly Java solution.
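As an illustration of what such a Java solution could look like, here is a minimal sketch that checks a string in a single pass without a regex or a script engine. The class and method names are hypothetical and are not part of swagger-request-validator or java-json-tools. It assumes the standard Base64 alphabet with padding and no line breaks, and it only checks the alphabet, the length and the padding.

```java
/** Hypothetical helper for illustration only; not part of any library mentioned in this issue. */
final class Base64Check {

    private Base64Check() {
    }

    /**
     * Returns true if the value uses only the standard Base64 alphabet,
     * has a length that is a multiple of 4 and ends with at most two '='.
     * The string is scanned once and no intermediate objects are created.
     */
    static boolean isValidBase64(final String value) {
        final int length = value.length();
        if (length % 4 != 0) {
            return false;
        }
        // Skip at most two trailing padding characters.
        int end = length;
        while (end > 0 && length - end < 2 && value.charAt(end - 1) == '=') {
            end--;
        }
        // Everything before the padding must be part of the alphabet.
        for (int i = 0; i < end; i++) {
            if (!isBase64Char(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isBase64Char(final char c) {
        return (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '+'
                || c == '/';
    }
}
```

A call such as Base64Check.isValidBase64(value) only reads the characters of the existing string, so the extra heap needed per validation should not grow with the size of the payload.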
Comments (6)
-
reporter -
reporter Here is a simple test:
package com.atlassian.oai.validator;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

import org.apache.commons.io.FileUtils;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

import com.atlassian.oai.validator.model.Request;
import com.atlassian.oai.validator.model.SimpleRequest;
import com.atlassian.oai.validator.report.ValidationReport;

public class Base64MemTest {

    // Swagger 2.0 definition with a single POST endpoint that accepts
    // an array of byte-formatted (Base64) strings.
    private static final String SCHEMA = "{\n" +
            " \"swagger\": \"2.0\",\n" +
            " \"info\": {\"title\": \"Base64 test\"},\n" +
            " \"paths\": {\n" +
            " \"/base64\": {\n" +
            " \"post\": {\n" +
            " \"description\": \"Base64 Test\",\n" +
            " \"consumes\": [\n" +
            " \"application/json\"\n" +
            " ],\n" +
            " \"parameters\": [\n" +
            " {\n" +
            " \"name\": \"request\",\n" +
            " \"in\": \"body\",\n" +
            " \"required\": true,\n" +
            " \"schema\": {\n" +
            " \"$ref\": \"#/definitions/Base64Array\"\n" +
            " }\n" +
            " }\n" +
            " ],\n" +
            " \"responses\": {\n" +
            " \"204\": {\n" +
            " \"description\": \"OK\"\n" +
            " }\n" +
            " }\n" +
            " }\n" +
            " }\n" +
            " },\n" +
            " \"definitions\": {\n" +
            " \"Base64Array\": {\n" +
            " \"type\": \"object\",\n" +
            " \"properties\": {\n" +
            " \"array\": {\n" +
            " \"type\": \"array\",\n" +
            " \"items\": {\n" +
            " \"type\": \"string\",\n" +
            " \"format\": \"byte\"\n" +
            " }\n" +
            " }\n" +
            " }\n" +
            " }\n" +
            " }\n" +
            "}";

    @Rule
    public TemporaryFolder testFolder = new TemporaryFolder();

    @Test
    public void test() throws IOException {
        final File schema = testFolder.newFile("openapi.json");
        FileUtils.write(schema, SCHEMA);

        final OpenApiInteractionValidator validator =
                OpenApiInteractionValidator.createFor(schema.getAbsolutePath()).build();
        final Request request = SimpleRequest.Builder.post("/base64")
                .withContentType("application/json")
                .withBody(createBody())
                .build();
        final ValidationReport report = validator.validateRequest(request);
        Assert.assertFalse(report.hasErrors());
    }

    // Builds a JSON body containing the same 1MB random payload,
    // Base64 encoded, 20 times.
    private static String createBody() {
        final byte[] bytes = new byte[1_000_000];
        new Random().nextBytes(bytes);
        final String base64 = Base64.getEncoder().encodeToString(bytes);
        final List<String> list = new ArrayList<>();
        IntStream.range(0, 20).forEach(i -> list.add(base64));
        return "{\"array\": [\"" + String.join("\", \"", list) + "\"]}";
    }
}
Started with the Epsilon GC, it ends with:
[1.129s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 229M (5.61%) used
[1.886s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 434M (10.62%) used
[2.153s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 639M (15.62%) used
[2.350s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 844M (20.63%) used
[2.513s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1049M (25.63%) used
[2.673s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1254M (30.64%) used
[2.837s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1459M (35.64%) used
[2.998s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1664M (40.65%) used
[3.201s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1869M (45.65%) used
[3.363s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 2074M (50.66%) used
[3.519s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 2241M (54.72%) used
With SVR 1.5.1 it’s only:
[5.164s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 692M (16.91%) used
Without the validation in lines 76 and 77 of the test it’s only:
[1.167s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 229M (5.61%) used
There is something seriously wrong with SVR 2: 2GB of heap usage for validating a 25MB JSON?!
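For scale, the size of the request body can be worked out from createBody() in the test above: 20 strings, each the Base64 encoding of 1,000,000 bytes. The following sketch does that arithmetic; the class and method names are made up for illustration.

```java
/** Hypothetical helper that reproduces the body size of the test by arithmetic. */
final class BodySize {

    private BodySize() {
    }

    /** Padded Base64 produces 4 characters for every started group of 3 bytes. */
    static long base64Length(final long byteCount) {
        return 4 * ((byteCount + 2) / 3);
    }

    /** Length of a body of the form {"array": ["...", "...", ...]} as built by createBody(). */
    static long bodyLength(final int items, final long bytesPerItem) {
        final long prefix = "{\"array\": [\"".length();
        final long suffix = "\"]}".length();
        final long separators = (long) (items - 1) * "\", \"".length();
        return prefix + items * base64Length(bytesPerItem) + separators + suffix;
    }

    public static void main(final String[] args) {
        System.out.println(base64Length(1_000_000));   // characters per encoded string
        System.out.println(bodyLength(20, 1_000_000)); // characters in the whole body
    }
}
```

Each encoded string is 1,333,336 characters long, so the whole body is roughly 26.7 million characters, which matches the "25MB" figure when counted in MiB at one byte per character.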
-
reporter Why is SVR2 still using the old version of java-json-tools? That old version uses the Rhino JS engine, whose heap usage is pretty massive.
Anyway…
Currently there are two validations checking that a byte-formatted string matches the Base64 pattern.
The first validation is done here:
RhinoHelper#regMatch(String, String)
The second validation is done here:
Base64Attribute#validate(ProcessingReport, MessageBundle, FullData)
I think one validation is enough, and I would prefer that only the second one stays.
-
reporter I’ve hacked the first validation out and now the test says:
[2.381s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 351M (8.58%) used
Much better than 2241MB (or even the 692MB with the latest java-json-tools). The validation result is not affected: the string is still validated correctly.
-
reporter This issue only occurs on OAI2 / Swagger definitions. The OpenAPI loader for v3 does not set the ‘pattern’ on string / byte fields.
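One way to picture the difference is a schema tree in which the 'pattern' keyword is dropped from every string schema with format 'byte' before the schema reaches the JSON schema validator, so that only the format check remains. The sketch below does this on a plain Map/List representation of a schema. It is an assumption made for illustration: the class name is hypothetical, the real loaders work on their own model classes, and this is not necessarily how the fix was implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; operates on a generic Map/List view of a JSON schema. */
final class RedundantPatternStripper {

    private RedundantPatternStripper() {
    }

    /** Returns a deep copy of the schema without "pattern" on string schemas of format "byte". */
    static Object strip(final Object node) {
        if (node instanceof Map) {
            final Map<?, ?> source = (Map<?, ?>) node;
            final boolean byteString = "string".equals(source.get("type"))
                    && "byte".equals(source.get("format"));
            final Map<Object, Object> copy = new LinkedHashMap<>();
            for (final Map.Entry<?, ?> entry : source.entrySet()) {
                if (byteString && "pattern".equals(entry.getKey())) {
                    continue; // the format check already covers Base64 validity
                }
                copy.put(entry.getKey(), strip(entry.getValue()));
            }
            return copy;
        }
        if (node instanceof List) {
            final List<Object> copy = new ArrayList<>();
            for (final Object item : (List<?>) node) {
                copy.add(strip(item));
            }
            return copy;
        }
        return node; // strings, numbers, booleans and null are kept as they are
    }
}
```

Patterns on other string schemas are left untouched, so only the duplicated Base64 check goes away.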
-
- changed status to resolved
Available in v2.7.1
There is already a Base64Attribute validator validating Base64 strings against a pattern. But because a byte-formatted string in OpenAPI has a defined pattern, too, it will also go to java-json-tools to validate that string against that pattern. So there are two validations of whether the string is correctly Base64 encoded.