Replace validation of strings with type base64 to optimize memory usage

Issue #214 resolved
Sven Döring created an issue

Currently, strings of type base64 are regex-checked within java-json-tools. It uses Rhino as its JavaScript engine, so the Base64 validity check runs inside the script engine.

In my tests, validating two 1 MB strings 600 times consumed 12 GB of heap (measured with the Epsilon GC). All of that heap has to be garbage collected.

I would like to ask whether, for this one case (strings of type Base64), the JS regex validation can be replaced with a memory-friendly Java solution.
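
To make "memory-friendly" concrete, here is a minimal sketch of a check that walks the string once and allocates nothing per call. The class and method names are hypothetical and not part of the validator or java-json-tools. It assumes the standard RFC 4648 alphabet with mandatory padding and no line breaks, it accepts the empty string, and it does not verify that the unused trailing bits are zero; the pattern the validator actually applies may differ in those details.

```java
public final class Base64Check {

    private Base64Check() {
    }

    /**
     * Returns true if s looks like padded, standard-alphabet Base64.
     * Hypothetical helper: scans the characters in place, no regex, no decoding.
     */
    public static boolean isValidBase64(final String s) {
        final int len = s.length();
        // Padded Base64 always has a length that is a multiple of four.
        if (len % 4 != 0) {
            return false;
        }
        if (len == 0) {
            return true; // assumption: an empty value counts as valid
        }
        // At most two '=' characters may appear, and only at the very end.
        int pad = 0;
        if (s.charAt(len - 1) == '=') {
            pad = s.charAt(len - 2) == '=' ? 2 : 1;
        }
        final int dataLen = len - pad;
        for (int i = 0; i < dataLen; i++) {
            if (!isBase64Char(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isBase64Char(final char c) {
        return (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '+'
                || c == '/';
    }
}
```

Because the check only reads characters from the existing string, its heap cost does not grow with the size of the payload.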

Comments (6)

  1. Sven Döring reporter

    There is already a Base64Attribute validator that validates Base64 strings against a pattern.
    But because a byte-formatted string in OpenAPI also has a defined pattern, the string is additionally passed to java-json-tools to be validated against that pattern.

    So the string is checked twice for correct Base64 encoding.

  2. Sven Döring reporter

    Here is a simple test:

    package com.atlassian.oai.validator;
    
    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Base64;
    import java.util.List;
    import java.util.Random;
    import java.util.stream.IntStream;
    
    import org.apache.commons.io.FileUtils;
    import org.junit.Assert;
    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TemporaryFolder;
    
    import com.atlassian.oai.validator.model.Request;
    import com.atlassian.oai.validator.model.SimpleRequest;
    import com.atlassian.oai.validator.report.ValidationReport;
    
    public class Base64MemTest {
    
        private static final String SCHEMA = "{\n" +
                "  \"swagger\": \"2.0\",\n" +
                "  \"info\": {\"title\": \"Base64 test\"},\n" +
                "  \"paths\": {\n" +
                "    \"/base64\": {\n" +
                "      \"post\": {\n" +
                "        \"description\": \"Base64 Test\",\n" +
                "        \"consumes\": [\n" +
                "          \"application/json\"\n" +
                "        ],\n" +
                "        \"parameters\": [\n" +
                "          {\n" +
                "            \"name\": \"request\",\n" +
                "            \"in\": \"body\",\n" +
                "            \"required\": true,\n" +
                "            \"schema\": {\n" +
                "              \"$ref\": \"#/definitions/Base64Array\"\n" +
                "            }\n" +
                "          }\n" +
                "        ],\n" +
                "        \"responses\": {\n" +
                "          \"204\": {\n" +
                "            \"description\": \"OK\"\n" +
                "          }\n" +
                "        }\n" +
                "      }\n" +
                "    }\n" +
                "  },\n" +
                "  \"definitions\": {\n" +
                "    \"Base64Array\": {\n" +
                "      \"type\": \"object\",\n" +
                "      \"properties\": {\n" +
                "        \"array\": {\n" +
                "          \"type\": \"array\",\n" +
                "          \"items\": {\n" +
                "            \"type\": \"string\",\n" +
                "            \"format\": \"byte\"\n" +
                "          }\n" +
                "        }\n" +
                "      }\n" +
                "    }\n" +
                "  }\n" +
                "}";
    
        @Rule
        public TemporaryFolder testFolder = new TemporaryFolder();
    
        @Test
        public void test() throws IOException {
            final File schema = testFolder.newFile("openapi.json");
            FileUtils.write(schema, SCHEMA);
            final OpenApiInteractionValidator validator = OpenApiInteractionValidator.createFor(schema.getAbsolutePath()).build();
            final Request request = SimpleRequest.Builder.post("/base64").withContentType("application/json").withBody(createBody()).build();
            final ValidationReport report = validator.validateRequest(request);
            Assert.assertFalse(report.hasErrors());
        }
    
        private static String createBody() {
            final byte[] bytes = new byte[1_000_000];
            new Random().nextBytes(bytes);
    
            final String base64 = Base64.getEncoder().encodeToString(bytes);
            final List<String> list = new ArrayList<>();
            IntStream.range(0, 20).forEach(i -> list.add(base64));
            return "{\"array\": [\"" + String.join("\", \"", list) + "\"]}";
        }
    }
    

    Run with the Epsilon GC, it ends with:

    [1.129s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 229M (5.61%) used
    [1.886s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 434M (10.62%) used
    [2.153s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 639M (15.62%) used
    [2.350s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 844M (20.63%) used
    [2.513s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1049M (25.63%) used
    [2.673s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1254M (30.64%) used
    [2.837s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1459M (35.64%) used
    [2.998s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1664M (40.65%) used
    [3.201s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 1869M (45.65%) used
    [3.363s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 2074M (50.66%) used
    [3.519s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 2241M (54.72%) used

    With SVR 1.5.1 it’s only:

    [5.164s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 692M (16.91%) used

    Without the validation in lines 76 and 77 of the test (the validateRequest call and the assertion on its report) it’s only:

    [1.167s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 229M (5.61%) used

    There is something seriously wrong with SVR 2. 😟 2GB heap usage for validating a 25MB json?!
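
    The Epsilon GC never frees memory, so the "used" figures above are effectively the total bytes allocated. As a rough cross-check that works under any collector, the allocation of a single call can also be read from the HotSpot thread allocation counter. This is only a sketch: `AllocationMeter` is a hypothetical helper, it relies on the HotSpot-specific `com.sun.management.ThreadMXBean`, and it only counts allocations made on the calling thread.

    ```java
    import java.lang.management.ManagementFactory;

    public final class AllocationMeter {

        private AllocationMeter() {
        }

        /** Returns the number of bytes the current thread allocated while running the task. */
        public static long allocatedBytes(final Runnable task) {
            // The cast works on HotSpot-based JVMs; other JVMs may not provide this bean.
            final com.sun.management.ThreadMXBean threads =
                    (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
            if (!threads.isThreadAllocatedMemorySupported()
                    || !threads.isThreadAllocatedMemoryEnabled()) {
                throw new UnsupportedOperationException("thread allocation tracking is unavailable");
            }
            final long id = Thread.currentThread().getId();
            final long before = threads.getThreadAllocatedBytes(id);
            task.run();
            return threads.getThreadAllocatedBytes(id) - before;
        }
    }
    ```

    In the test above, wrapping the `validator.validateRequest(request)` call in `AllocationMeter.allocatedBytes(...)` would give a per-call number to compare against the GC log, assuming the validation runs entirely on the calling thread.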

  3. Sven Döring reporter

    Why is SVR2 still using the old version of java-json-tools? That old version uses the Rhino JS engine, whose heap usage is pretty massive.

    Anyway…

    Currently there are two validations validating that a byte formatted string matches the Base64 pattern.

    The first validation is done here: RhinoHelper#regMatch(String, String)
    The second validation is done here: Base64Attribute#validate(ProcessingReport, MessageBundle, FullData)

    I think only one validation is enough. And I would prefer that only the second validation stays.

  4. Sven Döring reporter

    I’ve hacked the first validation out and now the test says:

    [2.381s][info ][gc] Heap: 4096M reserved, 4096M (100.00%) committed, 351M (8.58%) used

    Much better than 2241MB (or even 692MB with latest java-json-tools). And the validation result is not affected: the request is still validated correctly.

  5. Sven Döring reporter

    This issue only occurs on OAI2 / Swagger definitions. The OpenAPI loader for v3 does not set the ‘pattern’ on string / byte fields.
