Optimize SchemaValidator for request body validation

Issue #273 resolved

Sven Döring created an issue 2020-05-03

With former tickets ~~#259~~, ~~#250~~, ~~#225~~, ~~#214~~ and ~~#213~~ the request validation was fairly optimized in terms of memory and speed. E.g. Base64 validation is now down to just 2ms for 2MB big byte arrays and <1ms for smaller data.

However this time is not reflected in complete request validation times. Validating a complete request with headers, parameters and bodies still takes around 10ms for big data. Where is the time?

So I did some profiling what takes the most time. For requests with big bodies the validation itself takes only 20% of the time. 75% of the time is used for reading the request body String with Jackson into a JsonNode. That is done in the SchemaValidator right before calling the validation. It’s this line: return Json.mapper().readTree(value); that is taking so long.

I did some research and readTree is fairly expensive for Strings and Readers. With byte arrays or streams it would be double as fast.

‌

So my proposal would be to change the Request interface by adding a new method:
(Something like that.)

    default public Optional<InputStream> getContent() {
        return getBody()
                .map(body -> new ByteArrayInputStream(body.getBytes())); // backwards compability to custom Request implementations
    }

The SimpleRequest, its builder and all module classes implementing the Request interface would then be changed to setting the requests InputStream instead of setting the String body.

That would lead to a measurable boost on even small requests with 100kB JSONs.
That would lower the memory footprint as the stream is read and not an extra String body is created.

Comments (11)

Olivier von Dach
Thanks for this investigation, I hope it will be approved.

In addition, please see my two other proposals for performance optimization on request parameter, path variable, and body validation:
- https://bitbucket.org/atlassian/swagger-request-validator/issues/291
- https://bitbucket.org/atlassian/swagger-request-validator/issues/292
‌

‌
- 2020-07-29T15:17:40+00:00
Sven Döring reporter
There is one challenge to this task.

These lines
```
        if (isFormDataContentType(request)) {
            final String bodyAsJson = parseUrlEncodedFormDataBodyAsJson(requestBody.get());
            return schemaValidator
                    .validate(
                            bodyAsJson,
                            maybeApiMediaTypeForRequest.get().getRight().getSchema(),
                            "request.body")
                    .withAdditionalContext(context);
        }
```
validate the request body in case it is form data with content type x-www-form-urlencoded.

If there would be an InputStream instead of a String as request body the InputStream has to be converted by the Swagger-Request-Validator (SRV) into a String.

Okay. But what encoding to use?

w3.org defines that only UTF-8 conforms x-www-form-urlencoded, but:

A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `_charset`. Such logic is not described here as only UTF-8 is conforming.

(See: https://url.spec.whatwg.org/#urlencoded-parsing)

Does the SRV have to support those legacy servers?

Only the project owner can decide that. @James Navin @James Navin

In my opinion the services that specify a Swagger or OpenAPI interface are to young to be those mentioned “legacy server”.
- 2020-07-29T21:08:52+00:00
James Navin
From my reading of https://url.spec.whatwg.org/#concept-urlencoded Id be happy to go with UTF-8, possibly also looking for a charset directive in the content type (even though that is not supported by the spec but can almost guarantee someone out there is doing it).

‌
- 2020-07-29T22:58:16+00:00
Sven Döring reporter
Alright.
- 2020-07-29T23:01:35+00:00
James Navin
In general Im more than happy to support this proposal. The only concern I would have is that the some originating request implementations might supply an input stream that consumes the body without the ability to reset it for downstream processing (similar to the SpringMVC implementation).

As long as we are careful to ensure that doesn’t happen (e.g. by using an approach similar to the one used in SpringMVC) I think it’s a good idea (although it would probably remove the memory benefits if we have to cache all of the consumed bytes anyway).
- 2020-07-29T23:03:03+00:00
Sven Döring reporter
Originating request implementations will have set the whole body as String and not as InputStream.

I will not remove the setBody(String) method - perhaps just deprecate it.

If someone sets the body as String I will convert it into an InputStream.

‌

If someone already sets the body as String and cares about resettablity it must have been already implemented.
- 2020-07-29T23:18:04+00:00
Sven Döring reporter
@James Navin Is it necessary that the same request gets validated twice?

After implementing this task some tests fail because the same request is validated twice.
- 2020-07-31T15:29:15+00:00
James Navin
Do you mean the body is validated twice in the same validation call? Or that some of the tests re-use a Request instance for validation multiple times?
- 2020-08-02T23:11:26+00:00

Sven Döring reporter

The latter.

For example the SwaggerV2RequestValidationTest contains this:

    @Test
    public void validate_withValidJsonBody_shouldPass() {
        final Request request = SimpleRequest.Builder
                .post("/users")
                .withBody(loadJsonRequest("newuser-valid"))
                .withContentType("application/json;charset=utf8")
                .withAuthorization("Basic EncryptedUsernameAndPassword")
                .build();

        assertPass(classUnderTest.validate(request, validUserResponse));
        assertPass(classUnderTest.validateRequest(request)); // fails as the InputStream has been read during the validation before
    }

‌

2020-08-03T09:42:19+00:00

James Navin
@Sven Döring does your recent changes allow us to resolve this ticket? Or do you have more changes planned?
- 2020-12-14T00:53:01+00:00
Sven Döring reporter
- changed status to resolved
Nothing more is planned. The current solution got even better than expected.
- 2020-12-14T23:05:06+00:00
Log in to comment

Assignee: –

Type: proposal

Priority: major

Status: resolved

Votes: 3

Watchers: 5