Optimize SchemaValidator for request body validation
With former tickets #259, #250, #225, #214 and #213 the request validation was fairly optimized in terms of memory and speed. E.g. Base64 validation is now down to just 2ms for 2MB big byte arrays and <1ms for smaller data.
However this time is not reflected in complete request validation times. Validating a complete request with headers, parameters and bodies still takes around 10ms for big data. Where is the time?
So I did some profiling what takes the most time. For requests with big bodies the validation itself takes only 20% of the time. 75% of the time is used for reading the request body String
with Jackson into a JsonNode
. That is done in the SchemaValidator
right before calling the validation. It’s this line: return Json.mapper().readTree(value);
that is taking so long.
I did some research and readTree
is fairly expensive for Strings and Readers. With byte arrays or streams it would be double as fast.
So my proposal would be to change the Request
interface by adding a new method:
(Something like that.)
default public Optional<InputStream> getContent() {
return getBody()
.map(body -> new ByteArrayInputStream(body.getBytes())); // backwards compability to custom Request implementations
}
The SimpleRequest
, its builder and all module classes implementing the Request
interface would then be changed to setting the requests InputStream
instead of setting the String
body.
That would lead to a measurable boost on even small requests with 100kB JSONs.
That would lower the memory footprint as the stream is read and not an extra String body is created.
Comments (11)
-
-
reporter There is one challenge to this task.
These lines
if (isFormDataContentType(request)) { final String bodyAsJson = parseUrlEncodedFormDataBodyAsJson(requestBody.get()); return schemaValidator .validate( bodyAsJson, maybeApiMediaTypeForRequest.get().getRight().getSchema(), "request.body") .withAdditionalContext(context); }
validate the request body in case it is form data with content type
x-www-form-urlencoded
.If there would be an
InputStream
instead of aString
as request body theInputStream
has to be converted by the Swagger-Request-Validator (SRV) into aString
.Okay. But what encoding to use?
w3.org defines that only
UTF-8
conformsx-www-form-urlencoded
, but:A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `
_charset
`. Such logic is not described here as only UTF-8 is conforming.(See: https://url.spec.whatwg.org/#urlencoded-parsing)
Does the SRV have to support those legacy servers?
Only the project owner can decide that. @James Navin @James Navin
In my opinion the services that specify a Swagger or OpenAPI interface are to young to be those mentioned “legacy server”.
-
From my reading of https://url.spec.whatwg.org/#concept-urlencoded Id be happy to go with UTF-8, possibly also looking for a
charset
directive in the content type (even though that is not supported by the spec but can almost guarantee someone out there is doing it).
-
reporter Alright.
-
In general Im more than happy to support this proposal. The only concern I would have is that the some originating request implementations might supply an input stream that consumes the body without the ability to reset it for downstream processing (similar to the SpringMVC implementation).
As long as we are careful to ensure that doesn’t happen (e.g. by using an approach similar to the one used in SpringMVC) I think it’s a good idea (although it would probably remove the memory benefits if we have to cache all of the consumed bytes anyway).
-
reporter Originating request implementations will have set the whole body as String and not as InputStream.
I will not remove the setBody(String) method - perhaps just deprecate it.
If someone sets the body as String I will convert it into an InputStream.
If someone already sets the body as String and cares about resettablity it must have been already implemented.
-
reporter @James Navin Is it necessary that the same request gets validated twice?
After implementing this task some tests fail because the same request is validated twice.
-
Do you mean the body is validated twice in the same validation call? Or that some of the tests re-use a
Request
instance for validation multiple times? -
reporter The latter.
For example the
SwaggerV2RequestValidationTest
contains this:@Test public void validate_withValidJsonBody_shouldPass() { final Request request = SimpleRequest.Builder .post("/users") .withBody(loadJsonRequest("newuser-valid")) .withContentType("application/json;charset=utf8") .withAuthorization("Basic EncryptedUsernameAndPassword") .build(); assertPass(classUnderTest.validate(request, validUserResponse)); assertPass(classUnderTest.validateRequest(request)); // fails as the InputStream has been read during the validation before }
-
@Sven Döring does your recent changes allow us to resolve this ticket? Or do you have more changes planned?
-
reporter - changed status to resolved
Nothing more is planned. The current solution got even better than expected.
- Log in to comment
Thanks for this investigation, I hope it will be approved.
In addition, please see my two other proposals for performance optimization on request parameter, path variable, and body validation: