- marked as major
Optimize JSON schema generation with caching
I would like to use web request validation based on an OpenAPI specification in a production environment with a maximum response time of 10 ms. This should be achievable for web requests containing an array of 500 entries, where each entry is an object of depth two.
Former tickets #213, #259, #250, #225 and #214 could give a performance boost to the request validation.
Former ticket #273 illustrates a serious time-consuming operation related to the reading of the request body
and its conversion into a JsonNode; this ticket should be treated as well.
Former ticket #291 illustrates a serious time-consuming operation related to the construction of a schema object based on the Swagger schema and its transformation by iterations to finally get a JsonNode; this ticket should be treated as well.
Another time-consuming operation is the systematic construction of a JsonSchema object associated with the JsonNode schema. After analysis, I notice that this result is the same for each call on a given endpoint, so it is reasonable to want to keep this result for reuse.
Setting up a cache using a ConcurrentHashMap allows a drastic improvement in response time. The caching key must be able to identify the current API operation, including the target, i.e. a URL variable or a query parameter.
For example:
public class SchemaValidator {
    ...
    // Compiled schemas, keyed by "<keyPrefix>.<operationId>"
    private final Map<String, JsonSchema> jsonSchemaCache = new ConcurrentHashMap<>();

    public ValidationReport validate(@Nonnull final String operationId,
                                     @Nonnull final String value,
                                     @Nullable final Schema schema,
                                     @Nullable final String keyPrefix) {
        ...
        content = readContent(value, schema);
        final String validationKey = String.format("%s.%s", keyPrefix, operationId);
        final JsonNode schemaObject = readAndTransformSchemaObject(validationKey, schema, keyPrefix);
        final JsonSchema jsonSchema = toJsonSchema(validationKey, schemaObject);
        ...
    }

    private JsonSchema toJsonSchema(final String key, final JsonNode schemaObject) throws ProcessingException {
        // computeIfAbsent records no mapping when the function returns null,
        // so a failed compilation is retried on the next call.
        final JsonSchema schema = jsonSchemaCache.computeIfAbsent(key, o -> {
            try {
                return schemaFactory().getJsonSchema(schemaObject);
            } catch (ProcessingException e) {
                return null;
            }
        });
        if (schema != null) {
            return schema;
        }
        throw new ProcessingException();
    }
}
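The snippet above swallows the original ProcessingException and rethrows an empty one, so the cause of a failed compilation is lost. A minimal, library-free sketch of the same memoization idea that keeps the cause could look like the following. All names here (CheckedBuilder, BuildFailedException, MemoizingCache, the key "body.createPortfolio") are hypothetical stand-ins and not part of the validator's API; the builder lambda merely simulates the expensive schema compilation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive, possibly failing build step (e.g. schema compilation). */
interface CheckedBuilder<V> {
    V build(String key) throws Exception;
}

/** Checked failure that keeps the original cause instead of discarding it. */
class BuildFailedException extends Exception {
    BuildFailedException(String key, Throwable cause) {
        super("Could not build value for key " + key, cause);
    }
}

/** Memoizes one built value per key; failed builds are not cached and are retried later. */
final class MemoizingCache<V> {
    /** Unchecked wrapper used only to carry a checked exception out of the lambda. */
    private static final class Tunnel extends RuntimeException {
        Tunnel(Exception cause) {
            super(cause);
        }
    }

    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final CheckedBuilder<V> builder;

    MemoizingCache(CheckedBuilder<V> builder) {
        this.builder = builder;
    }

    V get(String key) throws BuildFailedException {
        try {
            // The mapping function runs at most once per absent key.
            return cache.computeIfAbsent(key, k -> {
                try {
                    return builder.build(k);
                } catch (Exception e) {
                    throw new Tunnel(e);
                }
            });
        } catch (Tunnel t) {
            // No mapping was recorded; surface the real cause to the caller.
            throw new BuildFailedException(key, t.getCause());
        }
    }
}

final class MemoizingCacheDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger builds = new AtomicInteger();
        MemoizingCache<String> cache = new MemoizingCache<>(key -> {
            builds.incrementAndGet();          // counts how often the "expensive" step runs
            return "compiled:" + key;
        });
        cache.get("body.createPortfolio");     // first call builds the value
        cache.get("body.createPortfolio");     // second call is served from the cache
        System.out.println("builds = " + builds.get()); // prints "builds = 1"
    }
}
```

As in the original proposal, the cache is only correct if the key uniquely identifies the schema being compiled; two different schemas sharing one key would silently reuse the first compiled result.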
Convinced of the usefulness of an executable API specification for live syntactic and semantic validation, I have taken the liberty of setting a high priority.
Thanks for your feedback,
Kind regards
Comments (17)
-
I’m always into optimization and already thought of that.
However, I wasn’t able to get a measurable difference when integrating this optimization. That’s because I always measure in milliseconds and this operation does not reach that lower limit.
Did you do a measurement yourself? Does it take many microseconds?
I can imagine that complex OpenAPI specifications may take longer.
-
Hi Olivier,
Thanks for raising this. As you identified, @Sven Döring has done a great job applying some performance optimizations in the past, and has some suggestions for additional improvements.
I am more than willing to look at further optimizations, but I think before we go too far down this path I would like to add a set of performance benchmark tests to the library so we can better reason about the impact of changes (this has been on my backlog for a while but I haven’t got to it yet). I have raised #294 to add a basic set of benchmarks. If you would like to contribute to those benchmarks to demonstrate the use case you want to optimize for, I would be very happy to review PRs.
Cheers,
James
-
reporter @Sven Döring Hi Sven,
I did my first measurements on a small machine (MacBook Air, dual-core i7, 1.7 GHz, 8 GB) to get a first insight into which operations are expensive: the construction of the schema object, the construction of the JsonSchema, and the validation against the JsonSchema are the most expensive ones.
Input payload (small portfolio)
{ "portfolio": { "currency": "CHF", "positions": [ { "asset": "id/2030ae21-25d5-4899-9016-9bd67d12b92e" }, { "asset": "id/2030ae21-25d5-4899-9016-9bd67d12b92f" } ] } }
Time measures (first shot, caching get initialized)
Validating web request.
Action validateSecurity took 0 ms.
Action validateContentType took 64 ms.
Action validateAccepts took 1 ms.
Action validateHeaders took 1 ms.
Action validatePathParameters took 1 ms.
Action readContent took 1 ms.
Hit missed on key get-prc
Action readAndTransformSchemaObject took 23 ms.
Hit missed on key get-prc
Action getJsonSchema took 274 ms.
Action validate json took 77 ms.
Action processingReport took 353 ms.
Action schemaValidator took 386 ms.
Action validateRequestBody took 397 ms.
Action validateQueryParameters took 1 ms.
Action validateDeepObjectQueryParameters took 1 ms.
Action validateUnexpectedQueryParameters took 3 ms.
Action validateCookieParameters took 5 ms.
Action validateCustom took 1 ms.
Validation of incoming request took 500 ms.
Valid web request.
Time measures (second shot, cache get hit)
Validating web request.
Action validateSecurity took 0 ms.
Action validateContentType took 1 ms.
Action validateAccepts took 0 ms.
Action validateHeaders took 0 ms.
Action validatePathParameters took 0 ms.
Action readContent took 0 ms.
Action readAndTransformSchemaObject took 0 ms.
Action getJsonSchema took 0 ms.
Action validate json took 3 ms.
Action processingReport took 4 ms.
Action schemaValidator took 5 ms.
Action validateRequestBody took 5 ms.
Action validateQueryParameters took 0 ms.
Action validateDeepObjectQueryParameters took 0 ms.
Action validateUnexpectedQueryParameters took 0 ms.
Action validateCookieParameters took 0 ms.
Action validateCustom took 0 ms.
Validation of incoming request took 8 ms.
Valid web request.
So, caching obviously helps.
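For readers who want to collect a similar per-step breakdown, here is a minimal sketch of a timing wrapper that emits lines in the same "Action ... took N ms." shape. ActionTimer is a hypothetical helper written for illustration; it is not the instrumentation that produced the numbers above, and it is not part of the validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper that times named steps and collects one log line per step. */
final class ActionTimer {
    private final List<String> lines = new ArrayList<>();

    /** Runs the step, records its wall-clock duration in whole milliseconds, and returns its result. */
    <T> T timed(String action, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            // Recorded even if the step throws, so failed steps still show up in the log.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            lines.add(String.format("Action %s took %d ms.", action, elapsedMs));
        }
    }

    /** Returns a copy of the collected lines, in the order the steps finished. */
    List<String> lines() {
        return new ArrayList<>(lines);
    }
}
```

Because a line is appended when a step finishes, nested steps are logged before the step that encloses them. Durations are truncated to whole milliseconds, so any step that takes less than 1 ms is reported as 0 ms.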
JsonSchema validation time measures (other shots using caching, bigger portfolios)
Portfolio size (positions) | Whole request validation [ms] | Only JsonSchema validation [ms]
15                         | 7-12                          | 5-10
100                        | 23-34                         | 20-30
500                        | 138-263                       | 135-259
5000                       | 760-1280                      | 750-1250
The validation of the input portfolio payload against JsonSchema is expensive.
I did my second measurements on a better machine (Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz, 12 cores, 50 GB), and all numbers improved by a factor of about 5. Nevertheless, even on this machine, without caching I cannot reach 10 ms for small portfolios, which is not acceptable, so caching is an option.
Moreover, on this new machine, the JsonSchema validation still costs 20-30 ms for 500-position portfolios. This is another pain point.
The OpenAPI specification I am using for these cases is not trivial; fortunately, its cost can be mitigated with caching.
-
Impressive.
Can you please rerun the tests without the caching change? I want to get a feeling of what is just warm-up and what are the real costs of schema validation.
And how many requests have you validated to get your times?
Do I get it right that calling the .validate(content, true) method alone took 99% of the time? Or does that include resolving the JsonSchema?
-
reporter Hi Sven,
The execution time for the different operations during the warm-up is listed in the section: Time measures (first shot, caching get initialized)
Action validateRequestBody took 397 ms.
Action schemaValidator took 386 ms.
Action readContent took 1 ms.
Action readAndTransformSchemaObject took 23 ms.
Action processingReport took 353 ms.
Action getJsonSchema took 274 ms.
Action validate json took 77 ms.
For the same request, the numbers once the cache is initialized are listed in the section: Time measures (second shot, cache get hit)
Action validateRequestBody took 5 ms.
Action schemaValidator took 5 ms.
Action readContent took 0 ms.
Action readAndTransformSchemaObject took 0 ms. // thanks to caching
Action processingReport took 4 ms.
Action getJsonSchema took 0 ms. // thanks to caching
Action validate json took 3 ms.
I have worked with three POST requests, two without path-variables, and one with two path-variables (Path-variables get validated too).
The numbers above are the ones for the first request without path-variables, and with a 2-positions portfolio.
Once caching is in place and hot, the validation of the request payload against the JsonSchema costs at least 95% of the time. See my numbers in the attachment.
The JsonSchema validator is a separate project in itself; I wonder whether it can be made more performant. There is also more than one community project on this topic.
-
In the ramp up phase of an application everything is slower.
After a few thousand request validations what are the average validation times without implemented caching? 🤔
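A minimal sketch of how such a steady-state average could be taken by hand: run a number of untimed warm-up calls first, then time a batch and divide. WarmAverage and its workload are hypothetical placeholders; a real run would invoke the request validator inside the Runnable, and a benchmark harness would give more trustworthy numbers than this hand-rolled loop.

```java
import java.util.concurrent.atomic.AtomicLong;

final class WarmAverage {
    /** Runs warm-up iterations untimed, then returns the mean time per call in microseconds. */
    static double averageMicros(Runnable operation, int warmupIterations, int measuredIterations) {
        if (measuredIterations <= 0) {
            throw new IllegalArgumentException("measuredIterations must be positive");
        }
        // Warm-up: lets class loading, cache initialization and JIT compilation happen first.
        for (int i = 0; i < warmupIterations; i++) {
            operation.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            operation.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000.0 / measuredIterations;
    }

    public static void main(String[] args) {
        AtomicLong sink = new AtomicLong();
        // Placeholder workload; the sink keeps the work from being optimized away entirely.
        Runnable work = () -> sink.addAndGet(Long.toString(System.nanoTime()).hashCode());
        double avg = averageMicros(work, 5_000, 20_000);
        System.out.printf("average: %.3f us/op (sink=%d)%n", avg, sink.get());
    }
}
```

Timing the whole batch rather than each call keeps the timer overhead out of the per-call figure, which matters once a single validation drops below a millisecond.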
-
reporter I will try to answer this question with another benchmark in the coming week.
-
There is now an open pull request for benchmarking. I added your changes to the SchemaValidator and ran the benchmark again.
Before:
Benchmark                                                                     Mode  Cnt     Score    Error  Units
OpenApiInteractionValidatorBenchmark.requestValidation_withLargeBodyAsStream  avgt   25  3710.542 ± 72.549  us/op
After:
Benchmark                                                                     Mode  Cnt     Score    Error  Units
OpenApiInteractionValidatorBenchmark.requestValidation_withLargeBodyAsStream  avgt   25  3388.811 ± 33.296  us/op
It’s faster.
-
reporter Hi @Sven Döring , thanks for integrating the proposed caching.
I have not had much time to work on it these last couple of months; I will continue next year.
Let’s stay in touch.
Regards
-
Nah, it was not a real implementation. I just quickly patched it in.
Here are more times for small requests:
Before:
Benchmark                                                                     Mode  Cnt    Score   Error  Units
OpenApiInteractionValidatorBenchmark.requestValidation_withSmallBodyAsStream  avgt   25  180.764 ± 1.709  us/op
After:
Benchmark                                                                     Mode  Cnt   Score   Error  Units
OpenApiInteractionValidatorBenchmark.requestValidation_withSmallBodyAsStream  avgt   25  48.487 ± 0.241  us/op
That change is definitely worth it. Almost all validation time is used for building this schema.
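To put the two benchmark pairs side by side, the scores can be turned into a speedup factor and a relative time reduction. The sketch below only does that arithmetic on the reported mean scores and ignores the error margins; SpeedupMath is a hypothetical helper name. It gives roughly 1.09x (about 8.7% less time per operation) for the large body and roughly 3.73x (about 73.2% less) for the small body.

```java
import java.util.Locale;

final class SpeedupMath {
    /** How many times faster the "after" run is, given mean times per operation. */
    static double speedup(double beforeMicros, double afterMicros) {
        return beforeMicros / afterMicros;
    }

    /** Share of the original time per operation that was saved, in percent. */
    static double reductionPercent(double beforeMicros, double afterMicros) {
        return 100.0 * (beforeMicros - afterMicros) / beforeMicros;
    }

    private static String describe(String label, double before, double after) {
        return String.format(Locale.ROOT, "%s: %.2fx faster, %.1f%% less time per op",
                label, speedup(before, after), reductionPercent(before, after));
    }

    public static void main(String[] args) {
        // Mean scores in us/op, copied from the benchmark output in this thread.
        System.out.println(describe("large body", 3710.542, 3388.811));
        System.out.println(describe("small body", 180.764, 48.487));
    }
}
```

The absolute saving is larger for the large body (about 322 us versus about 132 us per operation), but there it is a small fraction of the total, which fits the earlier observation that validating a big payload against the schema dominates once the schema is cached.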
-
reporter Right. Caching clearly helps. What remains is the final step, the validation of request and response payloads against the resulting JSON Schema, which takes a lot of time. I would like to replace the current community validator with a more performant one, but that will require some integration effort.
-
Memoization of the schema is available in v2.11.5
-
- changed status to resolved