SES-1941 : Do not cache blank node identifiers if they are preserved

Branch: SES-1941

Branch: 2.7.x

Merged

#206 · Created 2013-10-28 · Last updated 2013-10-30

Merged pull request

Merged in SES-1941 (pull request #206)

3c41809·Author: Peter Ansell·Closed by: Peter Ansell·2013-10-30

Description

The current implementation of RDFParserBase.createBNode(String) always caches blank node objects, even if the user specifies that they want to preserve blank node identifiers. When users want to preserve blank node identifiers, the API specifies that ValueFactory.createBNode(String) must reliably regenerate an equivalent BNode object (with the same label), so we don't need to cache the mapping ourselves.
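
As a rough illustration of the contract described above, here is a minimal, self-contained sketch. SketchBNode and SketchValueFactory are hypothetical stand-ins written for this example and are not the Sesame classes. The sketch assumes that equality of blank nodes is defined by their label alone, so the factory can hand out a fresh object on every call without a lookup table.

```java
import java.util.Objects;

// Hypothetical stand-in for a blank node value (NOT the Sesame BNode interface).
// Assumption: two blank nodes are equal exactly when their labels are equal.
final class SketchBNode {
    private final String id;

    SketchBNode(String id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    String getID() {
        return id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchBNode && ((SketchBNode) other).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

// Hypothetical stand-in for a value factory that honours the contract:
// the same label always yields an equivalent node, with no cache involved.
final class SketchValueFactory {
    SketchBNode createBNode(String nodeID) {
        return new SketchBNode(nodeID);
    }
}
```

Under these assumptions, calling createBNode("a1") twice returns two distinct objects that compare equal, which is the property the parser relies on once it stops caching.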

Note, we may need to undo this patch if it turns out that existing ValueFactory implementations are not able to fulfill that contract, and have previously been relying on the current algorithm to ensure consistency.

Not caching the mapping ourselves enables the loading of large RDF files that contain a large number of named blank nodes, without the extra overhead of the cache. Note that this change does not modify RDFParserBase.createBNode(), which must always return a new blank node. Hence, if this patch does not work as designed once it is released into the wild, one workaround is to construct RDF documents containing anonymous blank nodes where possible, for example by using the "[]" construct in Turtle instead of "_:a1".

I have created this patch against 2.7.x, as the contracts have not been modified, but it can also be done against 2.8.0 (master) if it is deemed too risky for a minor release. Note that 2.8.0 is no longer going to use Java 7 language features, so the barrier to upgrading to it is a little lower.

Update: The patch now uses a UUID instead of a map to ensure global uniqueness of blank node identifiers when users do not want to preserve blank node ids.
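
The idea in the update can be sketched roughly as follows. This is not the patch itself: SketchBNodeLabeler, labelFor, and the "genid-" prefix format are hypothetical names chosen for illustration. The sketch assumes one UUID is generated per parser instance and used as a prefix for every label in that parse.

```java
import java.util.UUID;

// Hypothetical helper (not part of Sesame) showing how a per-parse UUID prefix
// can replace a label-to-node map when blank node ids are NOT preserved.
final class SketchBNodeLabeler {
    private final boolean preserveBNodeIDs;

    // Assumption: one random prefix per labeler instance, i.e. per parse.
    private final String uniquePrefix =
            "genid-" + UUID.randomUUID().toString().replace("-", "") + "-";

    SketchBNodeLabeler(boolean preserveBNodeIDs) {
        this.preserveBNodeIDs = preserveBNodeIDs;
    }

    // Returns the label to pass on to the value factory for a parsed node id.
    String labelFor(String nodeID) {
        if (preserveBNodeIDs) {
            // Preserved ids are passed through unchanged.
            return nodeID;
        }
        // The same document label always maps to the same output label within
        // one parse, while a different parse gets a different prefix.
        return uniquePrefix + nodeID;
    }
}
```

Because the mapping is a pure function of the prefix and the parsed label, no per-label state has to be kept in memory, whichever setting the user chooses.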
