This package contains experiments into different ways of deterministically hashing JSON object trees.
This implementation is very loosely based on the Etherium Merkle-Patricia Trie specification located on the Etherium wiki.
There are many implementation differences, but the general idea is the same: use a trie to store a key/value dictionary and then compute the root hash from the hash of each child tree.
This approach hashes a JSON tree by visiting each node and sequentially updating the hash with the path of each terminal value and the value itself.
In order to ensure the digest is deterministic, the fields of each ObjectNode are sorted before hashing.
This approach provides a custom JsonNodeFactory and ObjectNode subclass to replace the default insertion-ordered field iteration with a lexicographically ordered traversal.
This is similar in goal to the SortedJSON approach, but just hashes the serialised JSON output of the whole tree.
Some very rough microbenchmarks have been run to compare these approaches:
Results from an i7 MacBook Pro @2.2GHz:
Trie: 50000 iterations in 996ms Prebuilt Trie: 50000 iterations in 439ms Sorted Json: 50000 iterations in 633ms Ordered Json: 50000 iterations in 123ms
The trie caches sub-tree hashes, which might allow comparison-heavy applications to gain a benefit if they are updating trees and then hashing the result.
Otherwise the custom ObjectNode approach which makes field order deterministic is probably better for most applications.
Why a trie?
The Etherium use case for this data structure is to provide proof that a subtree exists within the trie by providing only a subset of the nodes, which obviously does not prioritise the highest hashing performance.
The implementation here wouldn't be useful for this purpose as it stands, as the trie nodes do not store the hashes of child nodes, but rather a pointer to them.
The code in this repository is licensed under the MIT License