During the last month I've entirely rewritten generator of primitive specializations, all Trove basic interfaces, hash maps and hash sets.
New generator pocesses 100% valid Java code:
*Char*Short* versions. Now it much more convenient to develop specializations in Intellij. There are also some Ant tasks for separating complete library subset (including also some
*Short*Short* classes) from other generated classes, to work with.
New classes hierarchy is accurately put into Java Collections framework. Conflicting specializations are named as they are going to be in Java 8: getChar, toCharArray, nextChar in TCharIterator, and so on. Fortunately, in most cases (where a primitive value is passed as an argument) method overloading allows to preserve usual names.
Implemented all new methods of Map, Collection and Iterable iterfaces, introduced in Java 8, except
Added new notable "mixin" interface - gnu.trove.TAbstractHash, theoretically allows to abstract double hashes, linear probing hashes, etc, in far-far future, however.
Under the hood
THash ( renamed to
DHash, "double hash"):
Rehash threshold - for free slots instead of full slots, because in general we querying hashes more often, than inserting into. In
index()operation removed slots act like full ones.
removedcounter and hard-coded "auto-compaction" clause
size < removed || removed > freeinstead of removes countdown heuristic.
modCountfield (thus, "honest" fail-fast).
Primitives): removed floating-point "salt" rehash, because it is targeting tables with power-of-two capacity.
insert()methods: reordered branches, where makes sense, and some other appempts to outwit HotSpot (failed :) Also, optimized case when we know there are no
removedslots currently in the hash.
consumeFreeSlotfield in favor of 1) calling abstract
insertAtmethod, leading to one boxing-unboxing in maps with primitive values; 2) returning a tuple of index and state, with hope in Escape Analysis. Haven't tested performance difference yet.
Leaf Set and Map implementations:
rehash()doesn't check element equality even by reference, assuming provided
hashCode()key/element implementations are good.
All bulk methods are plain old loops over array of states or key objects, no iterators, no
forEach(). Should be faster. Additionaly, fast-fail with
A kind of double dispatch:
allContainingIn(), etc. - because implementations know better, how to iterate over themselves.
To be honest, I don't believe to exising benchmarks, they meausure index, insert, rehash simultaneously.
Implemented only 2 benchmarks for now:
- Iteration of any kind: for-each loop (
j.u.Iterablenow!), the old way,
forEach()with "lambda", and idiomatic
tryAdvance()from Java 8. By the way,
IntSetwith load factor 0.5:
nsec/iter forEachFunction 6 forEachLoop 18 simple 8.5 tryAdvance 8.5
index(), i. e.
contains(), of random present / random absent elements, load factor 0.5 / 0.8.
Performance comparsion with Trove 3.0.3
Iteration over hashes: approx. +30 %, there was one relatively seriuos inefficiency in the previous implementation.
countains()): -5% - +10%. Of cause, in hashes of non-trivial objects, even Strings,
hashCode()costs outbids these effects many times.
Bulk methods and
rehash()should be significantly faster, but haven't measured.
"Hard" generics: now all methods necessarily thows ClassCastException on acepting not an instance of subclass of it's generic type. Althogth it is allowed by j.u.Collection and Map contracts, not very good.
In the last commit in my repo (https://bitbucket.org/leventov/trove/commits/all) I dropped Serializable implementations and everything except hashes: lists, stacks, queues, because I won't have any time for developing
Trove soon, and personally don't need them right now. There are only hashes now, on the other hand, they are compiling and passing tests.
That is why:
I suggest everybody to participate in testing the new hashes. Remember, they are under Java Collections frameworks, almost drop-in replacement of
HashMap<Integer, String>, for example.
I suggest everybody to discuss my decisions. Many of them are arguable.
Lets outwork the best collections-of-primitives library!
See also a little TODO I've outlined (however, don't hope much anyone will implement something from it :(