- edited description
Trove 4.0?
During the last month I've entirely rewritten generator of primitive specializations, all Trove basic interfaces, hash maps and hash sets.
Generator
New generator pocesses 100% valid Java code: *Char*
and *Char*Short*
versions. Now it much more convenient to develop specializations in Intellij. There are also some Ant tasks for separating complete library subset (including also some *Short*
, *Char*Char*
and *Short*Short*
classes) from other generated classes, to work with.
Framework
New classes hierarchy is accurately put into Java Collections framework. Conflicting specializations are named as they are going to be in Java 8: getChar, toCharArray, nextChar in TCharIterator, and so on. Fortunately, in most cases (where a primitive value is passed as an argument) method overloading allows to preserve usual names.
Implemented all new methods of Map, Collection and Iterable iterfaces, introduced in Java 8, except spliterator()
.
Added new notable "mixin" interface - gnu.trove.TAbstractHash, theoretically allows to abstract double hashes, linear probing hashes, etc, in far-far future, however.
Under the hood
Changes in THash
( renamed to DHash
, "double hash"):
-
Rehash threshold - for free slots instead of full slots, because in general we querying hashes more often, than inserting into. In
index()
operation removed slots act like full ones. -
Explicit
removed
counter and hard-coded "auto-compaction" clausesize < removed || removed > free
instead of removes countdown heuristic. -
modCount
field (thus, "honest" fail-fast).
HashFunctions
(now Primitives
): removed floating-point "salt" rehash, because it is targeting tables with power-of-two capacity.
TPrimitiveHash
(now StatesDHash
) and ObjectHash
(ObjDHash
):
-
index()
andinsert()
methods: reordered branches, where makes sense, and some other appempts to outwit HotSpot (failed :) Also, optimized case when we know there are noremoved
slots currently in the hash. -
removed
consumeFreeSlot
field in favor of 1) calling abstractinsertAt
method, leading to one boxing-unboxing in maps with primitive values; 2) returning a tuple of index and state, with hope in Escape Analysis. Haven't tested performance difference yet.
Leaf Set and Map implementations:
-
rehash()
doesn't check element equality even by reference, assuming providedequals()
andhashCode()
key/element implementations are good. -
All bulk methods are plain old loops over array of states or key objects, no iterators, no
forEach()
. Should be faster. Additionaly, fast-fail withmodCount
, everywhere. -
A kind of double dispatch:
addAllTo()
,allContainingIn()
, etc. - because implementations know better, how to iterate over themselves.
JMH benchmarks
To be honest, I don't believe to exising benchmarks, they meausure index, insert, rehash simultaneously.
Implemented only 2 benchmarks for now:
- Iteration of any kind: for-each loop (
TIterable
is alsoj.u.Iterable
now!), the old way,forEach()
with "lambda", and idiomatictryAdvance()
from Java 8. By the way,IntSet
with load factor 0.5:
nsec/iter forEachFunction 6 forEachLoop 18 simple 8.5 tryAdvance 8.5
index()
, i. e.contains()
, of random present / random absent elements, load factor 0.5 / 0.8.
See JMH_Benchmarks/README.txt
.
Performance comparsion with Trove 3.0.3
-
Iteration over hashes: approx. +30 %, there was one relatively seriuos inefficiency in the previous implementation.
-
Indexing (
get()
orcountains()
): -5% - +10%. Of cause, in hashes of non-trivial objects, even Strings,equals()
andhashCode()
costs outbids these effects many times. -
Bulk methods and
rehash()
should be significantly faster, but haven't measured.
Introduced Issues
"Hard" generics: now all methods necessarily thows ClassCastException on acepting not an instance of subclass of it's generic type. Althogth it is allowed by j.u.Collection and Map contracts, not very good.
In the last commit in my repo (https://bitbucket.org/leventov/trove/commits/all) I dropped Serializable implementations and everything except hashes: lists, stacks, queues, because I won't have any time for developing Trove
soon, and personally don't need them right now. There are only hashes now, on the other hand, they are compiling and passing tests.
That is why:
-
I suggest everybody to participate in testing the new hashes. Remember, they are under Java Collections frameworks, almost drop-in replacement of
HashMap<Integer, String>
, for example. -
I suggest everybody to discuss my decisions. Many of them are arguable.
-
Lets outwork the best collections-of-primitives library!
See also a little TODO I've outlined (however, don't hope much anyone will implement something from it :(
Comments (3)
-
reporter -
reporter - edited description
-
- changed status to wontfix
While there are many good ideas in @leventov's work, the API changes are going to be too great to call it Trove. After discussion and since it is essentially starting from scratch, @leventov has moved the work to a new project. For anyone interested, you can find it here: https://github.com/OpenHFT/UntitledCollectionsProject
- Log in to comment