Trove 4.0?

Issue #23 wontfix
Roman Leventov created an issue

During the last month I've entirely rewritten generator of primitive specializations, all Trove basic interfaces, hash maps and hash sets.

Generator

New generator pocesses 100% valid Java code: *Char* and *Char*Short* versions. Now it much more convenient to develop specializations in Intellij. There are also some Ant tasks for separating complete library subset (including also some *Short*, *Char*Char* and *Short*Short* classes) from other generated classes, to work with.

Framework

New classes hierarchy is accurately put into Java Collections framework. Conflicting specializations are named as they are going to be in Java 8: getChar, toCharArray, nextChar in TCharIterator, and so on. Fortunately, in most cases (where a primitive value is passed as an argument) method overloading allows to preserve usual names.

Implemented all new methods of Map, Collection and Iterable iterfaces, introduced in Java 8, except spliterator().

Added new notable "mixin" interface - gnu.trove.TAbstractHash, theoretically allows to abstract double hashes, linear probing hashes, etc, in far-far future, however.

Under the hood

Changes in THash ( renamed to DHash, "double hash"):

  • Rehash threshold - for free slots instead of full slots, because in general we querying hashes more often, than inserting into. In index() operation removed slots act like full ones.

  • Explicit removed counter and hard-coded "auto-compaction" clause size < removed || removed > free instead of removes countdown heuristic.

  • modCount field (thus, "honest" fail-fast).

HashFunctions (now Primitives): removed floating-point "salt" rehash, because it is targeting tables with power-of-two capacity.

TPrimitiveHash (now StatesDHash) and ObjectHash (ObjDHash):

  • index() and insert() methods: reordered branches, where makes sense, and some other appempts to outwit HotSpot (failed :) Also, optimized case when we know there are no removed slots currently in the hash.

  • removed consumeFreeSlot field in favor of 1) calling abstract insertAt method, leading to one boxing-unboxing in maps with primitive values; 2) returning a tuple of index and state, with hope in Escape Analysis. Haven't tested performance difference yet.

Leaf Set and Map implementations:

  • rehash() doesn't check element equality even by reference, assuming provided equals() and hashCode() key/element implementations are good.

  • All bulk methods are plain old loops over array of states or key objects, no iterators, no forEach(). Should be faster. Additionaly, fast-fail with modCount, everywhere.

  • A kind of double dispatch: addAllTo(), allContainingIn(), etc. - because implementations know better, how to iterate over themselves.

JMH benchmarks

To be honest, I don't believe to exising benchmarks, they meausure index, insert, rehash simultaneously.

Implemented only 2 benchmarks for now:

  • Iteration of any kind: for-each loop ( TIterable is also j.u.Iterable now!), the old way, forEach() with "lambda", and idiomatic tryAdvance() from Java 8. By the way, IntSet with load factor 0.5:
                nsec/iter
forEachFunction 6 
forEachLoop     18
simple          8.5
tryAdvance      8.5
  • index(), i. e. contains(), of random present / random absent elements, load factor 0.5 / 0.8.

See JMH_Benchmarks/README.txt.

Performance comparsion with Trove 3.0.3

  • Iteration over hashes: approx. +30 %, there was one relatively seriuos inefficiency in the previous implementation.

  • Indexing (get() or countains()): -5% - +10%. Of cause, in hashes of non-trivial objects, even Strings, equals() and hashCode() costs outbids these effects many times.

  • Bulk methods and rehash() should be significantly faster, but haven't measured.

Introduced Issues

"Hard" generics: now all methods necessarily thows ClassCastException on acepting not an instance of subclass of it's generic type. Althogth it is allowed by j.u.Collection and Map contracts, not very good.


In the last commit in my repo (https://bitbucket.org/leventov/trove/commits/all) I dropped Serializable implementations and everything except hashes: lists, stacks, queues, because I won't have any time for developing Trove soon, and personally don't need them right now. There are only hashes now, on the other hand, they are compiling and passing tests.

That is why:

  • I suggest everybody to participate in testing the new hashes. Remember, they are under Java Collections frameworks, almost drop-in replacement of HashMap<Integer, String>, for example.

  • I suggest everybody to discuss my decisions. Many of them are arguable.

  • Lets outwork the best collections-of-primitives library!

See also a little TODO I've outlined (however, don't hope much anyone will implement something from it :(

Comments (3)

  1. Log in to comment