THashMap slower than HashMap

Issue #14 new
Rob Eden created an issue

Originally created by @parentjo (2011-02-26)

Results from caliper benchmarks show that THashMap is significantly slower than the JDK HashMap. Below Trove3Generic lines refer to THashMap, Trove3 refers to TIntObjectHashMap

size benchmark us logarithmic runtime
1000 JavaPut 22,0 ===
1000 Trove3GenericJavaPut 83,6 =============
1000 ColtPut 41,5 ========
1000 Trove2Put 50,2 =========
1000 Trove3Put 38,0 =======
1000 JavaGet 23,1 ===
1000 Trove3GenericJavaGet 75,3 ============
1000 ColtGet 34,1 ======
1000 Trove2Get 28,6 =====
1000 Trove3Get 22,5 ===
1000 JavaContainsKey 23,1 ===
1000 Trove3GenericContainsKey 19,0 ==
1000 ColtContainsKey 21,8 ===
1000 Trove2ContainsKey 17,8 ==
1000 Trove3ContainsKey 15,3 =
10000 JavaPut 336,8 =======================
10000 Trove3GenericJavaPut 874,4 ==============================
10000 ColtPut 447,4 =========================
10000 Trove2Put 508,8 ==========================
10000 Trove3Put 412,7 ========================
10000 JavaGet 424,6 ========================
10000 Trove3GenericJavaGet 776,4 =============================
10000 ColtGet 296,9 ======================
10000 Trove2Get 265,9 =====================
10000 Trove3Get 222,6 ====================
10000 JavaContainsKey 420,7 ========================
10000 Trove3GenericContainsKey 206,2 ===================
10000 ColtContainsKey 218,2 ====================
10000 Trove2ContainsKey 176,2 ==================
10000 Trove3ContainsKey 148,9 =================

Original SF bug

Comments from original bug:

@parentjo (2011-12-03)

Here's a long overdue update of the results. A comparison with Colt and Trove2 has been omitted.

Progress is good with THashMap clearly outperformig HashMap once the number of entries increases. For small Map-s with 100 entries difference are not significant.

IMPORTANT NOTE: some iteration related improvements will only be available after 3.0.2 (or later...)

size benchmark us logarithmic runtime
100 MapJavaPut 2,56 ===
100 MapTrove3ObjectPut 2,74 ===
100 MapJavaGet 1,54 =
100 MapTrove3ObjectGet 1,55 =
100 MapJavaContainsKey 1,54 =
100 MapTrove3ObjectContainsKey 1,31 =
100 MapJavaKeySetContains 1,55 =
100 MapTrove3ObjectKeySetContains 1,48 =
100 MapJavaEntrySetIterate 1,89 ==
100 MapTrove3ObjectEntrySetIterate 1,93 ==
1000 MapJavaPut 27,90 ===========
1000 MapTrove3ObjectPut 27,58 ===========
1000 MapJavaGet 16,46 =========
1000 MapTrove3ObjectGet 15,39 =========
1000 MapJavaContainsKey 16,42 =========
1000 MapTrove3ObjectContainsKey 13,05 ========
1000 MapJavaKeySetContains 16,65 =========
1000 MapTrove3ObjectKeySetContains 14,67 =========
1000 MapJavaEntrySetIterate 17,59 =========
1000 MapTrove3ObjectEntrySetIterate 13,87 =========
10000 MapJavaPut 491,21 =====================
10000 MapTrove3ObjectPut 297,11 ===================
10000 MapJavaGet 191,38 ==================
10000 MapTrove3ObjectGet 156,55 =================
10000 MapJavaContainsKey 190,46 ==================
10000 MapTrove3ObjectContainsKey 131,88 ================
10000 MapJavaKeySetContains 195,69 ==================
10000 MapTrove3ObjectKeySetContains 148,20 =================
10000 MapJavaEntrySetIterate 163,85 =================
10000 MapTrove3ObjectEntrySetIterate 141,60 =================
100000 MapJavaPut 6047,75 =============================
100000 MapTrove3ObjectPut 3364,16 ===========================
100000 MapJavaGet 4257,41 ============================
100000 MapTrove3ObjectGet 1864,38 =========================
100000 MapJavaContainsKey 4317,77 ============================
100000 MapTrove3ObjectContainsKey 1787,84 =========================
100000 MapJavaKeySetContains 4254,04 ============================
100000 MapTrove3ObjectKeySetContains 1814,91 =========================
100000 MapJavaEntrySetIterate 2715,44 ===========================
100000 MapTrove3ObjectEntrySetIterate 1609,45 =========================

Comments (5)

  1. Wolfgang Kronberg Account Deactivated

    We did some internal testing of not-too-large collections, comparing TIntObjectHashMap with a naive implementation change of Oracle's original HashMap, simply replacing the key objects with ints. The modified Oracle version proved to be much faster on our machines than the trove implementation.

    The reason for that strange behaviour is that TIntHash.index() (and similar methods) make use of the modulo operator, such as:

    index = hash % length;

    On many machnes (including ours) the modulo operation is so slow that it dominates the result in this test. Oracle's implementation uses array sizes matching a power of two, so they do not need that expensive operator. Trove, on the other hand, always uses primes as array sizes and has good reasons to do so, so there is no easy remedy.

  2. Alex Hayward

    Interesting to know.

    Was the benchmark running in a tight loop doing operations on the not-too-large collection? From a cursory search, it looks like a DIV is still faster than a main memory fetch, and maybe similar to accessing something from a cache in another CPU package. Given the memory use difference, I'd have thought it'd be very sensitive to whether your test runs entirely in CPU cache or not and whether you're using multiple threads and CPU packages or not. Not to mention that in a real-life scenario a map which accesses more memory may evict things from cache that slow down your /other/ code.

    Maybe the trove version would do a lot better (relatively) than your test suggests if your access to the map is distributed throughout code which is doing a lot of other memory access?

  3. Boudewijn van Dongen

    We've experienced the same in many performance tests with datastructures using arrays. The simple way of avoiding this problem is to make sure that length is a power of 2, in which case modulo and division become bitwise operators. Now Java does not recognize this at runtime, so you need to really replace the code for modulo and division by bitwise operators. It makes for unreadable code in many cases.

    The great disadvantage of course is that you are forced to certain lengths of the internal array, which is undesirable. To solve this, you could split the array into a 2 dimensional one, where each component uses a power of 2 length.

  4. Rob Eden reporter

    I agree that we need to get to powers of 2 to see significant improvement.

    I feel like this logic is something that may need to be pulled out and made pluggable as there are tradeoffs for each approach. We'd need to do a bit of research to see how much cost would be incurred by making things pluggable, but hopefully the cost would be minimal.

  5. Log in to comment