TLongIntHashMap#containsKey sometimes very slow

Issue #35 invalid
Andrzej Cichocki created an issue

a junit test that demonstrates the problem:

  @Test
  public void containsPerformance() {
    TLongIntHashMap map = new TLongIntHashMap();
    map.setAutoCompactionFactor(0);
    map.ensureCapacity(3700000);
    for (int k = 0; k < 228732; ++k) {
      map.put(k, k);
    }
    for (int i = 0; i < 7830437; ++i) {
      int k = 123456789 + i * 100;
      map.put(k, i);
      map.remove(k);
    }
    for (int lhs = 0, rhs; lhs < 1000000; lhs = rhs) {
      rhs = lhs + 10000;
      long startNanos = System.nanoTime();
      for (int k = lhs; k < rhs; ++k) {
        map.containsKey(k);
      }
      double seconds = (System.nanoTime() - startNanos) / 1e9;
      System.err.println(String.format("[%s, %s) took %.6f seconds.", lhs, rhs, seconds));
      assertTrue(seconds < 1);
    }
  }

output:

[0, 10000) took 0.002281 seconds.
[10000, 20000) took 0.000441 seconds.
[20000, 30000) took 0.000428 seconds.
[30000, 40000) took 0.000426 seconds.
[40000, 50000) took 0.000422 seconds.
[50000, 60000) took 0.000426 seconds.
[60000, 70000) took 0.000426 seconds.
[70000, 80000) took 0.000424 seconds.
[80000, 90000) took 0.000424 seconds.
[90000, 100000) took 0.000423 seconds.
[100000, 110000) took 0.000427 seconds.
[110000, 120000) took 0.000101 seconds.
[120000, 130000) took 0.000078 seconds.
[130000, 140000) took 0.000081 seconds.
[140000, 150000) took 0.000073 seconds.
[150000, 160000) took 0.000076 seconds.
[160000, 170000) took 0.000074 seconds.
[170000, 180000) took 0.000074 seconds.
[180000, 190000) took 0.000081 seconds.
[190000, 200000) took 0.000078 seconds.
[200000, 210000) took 0.000076 seconds.
[210000, 220000) took 0.000077 seconds.
[220000, 230000) took 1.910993 seconds.
  • trove version is 3.0.3, but i believe problem existed in 3.0.1
  • the test is derived from a remote-debugged jvm that had a thread stuck inside containsKey, i hacked at the numbers above until i got a similar map i.e. _free=264, _maxSize=3915350 and _size=228732 with lots of REMOVED states

Comments (4)

  1. Rob Eden

    Well, I don't think this is a bug per se. The problem is that -as you know- you're doing lots of removes and auto-compaction is disabled. This means that you have lots of REMOVED indicators so it's thrashing a lot. If auto-compaction is disabled, you'll want to be sure to do manual compaction periodically. How often you choose to do it should be determined by testing in your environment. The easy way is to do it based on some number of removes using a counter.

    Now, I have some ideas for helping this issue long term. When an element is removed, I'd like to try continuing down the "bucket chain" and pulling the last item up to fill the hole. This would effectively eliminate the problem at the cost of removals being a bit more expensive... but that's just an idea in my head at the moment.

  2. Andrzej Cichocki reporter

    That makes sense, and now I've found the javadoc on compact(). Perhaps it would be a good idea to add a note there on how bad the performance can get. At first I thought I was looking at an infinite loop!

    The reason I've disabled autocompact is that I want to grow the map once (i.e. tune its size to the actual size of incoming data) and then reuse it multiple times, where a reuse cycle involves replacing almost all the data in the map with a similar amount of data, currently implemented by remove many then add many. As I understand it, autocompact would allocate arrays in both the removal and the add phases, which I would like to avoid.

    I think what would work for my use case is some api that can effectively compact in-place? For now I'm going to try clear()ing the map and re-adding the entries I want to keep, instead of many removes.

  3. Rob Eden

    I do very much agree that there needs to be a way to compact without re-allocating backing arrays. Really, that's what compact() should do in my view. We already have trimToSize() for backing array compaction.

  4. Log in to comment