java.lang.IllegalStateException: No free or removed slots available. Key set full?!!

Issue #42 wontfix
Faber Chri created an issue

Moved from sourceforge.

I get the mentioned IllegalStateException with the following code (on trove 3.0.3):

package ch.uzh;

import gnu.trove.map.TIntIntMap;
import gnu.trove.map.hash.TIntIntHashMap;

public class MainPositive {

        public static void main(String[] args) {
                int NO_ENTRY_CONST = Integer.MAX_VALUE;

                TIntIntMap combinationToClusterIdX = new TIntIntHashMap(
                                Integer.MAX_VALUE, 0.5f, NO_ENTRY_CONST, NO_ENTRY_CONST);

                for (int i = 0; i < Integer.MAX_VALUE; i++) {
                        try {
                                combinationToClusterIdX.put(i, i);
                        } catch (Exception e) {
                                System.out.println("Exception on put of index: " + i);
                                throw e;
                        }
                }
        }
}

And the output is:

vivian% java -cp .:trove-mvn/trove4j-3.0.3.jar ch.uzh.MainPositive
Exception on put of index: 718678400
Exception in thread "main" java.lang.IllegalStateException: No free or removed slots available. Key set full?!!
    at gnu.trove.impl.hash.TIntIntHash.insertKeyRehash(TIntIntHash.java:342)
    at gnu.trove.impl.hash.TIntIntHash.insertKey(TIntIntHash.java:291)
    at gnu.trove.map.hash.TIntIntHashMap.rehash(TIntIntHashMap.java:188)
    at gnu.trove.impl.hash.THash.postInsertHook(THash.java:388)
    at gnu.trove.map.hash.TIntIntHashMap.doPut(TIntIntHashMap.java:222)
    at gnu.trove.map.hash.TIntIntHashMap.put(TIntIntHashMap.java:198)
    at ch.uzh.MainPositive.main(MainPositive.java:16)

Is this the expected behavior?

Comments (8)

  1. Rob Eden

    You're too fast for me. Hadn't had a chance to comment on the mailing list question yet. Sorry. :-)

    I think there is a bug here, in that the map isn't actually being filled up before it fails.

    That being said, I don't believe this is really a valid test case. The problem is that the map is too big. The maximum theoretical size of the backing array is currently Integer.MAX_VALUE. In reality, though, it wants to be a prime bigger than that, since Integer.MAX_VALUE is being specified as the initial capacity. That leads to negative numbers during setup and all kinds of badness. So, in real usage you wouldn't want to let your map's backing array grow that big. (Again, this could be made better, but I'm talking about the current state.)

    Now, that's just the backing array. For real usage you would want far less than that actually in use. Your load factor indicates that only half of the space should be in use. So you're allocating a backing array that is too big and then trying to stick far too much data into it, though the test is failing sooner than it potentially should.
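
    To make the arithmetic concrete, here is a minimal sketch (illustrative only, not Trove's actual sizing code) of the int overflow during setup and of how much a 0.5 load factor leaves usable:

        // Illustrative sketch, not Trove's sizing code: int capacity arithmetic
        // wraps around past Integer.MAX_VALUE, and a 0.5 load factor means only
        // about half of whatever backing array you do get is usable.
        public class CapacityMath {
            public static void main(String[] args) {
                int requested = Integer.MAX_VALUE;
                System.out.println(requested + 1);             // -2147483648: any "next bigger" capacity overflows
                System.out.println((long) (requested * 0.5f)); // 1073741824: roughly half the requested capacity
            }
        }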

    The problem is that we've been using the double-hashing technique that Johan added for dealing with collisions. It's a good thing in many cases because it deals with clustering better than linear probing, but it's likely not hitting every element of the array before it loops and decides to bail.
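
    As a rough illustration of that last point (a sketch, not Trove's actual probing code): a fixed-step probe sequence only visits every slot when the step is coprime with the table length; otherwise it cycles through a fraction of the slots and then gives up:

        // Illustrative sketch, not Trove's code: a probe sequence of the form
        // index, index + step, index + 2*step, ... (mod length) reaches every
        // slot only when the step is coprime with the table length; otherwise
        // it cycles early and an insert concludes "no free slots" prematurely.
        public class ProbeDemo {
            // Counts the distinct slots visited before the sequence repeats.
            // Assumes non-negative start and step for simplicity.
            static int distinctSlots(int start, int step, int length) {
                boolean[] visited = new boolean[length];
                int index = start % length;
                int count = 0;
                while (!visited[index]) {
                    visited[index] = true;
                    count++;
                    index = (index + step) % length;
                }
                return count;
            }

            public static void main(String[] args) {
                System.out.println(distinctSlots(7, 3, 16)); // 16: the whole table gets probed
                System.out.println(distinctSlots(7, 4, 16)); // 4: most slots are never visited
            }
        }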

  2. Rob Eden

    I believe this to be a minor issue for the reasons previously explained. If it's causing real world issues though, please let me know and we'll re-prioritize.

  3. Faber Chri reporter

    Hi Rob

    Thanks for your answer and explanations (and sorry for pushing).

    The max theoretical size of the backing array is currently Integer.MAX_VALUE.

    That's what I thought, but I was surprised that the boundary was already hit at approx. 700 million entries. I assume the same is true for the TLongXMaps, correct?

    Your load factor is indicating that only half of the space should be in use.

    Does this mean that if I set the load factor to 1.0 I can roughly double the number of elements in the map, but that I will never succeed in adding Integer.MAX_VALUE entries?

    For real usage you would want to have far less than that really in use.

    What, in your opinion, is the practically useful maximum size for a Trove map? What are strategies / data structures to deal with a number of (key, value) pairs that is bigger than Integer.MAX_VALUE?

    OK, one last question, maybe a stupid one: ints in Java are signed, so we have int values from -2^31 to +2^31 - 1, and hence roughly 2 * Integer.MAX_VALUE possible 32-bit values. Shouldn't it then be possible to double the maximum allowed size of the backing array?

    EDIT: Just found the following Wikipedia article: http://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays. The maximum array size of Integer.MAX_VALUE seems to be a Java-specific limitation.

    Thx for your help!

    Fabian

  4. Rob Eden

    That's what I thought, but I was surprised that the boundary was already hit at approx. 700 million entries. I assume the same is true for the TLongXMaps, correct?

    Yes, all arrays work the same currently.

    Does this mean that if I set the load factor to 1.0 I can roughly double the number of elements in the map, but that I will never succeed in adding Integer.MAX_VALUE entries?

    Maybe. A load factor of 1.0 wouldn't function very well though.
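
    For intuition, here is a rough sketch (a plain linear-probing table, not Trove's double hashing) showing how the average number of extra probes per insert climbs as the load factor approaches 1.0:

        // Illustrative sketch, not Trove's code: fill an open-addressed table
        // (plain linear probing here) to different load factors and count the
        // extra probes each insert needs on average.
        import java.util.Random;

        public class LoadFactorDemo {
            public static void main(String[] args) {
                int capacity = 1 << 20; // 1,048,576 slots
                for (double load : new double[] {0.5, 0.9, 0.99}) {
                    long[] table = new long[capacity]; // 0 marks an empty slot
                    Random rnd = new Random(42);
                    long extraProbes = 0;
                    int entries = (int) (capacity * load);
                    for (int n = 0; n < entries; n++) {
                        long key = (rnd.nextLong() & Long.MAX_VALUE) | 1L; // positive, never 0
                        int index = (int) (key % capacity);
                        while (table[index] != 0) { // occupied: probe the next slot
                            extraProbes++;
                            index = (index + 1) % capacity;
                        }
                        table[index] = key;
                    }
                    System.out.printf("load %.2f: %.2f extra probes per insert%n",
                            load, extraProbes / (double) entries);
                }
            }
        }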

    What, in your opinion, is the practically useful maximum size for a Trove map? What are strategies / data structures to deal with a number of (key, value) pairs that is bigger than Integer.MAX_VALUE?

    I don't know that there's a hard and fast rule. If you used a better initial size, you might be able to get more into the map (I'm not sure whether there's any weirdness caused by the initial setup). But I'd (personally) say that if you have more than a couple hundred million entries, it might be worth looking into splitting your maps. You might be able to fit more, but your performance might suffer. It's going to be different for every application, so profiling and experimentation are always a good idea.

    The most common way to deal with breaking up data is to have multiple maps partitioned by some piece of data. For example, if you're storing words, maybe there's a map for each starting letter (a-z). Something like that...
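
    As a concrete sketch of that idea for int keys (the class, method names, and sizes below are hypothetical, not a Trove API), a small wrapper can route each key to one of several smaller maps by key modulo the partition count:

        // Hypothetical helper, not part of Trove: route each key to one of
        // several smaller TIntIntHashMaps so that no single backing array has
        // to approach Integer.MAX_VALUE.
        import gnu.trove.map.TIntIntMap;
        import gnu.trove.map.hash.TIntIntHashMap;

        public class PartitionedIntIntMap {
            private static final int NO_ENTRY = Integer.MAX_VALUE;
            private final TIntIntMap[] partitions;

            public PartitionedIntIntMap(int partitionCount, int initialCapacityPerPartition) {
                partitions = new TIntIntMap[partitionCount];
                for (int i = 0; i < partitionCount; i++) {
                    partitions[i] = new TIntIntHashMap(
                            initialCapacityPerPartition, 0.5f, NO_ENTRY, NO_ENTRY);
                }
            }

            private TIntIntMap partitionFor(int key) {
                // Mask the sign bit so negative keys still map to a valid index.
                return partitions[(key & 0x7FFFFFFF) % partitions.length];
            }

            public void put(int key, int value) {
                partitionFor(key).put(key, value);
            }

            public int get(int key) {
                return partitionFor(key).get(key); // NO_ENTRY when the key is absent
            }
        }

    With, say, 64 partitions, two billion entries works out to roughly 30 million entries per map, which stays well inside the range mentioned above.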

    Regarding what you can fit in, keep in mind that there are two factors:

    1. The size of the backing array
    2. Space to deal with bucket collisions since we're using open addressing.

    Ignoring #2 can result in very bad performance.

  5. Faber Chri reporter

    Yeah, some kind of splitting strategy was also my first thought … I guess I will have to dig into this :-((.

    Thanks for your support!

    Best, Fabian

    PS: The project is a hierarchical-clustering-based recommender system.

  6. Rob Eden

    I do think they should be able to grow larger than they currently can.

    How big do you think you would want the collections to grow?

  7. Faber Chri reporter

    Well, so far I have only experimented with a reasonably sized dataset (MovieLens 1 million). With real-life data, the collection that is causing problems would contain billions of entries.

    I'm afraid I will have to make clear to my boss (once again) that the current concept is not really promising / feasible …

  8. Jim Davies

    This isn't a bug; it's more of an experiment to ascertain the maximum usable size for a Trove map. Whilst it provides useful information, there is nothing here to fix for the time being.
