_E_HashSet.template retainAll() hidden side effect

Issue #53 closed
Roman Chernyatchik created an issue

retainAll() methods changes array passed as argument, actually it sorts the array. Such behavior isn't mentioned in javadoc and can lead to hard-to-debug bugs.

/** {@inheritDoc} */
    public boolean retainAll( #e#[] array ) {
        boolean changed = false;
        Arrays.sort( array );
        #e#[] set = _set;
        byte[] states = _states;

        // ...     
        return changed;
    }

Minimal test case:

public class EHashSetTestCase extends TestCase {
  public void testRetainAllSideEffect() throws Exception {
    final int[] data = {5, 4, 3};

    final TIntHashSet seen = new TIntHashSet(new int[]{1, 2, 3});

    // Precondition
    assertEquals("[5, 4, 3]", Arrays.toString(data));

    seen.retainAll(data);

    // Test
    assertEquals("[5, 4, 3]", Arrays.toString(data));
  }
}

Test fails: junit.framework.ComparisonFailure: Expected :[5, 4, 3] Actual :[3, 4, 5]

Comments (7)

  1. Rob Eden

    Huh, you're right. Looks like very old code (was in the original SourceForge CVS repository... so, very old) that's never been noticed before.

  2. jimdavies

    Seems pretty clear cut. I think that modifying the array passed in to the retainAll method is surprising behaviour, so it seems fair to say that it shouldn't happen that way.

    I'll check why the array needs to be modified. If there is a simple mitigation, I'll code that. Otherwise, I'll clone the array, and work with the clone.

  3. jimdavies

    The array of values to retain was being modified because it was being sorted and then binary searched over for each member of the array list. I realised that this was an efficient way to implement this but that, to leave the retention array unmodified, I would need to make a copy of the array. As I was going to have to make a copy of the array anyway, and considering that it was just being checked over for entries existing, I figured that I would simple feed the array into a TIntHashSet, and use that instead of the retention array.

    So the upshot is that use a TIntHashSet, instead of an array for the checking. I'm not completely happy with this, and I would like to do some testing to check that using a hash set is as quick as a binary search, especially when the array is small, but it does deal with duplicates in the retention array quite nicely, so it could be worse.

    It would be good if there was a TIntOrderedSet available, but we don't have that yet. I'll add that in for 4.0 .

    Anyhow, bug fixed.

  4. Log in to comment