Problematic _BindParamClause equality checks

Issue #1475 resolved
Philip Jenvey created an issue

sql.test_select's currently fails on some Jython platforms (only seen it on 32 bit Linux OpenJDK 6 so far) due to a hash collision of _BindParamClause's during SQLCompiler._truncate_bindparam, at this line

        if bindparam in self.bind_names:
            return self.bind_names[bindparam](bindparam)

We can have two different bindparam objects with the same hash here, and bindparam1 == bindparam2 always evaluates to True because that operation returns a _BinaryExpression object

So when we ask for a truncated name for _BindParamClause('in4636') and _BindParamClause('in25') is already registered and has the same hash, we get its name instead. The test generates 100k of these objects but apparently that never triggers a hash collision between them on CPython

Seems like one solution would be to add a nonzero to _BinaryExpression

Comments (8)

  1. Mike Bayer repo owner
    • changed milestone to 0.6.0

    im not seeing here how __nonzero__ would change things....hash collision, then __eq__(), then True in all cases....am I missing something ? when would __nonzero__ be called ?

    Also, jython doesn't assign hash values based on id(obj) ? What is it doing, running them through a modulus or something ? We rely almost completely on __hash__() being distinct for all ClauseElement objects - we put them in all kinds of hashed collections. The reason for that test is because in one case we were sharing the hashes with those of strings, and getting these collisions in cpython with 100ks of objects.

    So why wouldn't just specifying __hash__ on ClauseElement as returning id(self) do it here ?

  2. Mike Bayer repo owner

    oh, the __nonzero__() returns False on _Binary...yeaaa.no. try it and see if you can get 20% of tests to pass. this means a certain class of expression objects would evaluate as False whereas the vast majority of them evaluate as True. it would be more consistent for all ClauseElement to just evaluate as False. but how inconvenient both for the internals and the world at large.

  3. Philip Jenvey reporter
    • assigned issue to

    I was thinking of the nonzero fix just so hash and eq would work as expected, but this is definitely a special case. Having hash return id(self) should work then.

    Jython uses the default Object.hashCode (System.identityHashCode) for hash. That basically returns the initial 'memory' address of the object, but since it's a moving GC there's the potential for another object to share that same hash

    Because of that we have our own annoying id implementation that is basically just an increment, which is unique

  4. Philip Jenvey reporter

    I guess ideally we'd use an IdentityDict instead, but i'll just go with the hash for now

  5. Mike Bayer repo owner
    • changed status to open
    • removed status

    we use hash-based collections for ClauseElements in many places - __hash__ needs to be applied to ClauseElement. Additionally, I would like the method to be placed only if jython is true, so that overhead is not introduced otherwise.

    This really seems like a jython bug to me in any case, it shouldn't be hard for the default hash of objects to be id(object) just like in cpython.

  6. Log in to comment