- changed milestone to 0.6.0
Problematic _BindParamClause equality checks
sql.test_select's currently fails on some Jython platforms (only seen it on 32 bit Linux OpenJDK 6 so far) due to a hash collision of _BindParamClause's during SQLCompiler._truncate_bindparam, at this line
if bindparam in self.bind_names:
return self.bind_names[bindparam](bindparam)
We can have two different bindparam objects with the same hash here, and bindparam1 == bindparam2 always evaluates to True because that operation returns a _BinaryExpression object
So when we ask for a truncated name for _BindParamClause('in4636') and _BindParamClause('in25') is already registered and has the same hash, we get its name instead. The test generates 100k of these objects but apparently that never triggers a hash collision between them on CPython
Seems like one solution would be to add a nonzero to _BinaryExpression
Comments (8)
-
repo owner -
repo owner oh, the
__nonzero__()
returnsFalse
on_Binary
...yeaaa.no. try it and see if you can get 20% of tests to pass. this means a certain class of expression objects would evaluate asFalse
whereas the vast majority of them evaluate asTrue
. it would be more consistent for allClauseElement
to just evaluate asFalse
. but how inconvenient both for the internals and the world at large. -
reporter - assigned issue to
I was thinking of the nonzero fix just so hash and eq would work as expected, but this is definitely a special case. Having hash return id(self) should work then.
Jython uses the default Object.hashCode (System.identityHashCode) for hash. That basically returns the initial 'memory' address of the object, but since it's a moving GC there's the potential for another object to share that same hash
Because of that we have our own annoying id implementation that is basically just an increment, which is unique
-
reporter I guess ideally we'd use an IdentityDict instead, but i'll just go with the hash for now
-
reporter - changed status to resolved
-
repo owner - changed status to open
- removed status
we use hash-based collections for ClauseElements in many places -
__hash__
needs to be applied toClauseElement
. Additionally, I would like the method to be placed only ifjython
is true, so that overhead is not introduced otherwise.This really seems like a jython bug to me in any case, it shouldn't be hard for the default hash of objects to be id(object) just like in cpython.
-
reporter - changed status to resolved
only applying it to jython in 054020a50f9bc082c3fc382be2aacce5c2dbc868
-
repo owner - removed milestone
Removing milestone: 0.6.0 (automated comment)
- Log in to comment
im not seeing here how
__nonzero__
would change things....hash collision, then__eq__()
, thenTrue
in all cases....am I missing something ? when would__nonzero__
be called ?Also, jython doesn't assign hash values based on id(obj) ? What is it doing, running them through a modulus or something ? We rely almost completely on
__hash__()
being distinct for allClauseElement
objects - we put them in all kinds of hashed collections. The reason for that test is because in one case we were sharing the hashes with those of strings, and getting these collisions in cpython with 100ks of objects.So why wouldn't just specifying
__hash__
onClauseElement
as returningid(self)
do it here ?