Endless loop with `foo in arraycolumn`

Issue #3642 resolved
Adrian created an issue

This example never terminates:

from sqlalchemy import *
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import *


Base = declarative_base()


class Foo(Base):
    __tablename__ = 'foo'

    id = Column(Integer, primary_key=True)
    chain = Column(ARRAY(Integer))


e = create_engine('postgresql:///test', echo=True)
Base.metadata.create_all(e)
s = Session(e)

print s.query(Foo).filter(123 in Foo.chain).all()

Traceback on ^C:

^CTraceback (most recent call last):
  File "satest.py", line 22, in <module>
    print s.query(Foo).filter(123 in Foo.chain).all()
  File "/home/adrian/dev/indico/env/lib/python2.7/site-packages/sqlalchemy/sql/operators.py", line 301, in __eq__
    return self.operate(eq, other)
  File "/home/adrian/dev/indico/env/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 739, in operate
    return op(self.comparator, *other, **kwargs)
  File "/home/adrian/dev/indico/env/lib/python2.7/site-packages/sqlalchemy/sql/operators.py", line 301, in __eq__
    return self.operate(eq, other)
  File "<string>", line 1, in <lambda>
  File "/home/adrian/dev/indico/env/lib/python2.7/site-packages/sqlalchemy/sql/type_api.py", line 60, in operate
    return o[0](self.expr, op, *(other + o[1:]), **kwargs)
  File "/home/adrian/dev/indico/env/lib/python2.7/site-packages/sqlalchemy/sql/default_comparator.py", line 69, in _boolean_compare
    negate=negate, modifiers=kwargs)
  File "/home/adrian/dev/indico/env/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 2734, in __init__
    self.type = type_api.to_instance(type_)
KeyboardInterrupt

From looking at the code __contains__ is not supported, but it should result in an exception and not in a hang.


Also, I think it would be useful to implement __contains__ to call self.contains([value]). Using .any(123) is not an option since it does not make use of a GIN index on the table, while .contains([123]) does even though both do the same thing.

Comments (12)

  1. Mike Bayer repo owner
    • Fixed issue where inadvertent use of the Python __contains__ override with a column expression (e.g. by using 'x' in col) would cause an endless loop in the case of an ARRAY type, as Python defers this to __getitem__ access which never raises for this type. Overall, all use of __contains__ now raises NotImplementedError. fixes #3642

    → <<cset e0a580b3d055>>

  2. Mike Bayer repo owner
    • Fixed issue where inadvertent use of the Python __contains__ override with a column expression (e.g. by using 'x' in col) would cause an endless loop in the case of an ARRAY type, as Python defers this to __getitem__ access which never raises for this type. Overall, all use of __contains__ now raises NotImplementedError. fixes #3642

    (cherry picked from commit e0a580b3d055a600afae61840058a5a30ef5fe74)

    → <<cset 69f6a8d714c0>>

  3. Mike Bayer repo owner

    thanks for reporting! not sure I agree with "contains -> contains()", since the keyword used is "in", which people are reallly going to think is SQL IN. The in/contains mismatch across SQL / Python is a good reason to leave this area explicitly not supported.

  4. Adrian reporter

    Hm, assuming there are no cases where an actual SQL foo = ANY(..) is needed instead of .. @> {foo} and the functionality is exactly the same, would it be possible to emit the latter in case of any()?

  5. Mike Bayer repo owner

    As I don't really use ARRAY types myself, I'm sure I'm not understanding. any() right now emits the operator "ANY" (and in 1.1 has been enhanced to also work with MySQL's ANY operator). If someone wants "@>", they use "contains". Why would we remove support entirely for one of these operators and make the two functions emit the same thing?

  6. Adrian reporter

    123 = ANY(arraycol) checks if 123 is in the array stored in arraycol (like python's in operator).
    arraycol @> '{123}' does the same thing, but it can make use of a GIN-type index, making it more efficient.

    That's why I'm suggesting to always use the more efficient operator for this (assuming there are really no differences besides the index usage)

  7. Mike Bayer repo owner

    Then why does Postgresql have ANY ? Why can't they run this optimization on their end? Isn't this a bug in Postgresql ?

  8. Adrian reporter

    ANY by itself is more standardized (it's not just for arrays) - I think you can do somecolumn = ANY (SELECT ...) etc.

    I guess that's a usecase where SQL ANY is actually needed, but for "simple" argument types (i.e. not queries) you could still go for the more efficient @> operator

  9. Mike Bayer repo owner

    yeah we dont really make decisions like that because it's just a surprise for the user who is doing somethign where this suddenly doesn't work. The SQL Core remains as 1-1 mapped to SQL as possible.

  10. Log in to comment