result from operation on `array_agg() @> []` assumed to be array

Issue #4063 resolved
Thijs Damsma created an issue

Since sqlalchemy 1.2 the following code will throw a TypeError: 'bool' object is not iterable

I believe this is because the result is assumed to be an array.

This is an example, where I use array_agg and @> as a poor-mans pivot table:

from sqlalchemy import Column, Table, ForeignKey, Integer, create_engine, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.orm.session import sessionmaker

engine = create_engine('postgresql://postgres:abc@localhost:5432/test')
Session = sessionmaker(engine)

Base = declarative_base()


association_table = Table(
    'association', Base.metadata,
    Column('left_id', Integer, ForeignKey('left.id')),
    Column('right_id', Integer, ForeignKey('right.id'))
)


class Parent(Base):
    __tablename__ = 'left'
    id = Column(Integer, primary_key=True)
    children = relationship(
        "Child",
        secondary=association_table,
        back_populates="parents")


class Child(Base):
    __tablename__ = 'right'
    id = Column(Integer, primary_key=True)
    parents = relationship(
        "Parent",
        secondary=association_table,
        back_populates="children")

Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

session = Session()

children = [Child(), Child(), Child()]
p1 = Parent()
p1.children = [children[0], children[2]]

p2 = Parent()
p2.children = [children[1], children[2]]

session.add_all([p1, p2])

session.commit()

q = session.query(
    Parent.id,
    func.array_agg(Child.id).op('@>')([children[0].id]).label('has_child_0'),
    func.array_agg(Child.id).op('@>')([children[1].id]).label('has_child_1'),
    func.array_agg(Child.id).op('@>')([children[2].id]).label('has_child_2'),
).\
    outerjoin(association_table, Child).\
    group_by(Parent.id)

print(q.all())

Expected result is

[(1, True, False, True), (2, False, True, True)]

Comments (6)

  1. Mike Bayer repo owner

    great, thanks for the test.

    what it's doing now is actually less buggy than what it did before, however, the previous level of bugginess allowed your case to work, whereas now, it reveals a shortcoming in the API.

    The issue fixed in #3964 is why this now occurs, because the default behavior of op() against a particular type is to assume the result object is also of that same type, however in 1.1, the Postgresql dialect failed to get involved with the sqltypes.ARRAY type so it had no result set behavior.

    So here's the first way to make this work right now, use the real contains() operator, which means you need to make it use the postgresql.ARRAY datatype, which is a little verbose (unless you make a helper function):

        from sqlalchemy.dialects.postgresql import ARRAY
        q = session.query(
            Parent.id,
            func.array_agg(Child.id, type_=ARRAY(Integer)).contains([children[0].id]).label('has_child_0'),
            func.array_agg(Child.id, type_=ARRAY(Integer)).contains([children[1].id]).label('has_child_1'),
            func.array_agg(Child.id, type_=ARRAY(Integer)).contains([children[2].id]).label('has_child_2'),
        ).\
            outerjoin(association_table, Child).\
            group_by(Parent.id)
    

    that will work in both versions.

    The way it would be possible with op(), assuming you wanted to use some other PG operator that isn't part of postgresql.ARRAY, you would need to type_coerce the result:

    from sqlalchemy import type_coerce
    from sqlalchemy import Boolean
    
    q = session.query(
        Parent.id,
        type_coerce(func.array_agg(Child.id).op('@>')([children[0].id]), Boolean).label('has_child_0'),
        type_coerce(func.array_agg(Child.id).op('@>')([children[1].id]), Boolean).label('has_child_1'),
        type_coerce(func.array_agg(Child.id).op('@>')([children[2].id]), Boolean).label('has_child_2'),
    ).\
        outerjoin(association_table, Child).\
        group_by(Parent.id)
    

    For both of the above I would suggest building helper functions to reduce the verbosity.

    in the next comment I'll propose new APIs to make op() a little more flexible.

  2. Mike Bayer repo owner

    Well actually there already is API here that should do the "right thing", but as yet it doesn't. You would ensure your op() is a comparison operator:

        q = session.query(
            Parent.id,
            func.array_agg(Child.id).op('@>', is_comparison=True)([children[0].id]).label('has_child_0'),
            func.array_agg(Child.id).op('@>', is_comparison=True)([children[1].id]).label('has_child_1'),
            func.array_agg(Child.id).op('@>', is_comparison=True)([children[2].id]).label('has_child_2'),
        ).\
            outerjoin(association_table, Child).\
            group_by(Parent.id)
    

    that flag is there now, however it won't do everything the docstring says: "if True, the operator will be considered as a "comparison" operator, that is which evaluates to a boolean true/false value" - the flag right now is only used by the ORM inside of relationships. but it tells us the result should be boolean, so we can set that:

    diff --git a/lib/sqlalchemy/sql/default_comparator.py b/lib/sqlalchemy/sql/default_comparator.py
    index 4485c661b..bf3e6e6c4 100644
    --- a/lib/sqlalchemy/sql/default_comparator.py
    +++ b/lib/sqlalchemy/sql/default_comparator.py
    @@ -81,6 +81,15 @@ def _boolean_compare(expr, op, obj, negate=None, reverse=False,
                                     negate=negate, modifiers=kwargs)
    
    
    +def _custom_op_operate(expr, op, obj, reverse=False, result_type=None,
    +                       **kw):
    +    if op.is_comparison and result_type is None:
    +        result_type = type_api.BOOLEANTYPE
    +
    +    return _binary_operate(
    +        expr, op, obj, reverse=reverse, result_type=result_type, **kw)
    +
    +
     def _binary_operate(expr, op, obj, reverse=False, result_type=None,
                         **kw):
         obj = _check_literal(expr, op, obj)
    @@ -249,7 +258,7 @@ operator_lookup = {
         "div": (_binary_operate,),
         "mod": (_binary_operate,),
         "truediv": (_binary_operate,),
    -    "custom_op": (_binary_operate,),
    +    "custom_op": (_custom_op_operate,),
         "json_path_getitem_op": (_binary_operate, ),
         "json_getitem_op": (_binary_operate, ),
         "concat_op": (_binary_operate,),
    
  3. Mike Bayer repo owner

    this is weird, there's already a test for is_comparison returning boolean, but it doesnt test every case:

    (Pdb) column('x').op('$', is_comparison=True)(None).type
    Boolean()
    (Pdb) column('x', Integer).op('$', is_comparison=True)(None).type
    NullType()
    (Pdb) column('x', ARRAY(Integer)).op('$', is_comparison=True)(None).type
    ARRAY(Integer())
    
  4. Thijs Damsma reporter

    Thanks for looking into this. I already found a quick fix that works for me: explicitly cast the result to Boolean. But of course it would be nicer if this al 'just worked'

    Edit: just realized casting was your second suggestion. And probably coercing is better than casring as i presume the type coerce only operates in the python layer and doesn't influence the generated SQL as is that case with cast(..., Boolean)

  5. Mike Bayer repo owner

    Ensure custom ops have consistent typing behavior, boolean support

    Refined the behavior of :meth:.Operators.op such that in all cases, if the :paramref:.Operators.op.is_comparison flag is set to True, the return type of the resulting expression will be :class:.Boolean, and if the flag is False, the return type of the resulting expression will be the same type as that of the left-hand expression, which is the typical default behavior of other operators. Also added a new parameter :paramref:.Operators.op.return_type as well as a helper method :meth:.Operators.bool_op.

    Change-Id: Ifc8553cd4037d741b84b70a9702cbd530f1a9de0 Fixes: #4063

    → <<cset 919b8bc4acf8>>

  6. Log in to comment