Postgres ARRAY indices are being turned into bound params unless they're wrapped in db.text()

Issue #3615 resolved
Jeff Widman created an issue

I'm accessing a Postgres ARRAY, and I'm hitting behavior that I suspect is a bug. I'm not sure whether the problem is with SQLAlchemy or psycopg2.

Whenever I try to pass an array index, it gets turned into a bound param (causing an error) unless I wrap the integer in db.text().

For example, see the SQLAlchemy query in this blog post: http://www.jeffwidman.com/blog/881/building-a-postgresql-recursive-cte-in-sqlalchemy/

There are three lines that include db.text(), and in each case, I think the query should work by just passing a normal integer. Above each line that uses db.text() is a commented out version of what I think the correct syntax should be, followed by what the working version is.

This isn't blocking me in the least, so it's not a big deal. I just noticed it, and decided to report because I really appreciate the code quality of SQLAlchemy.

If you need info for debugging, let me know. I'm happy to help, although not sure I understand the intricacies of SQLAlchemy well enough to submit a PR unless the problem is fairly simple.

Thanks again for all your hard work on this library.

Comments (10)

  1. Mike Bayer repo owner

    so, array indexes are literal so must be sent as bound parameters unless there are definite driver issues. We obviously have a wide variety of tests which exercise this, for example:

    py.test test/dialect/postgresql/test_types.py  --db postgresql -k test_array_index_slice_exprs -s --log-debug=sqlalchemy.engine
    

    here's some segments of the debug output (this is round trips through psycopg2):

    #!
    
    INFO:sqlalchemy.engine.base.Engine:SELECT (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s])[%(param_5)s:%(param_6)s] AS anon_1
    INFO:sqlalchemy.engine.base.Engine:{'param_5': 2, 'param_4': 4, 'param_6': 3, 'param_1': 1, 'param_3': 3, 'param_2': 2}
    DEBUG:sqlalchemy.engine.base.Engine:Col ('anon_1',)
    DEBUG:sqlalchemy.engine.base.Engine:Row ([2, 3],)
    INFO:sqlalchemy.engine.base.Engine:SELECT (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s])[%(param_5)s] AS anon_1
    INFO:sqlalchemy.engine.base.Engine:{'param_5': 2, 'param_4': 4, 'param_1': 1, 'param_3': 3, 'param_2': 2}
    DEBUG:sqlalchemy.engine.base.Engine:Col ('anon_1',)
    DEBUG:sqlalchemy.engine.base.Engine:Row (2,)
    INFO:sqlalchemy.engine.base.Engine:SELECT (ARRAY[%(param_1)s, %(param_2)s] || ARRAY[%(param_3)s, %(param_4)s])[%(param_5)s:%(param_6)s] AS anon_1
    INFO:sqlalchemy.engine.base.Engine:{'param_5': 2, 'param_4': 4, 'param_6': 3, 'param_1': 1, 'param_3': 3, 'param_2': 2}
    DEBUG:sqlalchemy.engine.base.Engine:Col ('anon_1',)
    DEBUG:sqlalchemy.engine.base.Engine:Row ([2, 3],)
    INFO:sqlalchemy.engine.base.Engine:SELECT ARRAY[%(param_1)s, %(param_2)s] || (ARRAY[%(param_3)s, %(param_4)s])[%(param_5)s:%(param_6)s] AS anon_1
    INFO:sqlalchemy.engine.base.Engine:{'param_5': 2, 'param_4': 4, 'param_6': 3, 'param_1': 1, 'param_3': 3, 'param_2': 2}
    DEBUG:sqlalchemy.engine.base.Engine:Col ('anon_1',)
    DEBUG:sqlalchemy.engine.base.Engine:Row ([1, 2, 4],)
    INFO:sqlalchemy.engine.base.Engine:SELECT (ARRAY[%(param_1)s, %(param_2)s])[%(param_3)s:%(param_4)s] || ARRAY[%(param_5)s, %(param_6)s] AS anon_1
    INFO:sqlalchemy.engine.base.Engine:{'param_5': 3, 'param_4': 3, 'param_6': 4, 'param_1': 1, 'param_3': 2, 'param_2': 2}
    DEBUG:sqlalchemy.engine.base.Engine:Col ('anon_1',)
    DEBUG:sqlalchemy.engine.base.Engine:Row ([2, 3, 4],)
    INFO:sqlalchemy.engine.base.Engine:SELECT (array_cat(ARRAY[%(param_1)s, %(param_2)s, %(param_3)s], ARRAY[%(param_4)s, %(param_5)s, %(param_6)s]))[%(param_7)s:%(param_8)s] AS anon_1
    INFO:sqlalchemy.engine.base.Engine:{'param_8': 5, 'param_5': 5, 'param_4': 4, 'param_7': 2, 'param_6': 6, 'param_1': 1, 'param_3': 3, 'param_2': 2}
    

    at the very least, when you say something like "causing an error", always put what error you're getting! thanks

  2. Jeff Widman reporter

    My apologies--I can't believe I forgot the error message.

    So I played around with this, and it appears the error isn't showing up when I run the queries straight, but it is showing up when I run them as part of my materialized view creation recipe.

    Here's a standalone script that allows replicating the errors I'm seeing. To make it work, there are three lines where I used db.text() to inline a parameter before passing it to the db... but if I comment those out and instead do a normal sqlalchemy query that passes the parameters as bound parameters, it throws errors.

    The exception is KeyError: 'param_1' for the first line and KeyError: 'path_1' for the second two and seems to be caused by the bound parameters getting removed before the query is sent to the db.

    Specifically, see line #'s 91, 109, and 111:

    # postgresql_array_errors.py
    
    import sqlalchemy as db
    from sqlalchemy import event
    from sqlalchemy.ext import compiler
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.dialects import postgresql
    from sqlalchemy.schema import DDLElement
    
    Base = declarative_base()
    
    
    class CreateMaterializedView(DDLElement):
        def __init__(self, name, selectable):
            self.name = name
            self.selectable = selectable
    
    
    @compiler.compiles(CreateMaterializedView)
    def compile(element, compiler, **kw):
        # Could use "CREATE OR REPLACE MATERIALIZED VIEW..."
        # but I'd rather have noisy errors
        return "CREATE MATERIALIZED VIEW %s AS %s" % (
            element.name,
            compiler.sql_compiler.process(element.selectable))
    
    
    def create_mat_view(metadata, name, selectable):
        mt = db.MetaData()
        t = db.Table(name, mt)
        for c in selectable.c:
            t.append_column(db.Column(c.name, c.type, primary_key=c.primary_key))
    
        event.listen(
            metadata, "after_create",
            CreateMaterializedView(name, selectable)
        )
    
        @event.listens_for(metadata, "after_create")
        def create_indexes(target, connection, **kw):
            for idx in t.indexes:
                idx.create(connection)
    
        event.listen(
            metadata, "before_drop",
            db.DDL('DROP MATERIALIZED VIEW IF EXISTS ' + name)
        )
        return t
    
    
    
    class GearCategory(Base):
        __tablename__ = 'gear_category'
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.Text, unique=True, nullable=False) #auto-indexed since unique
        parent_id = db.Column(db.Integer, db.ForeignKey('gear_category.id'),
                index=True)
        # This adds both 'children' and 'parent' object references to each category
        children = db.orm.relationship('GearCategory', backref=db.orm.backref('parent',
                remote_side=[id]), lazy='dynamic', order_by='GearCategory.name')
        gear_items = db.orm.relationship('GearItem', backref='category',
                lazy='dynamic', order_by='GearItem.name')
    
    
    class GearItem(Base):
        __tablename__ = 'gear_item'
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.Text, nullable=False, index=True)
        category_id = db.Column(db.Integer, db.ForeignKey('gear_category.id'),
                nullable=False, index=True)
    
    
    class GearCategoryMV(Base):
        '''a materialized view built using a recursive CTE'''
        _tree_cte = db.select([
                        GearCategory.id, GearCategory.parent_id,
                        postgresql.array([GearCategory.id]).label('path')
                        ]).where(GearCategory.parent_id==None
                        ).cte(name='tree', recursive=True)
        _tree_cte = _tree_cte.union_all(db.select([
                        GearCategory.id, GearCategory.parent_id,
                        (_tree_cte.c.path + postgresql.array([GearCategory.id])
                                                                ).label('path'),
                        ]).select_from(db.join(_tree_cte, GearCategory,
                            _tree_cte.c.id==GearCategory.parent_id)))
        # print(db.select([_tree_cte]).compile(dialect=postgresql.dialect())) # DEBUG
        _direct_count = db.select([GearItem.category_id.label('id'),
                                db.func.count(GearItem.id).label('item_ct'),
                        ]).group_by(GearItem.category_id).alias('direct_count')
        _tree_count_cte = db.select([_tree_cte.c.id, _tree_cte.c.path,
                        db.func.coalesce(_direct_count.c.item_ct, 0
                        # db.func.coalesce(_direct_count.c.item_ct, db.text('0')
                                                            ).label('item_ct'),
                        ]).select_from(
                            db.join(_tree_cte, _direct_count,
                                    _tree_cte.c.id==_direct_count.c.id, isouter=True)
                        ).cte(name='tree_ct')
        # print(db.select([_tree_count_cte]).compile(dialect=postgresql.dialect())) # DEBUG
        _aliased_tree_count_cte = _tree_count_cte.alias('t1')
    
        gc_mv_query = db.select([
                        _tree_count_cte.c.id.label('id'),
                        _tree_count_cte.c.item_ct.label('count_direct_child_items'),
                        db.func.sum(_aliased_tree_count_cte.c.item_ct
                                        ).label('count_recursive_child_items'),
                        ]).select_from(
                            db.join(_tree_count_cte, _aliased_tree_count_cte,
                                _tree_count_cte.c.path==
                                    _aliased_tree_count_cte.c.path[1:
                                    # _aliased_tree_count_cte.c.path[db.text('1'):
                                    db.func.array_upper(_tree_count_cte.c.path, 1)],
                                    # db.func.array_upper(_tree_count_cte.c.path, db.text('1'))],
                                isouter=True)
                        ).group_by(_tree_count_cte.c.id, _tree_count_cte.c.item_ct)
        # print(gc_mv_query.compile(dialect=postgresql.dialect())) # DEBUG
    
        __table__ = create_mat_view(
            Base.metadata,
            "gear_mv",
            gc_mv_query
            )
    
    
    e = db.create_engine("postgresql://scott:tiger@localhost/test", echo=True) # Mike's DB
    # e = db.create_engine("postgresql://rc_com@localhost/rc_test", echo=True) # Jeff's DB
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)
    
  3. Mike Bayer repo owner

    OK that's a crapload of code and I see you are using CTEs, are the CTE's necessary to create the error (and can you paste the actual stacktrace please?) this looks like #3090 so far.

  4. Mike Bayer repo owner

    also can you possibly remove as much code as possible here? you need all three ctes and that enormous select and all that? this is a lot of code to wade through and I'd have a much easier time working with a simple reproduction case.

  5. Jeff Widman reporter

    Haven't forgotten about this, just haven't had time yet. I will try to get you a simpler repro case.

  6. Mike Bayer repo owner

    this only requires:

    @compiler.compiles(CreateMaterializedView)
    def compile(element, compiler, **kw):
        # Could use "CREATE OR REPLACE MATERIALIZED VIEW..."
        # but I'd rather have noisy errors
        return "CREATE MATERIALIZED VIEW %s AS %s" % (
            element.name,
            compiler.sql_compiler.process(element.selectable, literal_binds=True))
    

    doc clarification coming

  7. Jeff Widman reporter

    Thanks!

    I hadn't forgotten, it was still on my todo list.

    I have updated my blog post as well.

  8. Log in to comment