Allow to inject expression-based columns into Query objects / Cls.attribute.with_expression(x)

Issue #3058 resolved
Henrique Alves created an issue

I'm trying to do something that, in principle, should be simple: inject a expression-based column into an ORM-based Query. I've tried this:

query = db.session.query(MyClass).all()
# Using `now` just as an example, ideally it's something that takes a parameter
# so it can't be replaced by `orm.column_property`
some_expression = func.now() 
selectable = query.selectable
# The following works as expected, injecting the expression as a column
# the AnnotatedSelect object...
selectable.append_column(some_expression.label('some_column'))
# ... but I can't set it back on the query
query.selectable = selectable
query = query.order_by('some_column')

It fails with AttributeError since query.selectable is read-only. I haven't found another way to inject an expression by reading the docs or the source.

  • Query.add_column isn't quite the same thing since you now get a KeyedTuple back, not a MyClass object with the injected attribute.

  • orm.column_property is of limited usefulness since you can't pass arguments only known at runtime instead of class-evaluation time.

I thought SQLAlchemy made things like this a breeze, but I've spent days without finding an adequate solution. Either I'm missing something, or an API for this is badly needed.

Official response

  • Mike Bayer repo owner

    here's an extremely quick and simple way to get any expression anywhere you want:

    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index 370cb97..364daa4 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -16,6 +16,9 @@ from . import (
         attributes, interfaces, exc as orm_exc, loading,
         unitofwork, util as orm_util
     )
    +from ..sql import expression
    +from .util import _orm_full_deannotate
    +
     from .state import InstanceState
     from .util import _none_set
     from . import properties
    @@ -142,12 +145,20 @@ class ColumnLoader(LoaderStrategy):
                 self, context, entity, path, loadopt,
                 adapter, column_collection, memoized_populators, **kwargs):
    
    -        for c in self.columns:
    +        if loadopt and "expression" in loadopt.local_opts:
    +            columns = [
    +                expression._labeled(
    +                    _orm_full_deannotate(loadopt.local_opts["expression"]))
    +            ]
    +        else:
    +            columns = self.columns
    +
    +        for c in columns:
                 if adapter:
                     c = adapter.columns[c]
                 column_collection.append(c)
    
    -        fetch = self.columns[0]
    +        fetch = columns[0]
             if adapter:
                 fetch = adapter.columns[fetch]
             memoized_populators[self.parent_property] = fetch
    diff --git a/lib/sqlalchemy/orm/strategy_options.py b/lib/sqlalchemy/orm/strategy_options.py
    index 796c17e..b075f92 100644
    --- a/lib/sqlalchemy/orm/strategy_options.py
    +++ b/lib/sqlalchemy/orm/strategy_options.py
    @@ -973,7 +973,7 @@ def defer(key, *addl_attrs):
    
    
     @loader_option()
    -def undefer(loadopt, key):
    +def undefer(loadopt, key, expression=None):
         """Indicate that the given column-oriented attribute should be undeferred,
         e.g. specified within the SELECT statement of the entity as a whole.
    
    @@ -1007,16 +1007,18 @@ def undefer(loadopt, key):
             :func:`.orm.undefer_group`
    
         """
    -    return loadopt.set_column_strategy(
    +    loader = loadopt.set_column_strategy(
             (key, ),
    -        {"deferred": False, "instrument": True}
    +        {"deferred": False, "instrument": True},
    +        opts={"expression": expression} if expression is not None else None
         )
    +    return loader
    
    
     @undefer._add_unbound_fn
    -def undefer(key, *addl_attrs):
    +def undefer(key, *addl_attrs, **kw):
         return _UnboundLoad._from_keys(
    -        _UnboundLoad.undefer, (key, ) + addl_attrs, False, {})
    +        _UnboundLoad.undefer, (key, ) + addl_attrs, False, kw)
    
    
     @loader_option()
    

    this allows the "undefer()" option to be used on any column anywhere with an arbitrary expression:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        x = Column(Integer)
        y = Column(Integer)
    
        # placeholder, yields NULL usually...
        expr = deferred(null())
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    s = Session(e)
    
    s.add(A(x=1, y=2))
    s.commit()
    
    # but let's give it x + y
    a1 = s.query(A).options(undefer(A.expr, expression=A.x + A.y)).first()
    print a1.expr
    
    s.expire_all()
    
    # works on *any* column!
    a1 = s.query(A).options(undefer(A.x, expression=literal(17))).first()
    print a1.x
    

    now if this works with all the crazy inheritance cases and such, would need to be tested.

Comments (35)

  1. Henrique Alves reporter

    After raising the issue on IRC, I've found an adequate solution by using column_property together with bindparam.

    class MyClass(DeclarativeBase):
        my_column = column_property(some_sql_expression(bindparam('my_param', value='', type_=String)))
    
    db.session.query(MyClass).all().params(my_param='foo')
    

    This allows to have computed columns that depend on parameters at query time without falling back to KeyedTuples, playing nice with JSON serialization, etc.

    I still don't know how that interacts with the rest of SQLAlchemy or if it's even supposed to be supported, would like to know about it.

  2. Mike Bayer repo owner

    suppose I had this query:

    sess.query(func.now(), User, Address).select_from(User).join(User.addresses)
    

    What exactly would this hypothetical "append expression, but not to the tuple" feature look like, and what would it do? How could it be presented both in the API and in documentation that it is not confusing as to its purpose vs. that of add_column(), column_property(), etc.?

  3. Mike Bayer repo owner

    Furthermore. If our User class had an existing column, or an attribute, called "name", and we then used this new option, also featuring the word "name", should it overwrite the value of the original "name" column? What if our User object is already present in the Session identity map, and already has a loaded value for "name", and even further what if that column has pending changes on it that are unflushed? Do we ignore it, overwrite the value, etc. ? I'm of course trying to illustrate that what may seem like a "simple" idea to the new user of SQLAlchemy is not at all clear cut once the full functional paradigm of the ORM (e.g. unit of work, identity map features) are taken into account.

  4. Henrique Alves reporter

    @zzzeek Right, I see the problem.

    It seems that the best option to have dynamic columns is to do as I did - create a column_property and provide a value to a bindparam at query time.

    Is that supported, or do you see any obvious problems? If so, I'll make a recipe for the docs.

  5. Mike Bayer repo owner

    i agree that pre-declaring something on the mapped class is an easy way to do this. We can support it more fully, as a true "placeholder" that can be targeted more accurately. Most appropriately it would be like a loader option:

    class User(Base):
        # ...
    
        special_info = deferred_expression()
    
    s = sess.query(User, Address).select_from(User).join(User.addresses).
             options(with_expression(table.c.column, User.special_info))
    
  6. Mike Bayer repo owner

    or better yet:

    sess.query(User, Address).options(User.special_info.with_expression(table.c.column))
    
  7. Henrique Alves reporter

    I haven't tested yet if the following works, but would that proposed API be equivalent to this?

    class User(Base):
        special_info = deferred(column_property(bindparam('foo', type_=String)))
    
    sess.query(User).options(undefer(User.special_info)).params(foo='bar')
    

    And if so, what's the problem with this? Name clashes on params?

  8. Mike Bayer repo owner

    There's nothing wrong with how you're doing it there, you can stick with that. It's just for more of an "official" API we'd do something that doesn't rely on arbitrary string names / params / stuff like that. The string name requirement would be awkward in a large scale model where one has to make sure to name everything something unique.

  9. Lukas Siemon

    I'm also very interested in this. When do you think it will come?

    Note that this issue has been moved around quite a bit since I started watching it.

  10. Mike Bayer repo owner

    ah is this because I just pushed out the milestone? this is a pretty outlier "nice to have" kind of thing that is until now never really asked for, do you have any interest in working on this?

  11. Lukas Siemon

    Yeah, I think that's what is happening. I kind of assumed it was planned for the release it was tagged in.

    We are currently making very heavy use of this feature. The "problem" we have is that returned column properties get cached and are not updated when the bindparam change. So we need to invalidate the data before we re-fetch.

    Certainly not opposed to it, but I'm not entirely sure what the proposed work outline is here. For us it would be "enough" if the column properties get invalidated when the passed param change.

  12. Mike Bayer repo owner

    I can whip up a quick version of "the feature" in ten minutes. it's writing good tests as well as enhancement documentation that's all the work.

  13. Mike Bayer repo owner

    here's an extremely quick and simple way to get any expression anywhere you want:

    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index 370cb97..364daa4 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -16,6 +16,9 @@ from . import (
         attributes, interfaces, exc as orm_exc, loading,
         unitofwork, util as orm_util
     )
    +from ..sql import expression
    +from .util import _orm_full_deannotate
    +
     from .state import InstanceState
     from .util import _none_set
     from . import properties
    @@ -142,12 +145,20 @@ class ColumnLoader(LoaderStrategy):
                 self, context, entity, path, loadopt,
                 adapter, column_collection, memoized_populators, **kwargs):
    
    -        for c in self.columns:
    +        if loadopt and "expression" in loadopt.local_opts:
    +            columns = [
    +                expression._labeled(
    +                    _orm_full_deannotate(loadopt.local_opts["expression"]))
    +            ]
    +        else:
    +            columns = self.columns
    +
    +        for c in columns:
                 if adapter:
                     c = adapter.columns[c]
                 column_collection.append(c)
    
    -        fetch = self.columns[0]
    +        fetch = columns[0]
             if adapter:
                 fetch = adapter.columns[fetch]
             memoized_populators[self.parent_property] = fetch
    diff --git a/lib/sqlalchemy/orm/strategy_options.py b/lib/sqlalchemy/orm/strategy_options.py
    index 796c17e..b075f92 100644
    --- a/lib/sqlalchemy/orm/strategy_options.py
    +++ b/lib/sqlalchemy/orm/strategy_options.py
    @@ -973,7 +973,7 @@ def defer(key, *addl_attrs):
    
    
     @loader_option()
    -def undefer(loadopt, key):
    +def undefer(loadopt, key, expression=None):
         """Indicate that the given column-oriented attribute should be undeferred,
         e.g. specified within the SELECT statement of the entity as a whole.
    
    @@ -1007,16 +1007,18 @@ def undefer(loadopt, key):
             :func:`.orm.undefer_group`
    
         """
    -    return loadopt.set_column_strategy(
    +    loader = loadopt.set_column_strategy(
             (key, ),
    -        {"deferred": False, "instrument": True}
    +        {"deferred": False, "instrument": True},
    +        opts={"expression": expression} if expression is not None else None
         )
    +    return loader
    
    
     @undefer._add_unbound_fn
    -def undefer(key, *addl_attrs):
    +def undefer(key, *addl_attrs, **kw):
         return _UnboundLoad._from_keys(
    -        _UnboundLoad.undefer, (key, ) + addl_attrs, False, {})
    +        _UnboundLoad.undefer, (key, ) + addl_attrs, False, kw)
    
    
     @loader_option()
    

    this allows the "undefer()" option to be used on any column anywhere with an arbitrary expression:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        x = Column(Integer)
        y = Column(Integer)
    
        # placeholder, yields NULL usually...
        expr = deferred(null())
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    s = Session(e)
    
    s.add(A(x=1, y=2))
    s.commit()
    
    # but let's give it x + y
    a1 = s.query(A).options(undefer(A.expr, expression=A.x + A.y)).first()
    print a1.expr
    
    s.expire_all()
    
    # works on *any* column!
    a1 = s.query(A).options(undefer(A.x, expression=literal(17))).first()
    print a1.x
    

    now if this works with all the crazy inheritance cases and such, would need to be tested.

  14. Mike Bayer repo owner

    whats the work involved w/ the above? 1. tests, including in test/orm and somewhere in test/orm/inheritance too likely, 2. documentation for the parameter with a ".. seealso:" to some narrative docs in http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#deferred-column-loading, 3. a short blurb that would go into the migration document, current one is http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html, also includes seealso's to the main docs

  15. Mike Bayer repo owner
    • changed milestone to 1.2

    please get some votes in for this feature so that I know to keep it in an upcoming milestone.

  16. Nick Whyte

    Wow this works great, and solves my particular use case. I applied the patch locally and tried it out.

    A bit of insight into the problem: https://groups.google.com/forum/#!topic/sqlalchemy/pMsAuXdDbG8

    Simply put, I need to undefer a column which is calculated at request time, since the query that puts results in to the deferred column required the request context.

    My solution ended up being this:

    school_date = datetime.datetime.now(g.school_tz).date()
    at_school_q = at_school_query(school_date).alias()
    
    occurence = current_app.db.session.query(DBOccurence).options(
        sa.orm.undefer(
            'student_occurences.student.current_presence', 
            expression=sa.select(
                [at_school_q.c.status]
            ).where(at_school_q.c.student_id == Student.id)
        ),
        # sa.orm.joinedload('student_occurences').joinedload('student')
    ).filter(
        DBOccurence.id == occurence_id
    ).first_or_404()
    

    A couple of things to note:

    • I had to disable the joinedload to make this work correctly (otherwise the field was never written, when using literal values).
    • As you mentioned above, it doesn't seem work with inheritence cases (a class using the polymorphic mapper).

    Thanks for the all the hard effort on this, looking forward to seeing this released in the future.

  17. Mike Bayer repo owner

    Support per-query expressions in column properties

    Adds an option to undefer() and the corresponding strategy to pull in an ad-hoc, per query expression in for a column based on options.

    Still to be determined is if column_property() itself, or some variant, would be allowed to have a dynamically generated expression at the class level as well; use case here are exotic expressions such as those using CTEs which benefit from being generated on demand.

    This is a WIP with no tests or documentation as of yet.

    Change-Id: Ibe22daeb7959395af36ea01299e29c89c1c8687f Fixes: #3058

    → <<cset a7d95ba956ee>>

  18. Mike Bayer repo owner

    Add ad-hoc mapped expressions

    Added a new feature :func:.orm.with_expression that allows an ad-hoc SQL expression to be added to a specific entity in a query at result time. This is an alternative to the SQL expression being delivered as a separate element in the result tuple.

    Change-Id: Id8c479f7489fb02e09427837c59d1eabb2a6c014 Fixes: #3058

    → <<cset 83c1e03c5c74>>

  19. mike waites

    Hey Mike,

    So firstly sorry for showing up so late to this party. I've finally managed to get round to implementing some of the queries we had that were crying out for this solution. On the face of it, this patch gives me everything I wanted!!!

    I've run into a couple of issues initially that perhaps are me misunderstanding or something so i thought i'd share.

    Here's a pretty basic example...

    class Tag(db.Model):
    
        __tablename__='tag'
        id = Column(String(), primary_key=True)
        name  = Column(String(), nullable=False)
        used_count = deferred_expression()
    
    class ActivityTags(db.Model):
        __tablename__ = 'activity_tags'
        activity_id = Column(db.String, db.ForeignKey('activity.id'))
        tag_id = Column(db.String, db.ForeignKey('tag.id'))
    
    
    class Activity(db.Model):
        __tablename__='activity'
        id = Column(String(), primary_key=True)
        name  = Column(String(), nullable=False)
        tags = relationship(Tag, secondary=ActivityTag.__table__)
    

    Then we have a query like this

    user_used_count_subquery = session.query(func.count(ActivityTags.id)).filter(
        ActivityTags.tag_id == Tag.id,
        perm_for_activity.exists() # omitted for simplicity
    
    stmt = session.query(Tag).filter(
        ...some condition
    ).options(
        with_expression(Tag.used_count, user_used_count_subquery(user).as_scalar())
    ).order_by(Tag.used_count.desc())
    

    So im very happy to say this works as expected. The used_count column can be selected along with all the other Tag fields just be passing the Tag model to query(). F**K YEH!

    Two things have come of my test case however.

    1. Ordering by the deferred_expression column directly (as in my example) doesn't seem to work. I will get ORDER BY NULL DESC as the value. I managed to work around this by doing order_by(stmt.selectable.columns.used_count) however.

    2. An error occurred when attempting to access a property on the obj. I've not dug into this yet but the issue didn't occur using the master branch of SQLAlchemy without these changes so im assuming that accessing the used_count property in some way caused this. Figured it might be useful to you ?

    >           return getter(obj, name, None)
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py:242: in __get__
    >           return self.impl.get(instance_state(instance), dict_)
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py:593: in get
    >                   value = state._load_expired(state, passive)
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py:593: in _load_expired
    >       self.manager.deferred_scalar_loader(self, toload)
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:811: in load_scalar_attributes
    >               only_load_props=attribute_names)
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:231: in load_on_ident
    >           return q.one()
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2835: in one
    >           ret = self.one_or_none()
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2805: in one_or_none
    >       ret = list(self)
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2872: in __iter__
    >       context = self._compile_context()
    /usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py:3405: in _compile_context
    >                   "No column-based properties specified for "
    E               InvalidRequestError: No column-based properties specified for refresh operation. Use session.expire() to reload collections and related items.
    

    So far though i think this is a really really great addition. We have been struggling to work around this problem for a long time now so thanks so much for the awesome work!!

  20. Mike Bayer repo owner

    Ordering by the deferred_expression column directly (as in my example) doesn't seem to work. I

    documented: http://docs.sqlalchemy.org/en/latest/orm/mapped_sql_expr.html#mapper-deferred-expression

    The mapped attribute currently cannot be applied to other parts of the query and make use of the ad-hoc expression; that is, this won’t work: (example) The A.expr expression will resolve to NULL in the above WHERE clause.

    An error occurred when attempting to access a property on the obj.

    OK, I can probably guess what that is...

  21. mike waites

    Thanks for getting back to me Mike.

    Sorry i missed that in the docs. I was reading them via the diff rather than the link you provided which was not useful on my part! Please let me know if I can help in any other way to test your theory regarding the second point.

  22. Mike Bayer repo owner

    heads up I'm renaming deferred_expression to query_expression since "deferred" means something specific that this feature is definitely not.

  23. mike waites

    rgr that! 👍

    I'm adding a few more occurrences of with_expression to our code base and we have a significant number of tests so I can give this a good test when you're ready!

  24. Log in to comment