allow passing aliased() to select_entity_from()

Issue #3933 resolved
Eoghan Murray created an issue

In the select_entity_from docs it reads:

"This method is similar to the Query.select_from() method, in that it sets the FROM clause of the query. However, where Query.select_from() only affects what is placed in the FROM, this method also applies the given selectable to replace the FROM which the selected entities would normally select from."

As you can hopefully see from the attached test case,

  1. aliased fails to recognize that it can adapt an autoloaded view to an existing entity, even when adapt_on_names is specified

  2. select_entity_from doesn't throw an exception when it can't replace the main entity but instead adds a FROM. This is undesired from what I understand is intended.

Comments (14)

  1. Mike Bayer repo owner

    hi there -

    thanks for the great test case.

    aliased fails to recognize that it can adapt an autoloaded view to an existing entity, even when adapt_on_names is specified

    aliased() does not have a problem here. Because the aliased() is wrapped inside of a select(), the adaptation on names is not taking effect for the view of columns passed to the Query. the select_entity_from() method will also not accept the aliased() object directly, which is likely because the purpose of this method is to set up an internal column adapter against the given selectable, and in the case of aliased(), the adaptation is already present and this option wouldn't make sense.

    The idiomatic way to use aliased() is to use it directly:

    test_view = aliased(Test, v, adapt_on_names=True)
    
    assert 200 == s.query(test_view.pay).first()[0]
    

    so in the case that you are looking for being able to have the primary Test class as the original subject of the query, but then change the "from" for this existing query to something else (I'm assuming this is because the Query is in some kind of builder context where you don't know to use the aliased() up front?), we'd want set_select_from() to handle this case. Which means we need a new feature, that is, to also support adapt_on_names for the internal adaption created by this method:

    diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
    index 2fae362..9a95caa 100644
    --- a/lib/sqlalchemy/orm/query.py
    +++ b/lib/sqlalchemy/orm/query.py
    @@ -185,7 +185,7 @@ class Query(object):
                 for m in m2.iterate_to_root():
                     self._polymorphic_adapters[m.local_table] = adapter
    
    -    def _set_select_from(self, obj, set_base_alias):
    +    def _set_select_from(self, obj, set_base_alias, adapt_on_names=False):
             fa = []
             select_from_alias = None
    
    @@ -214,10 +214,13 @@ class Query(object):
    
             if set_base_alias and \
                     len(self._from_obj) == 1 and \
    -                isinstance(select_from_alias, expression.Alias):
    +                (
    +                    adapt_on_names or
    +                    isinstance(select_from_alias, expression.Alias)
    +                ):
                 equivs = self.__all_equivs()
                 self._from_obj_alias = sql_util.ColumnAdapter(
    -                self._from_obj[0], equivs)
    +                self._from_obj[0], equivs, adapt_on_names=adapt_on_names)
    
         def _reset_polymorphic_adapter(self, mapper):
             for m2 in mapper._with_polymorphic_mappers:
    @@ -2417,7 +2420,7 @@ class Query(object):
             self._set_select_from(from_obj, False)
    
         @_generative(_no_clauseelement_condition)
    -    def select_entity_from(self, from_obj):
    +    def select_entity_from(self, from_obj, adapt_on_names=False):
             r"""Set the FROM clause of this :class:`.Query` to a
             core selectable, applying it as a replacement FROM clause
             for corresponding mapped entities.
    @@ -2499,7 +2502,7 @@ class Query(object):
    
             """
    
    -        self._set_select_from([from_obj], True)
    +        self._set_select_from([from_obj], True, adapt_on_names=adapt_on_names)
    
         def __getitem__(self, item):
             if isinstance(item, slice):
    

    Then we can use the view directly:

    assert 200 == s.query(Test.pay).select_entity_from(v, adapt_on_names=True).first()[0]
    

    select_entity_from doesn't throw an exception when it can't replace the main entity but instead adds a FROM. This is undesired from what I understand is intended.

    It's a prevalent condition across querying that when a FROM object does not supply the columns expected for what the query is expecting, you get an extra entity in the FROM clause, because this is how the Core works; the FROM clause always adds table entries for every column being requested. It's a very common user error to do something like this:

    ua = aliased(User)
    q = s.query(ua).filter(User.name == 'ed')
    

    which produces a SELECT that has 'FROM "user" AS user_1, "user"' in the FROM clause, with the cartesian product.

    In the case here, select_entity_from() says that it will adapt the given selectable to the columns being selected from, but it does not assume that every column you are selecting is accommodated by the selectable, for example if you did this:

    stmt = select([User.id, User.name])
    
    q = s.query(User, Address).select_entity_from(stmt).filter(Address.user_id == User.id).filter(Address.email == 'foo')
    

    the select_entity_from() selectable hits the columns from "User" but not from "Address", it instead renders "FROM user, address" and the filter() criteria later on does an implicit join. It wouldn't be appropriate for select_entity_from() to try to guess that the entity we're selecting from needs to have a match for every column in User and Address.

    I know that the case here is simpler, that there's just one entity, but Query doesn't know what you're going to do with the query as it is built up that may change things. The only way to accommodate for this kind of "avoid the FROM x, y" problem would be some new addition of a "banned FROM clause" list of some kind where when the query finally renders, if we see one of the raw tables coming out that we were told the user intends to replace, we'd raise an error of some kind, but again in complicated cases we might not be able to really guess where those columns are supposed to come from. The simple cases are specializations of a much more complicated generalized problem which is why error checking like that is not very easy to do. A more generalized "cartesian product" detection might be better, but even that would have to detect that there are comma-separated FROM elements and then traverse through all the expressions to make sure none of them are in fact trying to relate two of the FROMs together, again it gets complicated. We'd hope that people simply test their code to make sure queries render as expected which is why echoing of SQL is so emphasized.

    Anyway, the first part here is not really a bug even though this API is kind of advanced. There is a potential feature add.

  2. Eoghan Murray reporter

    Thanks for pointing out idiomatic usage of aliased; I came to my method via a roundabout way and missed that.

    I'm assuming this is because the Query is in some kind of builder context where you don't know to use the aliased() up front?

    The main reason I want to modify the FROM is that the table has quite a few custom column types that need to be carried over as overrides when reflecting the view. Also, there isn't just one view, but many variants of the view. However, I've now got to the point where I realize that adaptation by name might not even be possible for my usecase as I've got columns such as [declarative] coords = column_property(func.ST_AsEWKT(text(u'geom'))) which cause a 'ClauseList' object has no attribute 'name' error to be thrown in sql/elements.py

    There is also still the danger that if I got it working with select_entity_from, if the table schema ever got out of sync with any of the reflected views, problem #2 will occur and instead of an Exception I'll get cartesian product performance in the small percentage of the app where these views will be needed.

    the select_entity_from() selectable hits the columns from "User" but not from "Address", it instead renders "FROM user, address" and the filter() criteria later on does an implicit join

    I'd argue that once the user is using select_entity_from they are effectively saying 'I know what I'm doing', and they should have to explicitly add the other table, i.e. select_entity_from(stmt, Address) if that's indeed what they need. From my reading of the docs, it seems to already imply that this will happen: "However, where Query.select_from() only affects what is placed in the FROM, this method also applies the given selectable to replace the FROM which the selected entities would normally select from" (select_from could maintain the current behavior of the implicit join being added).


    So if this is a feature request, I may as well also mention what I really want, which is just to be able to replace the FROM with raw text, i.e.

    from sqlalchemy.sql import text
    s.query(Test.pay).select_entity_from(text('test_view AS test')).first()[0]
    

    with no further changes to the FROM clause allowed as mentioned above. Maybe the method needs to be called set_from instead. I basically want a way to be able to override the normal SqlAlchemy builder and 'drop down to raw SQL', similarly to how I can drop down to raw text in session.query(Test).filter(text('1=0'))

    With much thanks!

  3. Eoghan Murray reporter

    I just realized I can hack it with

    Test.__table__.name = 'test_view'
    assert 200 == s.query(Test.pay).first()[0]
    Test.__table__.name = 'test'
    

    Now to go and make that threadsafe!

  4. Mike Bayer repo owner

    oh unless you're trying to relate individual entities to different text. still, you can use text().columns() here, did you try passing one of those to select_entity_from()? then you don't need match_on_names, though we can add that as a feature as well.

  5. Eoghan Murray reporter

    unless you're trying to relate individual entities to different text

    don't think so!

    that's the "from_statement()" use case ok missed that! (or maybe rejected it too early as there's no way of adding subsequent Query.filters)

    I've managed to get it working with something along the lines of this:

    view_name = None
    if True:
       view_name = 'test_view'
    q = session.query(Test).filter(Test.a_col=='a', Test.b_col=='b').order_by(Test.c_col.like('%xyz%'))
    if view_name:
       q = session.query(Test).from_statement(text(re.sub('%\((.*?)_[0-9]+\)s', r':\1', str(q)).replace('FROM test', 'FROM %s AS test' % (view_name)))).params(a_col='a', b_col='b', c_col='%xyz%')
    results = q.all()
    

    However, I've got some custom types that seem to be getting obliterated by from_statement (underlying column is string/unicode and process_result_value returns an object - after from_statement, the unicode gets returned instead of the object). [edit separate bug: #3935 ]

    So being able to directly replace the FROM instead of the whole query would still be great!

  6. Mike Bayer repo owner

    why not this?

    assert 200 == s.query(Test.pay).select_entity_from(text("select id, pay from test_view").columns(Test.id, Test.pay)).first()[0]
    
  7. Mike Bayer repo owner

    so there's the text() method (maybe some docs would be nice), and for the way you originally wanted to do it, since aliased() is more versatile in general, we can allow for this:

    assert 200 == s.query(Test.pay).select_entity_from(
        aliased(Test, v, adapt_on_names=True)
    ).filter(Test.pay > 10).first()[0]
    

    just by having the adapter already present in the Aliased object be used by the query:

    diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
    index 2fae362..8e9e0bf 100644
    --- a/lib/sqlalchemy/orm/query.py
    +++ b/lib/sqlalchemy/orm/query.py
    @@ -194,7 +194,7 @@ class Query(object):
                 if hasattr(info, 'mapper') and \
                         (info.is_mapper or info.is_aliased_class):
                     self._select_from_entity = info
    -                if set_base_alias:
    +                if set_base_alias and not info.is_aliased_class:
                         raise sa_exc.ArgumentError(
                             "A selectable (FromClause) instance is "
                             "expected when the base alias is being set.")
    @@ -218,6 +218,10 @@ class Query(object):
                 equivs = self.__all_equivs()
                 self._from_obj_alias = sql_util.ColumnAdapter(
                     self._from_obj[0], equivs)
    +        elif set_base_alias and \
    +                len(self._from_obj) == 1 and \
    +                info.is_aliased_class:
    +            self._from_obj_alias = info._adapter
    
         def _reset_polymorphic_adapter(self, mapper):
             for m2 in mapper._with_polymorphic_mappers:
    

    trying to raise when adapted columns are missing, that's a much broader issue. everything Core / ORM works additively and an adapter is like a filter, adding rules to ensure filters cover what the intent is is kind of a big deal and makes a lot of assumptions about the intent, which is where this gets difficult, because the range of intent is always much wider than I can guess up front (the select_entity_from() method here an example by this bug report!)

  8. Eoghan Murray reporter

    That seems to have got it! Specifically (as there are dozens of columns) I did:

    view_name = None
    if True:
       view_name = 'test_view'
    q = session.query(Test)
    if view_name:
        q = q.select_entity_from(text('SELECT * FROM %s' % (view_name)).columns(*Test.__table__.columns))
    q = q.filter(Test.a_col=='a', Test.b_col=='b').order_by(Test.c_col.like('%xyz%'))
    results = q.all()
    

    I tried select_entity_from(text('SELECT * FROM %s' % (view_name))) very early on but got the argument is not a mapped class, mapper, aliased(), or FromClause instance. error so went down the map-the-view/aliased route. Could .columns be made more discoverable? Could there be a way to apply .columns(*Test.__table__.columns) automatically?

    Thanks so much for your help!

  9. Mike Bayer repo owner

    oh automatic. at best this is columns_from(table), I think for simplicity the way it is now should be enough.

  10. Mike Bayer repo owner

    Allow aliased() to be passed to Query.select_entity_from().

    An :func:.aliased() construct can now be passed to the :meth:.Query.select_entity_from method. Entities will be pulled from the selectable represented by the :func:.aliased construct. This allows special options for :func:.aliased such as :paramref:.aliased.adapt_on_names to be used in conjunction with :meth:.Query.select_entity_from.

    Additionally rewrote the docstring for :meth:.Query.select_entity_from, including starting with explicit use of :func:.aliased as the usual idiomatic pattern. An example using text().columns() is added as well as the use case from 🎫3933 using name matching.

    Change-Id: If7e182965236993064a2a086e3b6d55a4f097ca8 Fixes: #3933

    → <<cset a4cdd0bc00a9>>

  11. Mike Bayer repo owner

    Allow aliased() to be passed to Query.select_entity_from().

    An :func:.aliased() construct can now be passed to the :meth:.Query.select_entity_from method. Entities will be pulled from the selectable represented by the :func:.aliased construct. This allows special options for :func:.aliased such as :paramref:.aliased.adapt_on_names to be used in conjunction with :meth:.Query.select_entity_from.

    Additionally rewrote the docstring for :meth:.Query.select_entity_from, including starting with explicit use of :func:.aliased as the usual idiomatic pattern. An example using text().columns() is added as well as the use case from 🎫3933 using name matching.

    Change-Id: If7e182965236993064a2a086e3b6d55a4f097ca8 Fixes: #3933 (cherry picked from commit b5577b6fb3decda0293399a630e6601e86e08726)

    → <<cset 6bdfeb73f2eb>>

  12. Log in to comment