from_self() with contains_eager, confirm usage pattern and add documentation notes

Issue #3544 resolved
Lukas Siemon created an issue

The test case below generates the following query:

SELECT
   label_alias.id,
   label_alias.name,
   anon_1.venue_id,
   anon_1.venue_name   
FROM
   label AS label_alias,  -- this should come from the inner query
   (SELECT
      venue.id AS venue_id,
      venue.name AS venue_name   
   FROM
      venue 
   JOIN
      venue_to_label AS venue_to_label_1 
         ON venue.id = venue_to_label_1.venue_id 
   JOIN
      label AS label_alias 
         ON label_alias.id = venue_to_label_1.label_id   
   WHERE
      label_alias.name IN (
         %(
            name_1
         )s
      )) AS anon_1

However, based on the alchemy query, I'd expect the label_alias to come from the inner sql query instead of from the outer. Is there an easy way to resolve this?

To give some background: We're using this strategy generically with joinedload instead of contains_eager to serialize all our queries to json. It works really well, however now the requirement came up to only return "certain joins from the joinedload".

I've tested the issue with SQLAlchemy==1.0.8 Flask==0.10.1 Flask-SQLAlchemy==2.0

Test case:

import unittest
from sqlalchemy import (
    Table, Column, Integer, ForeignKey, String, func)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import (relationship, aliased, contains_eager)
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy


# -- create all the database models
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = ("sqlite://")
db = SQLAlchemy(app)

Base = declarative_base()

venue_to_label = Table(
    'venue_to_label', db.metadata,
    Column('venue_id', Integer, ForeignKey('venue.id'), primary_key=True),
    Column('label_id', Integer, ForeignKey('label.id'), primary_key=True)
)


class Label(db.Model):
    __tablename__ = 'label'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(254))


class Venue(db.Model):
    __tablename__ = 'venue'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(254))
    labels = relationship(Label, secondary=venue_to_label)

db.drop_all()
db.create_all()


class TestContainsEager(unittest.TestCase):

    def test_contains_eager(self):
        query = Venue.query
        # filtered join
        label_alias = aliased(Label, name="label_alias")
        query = query.join(label_alias, Venue.labels)
        query = query.filter(label_alias.name.in_(["label1"]))

        # together with a windowing function this allows to correctly apply
        # limit and order to the query, needed since labels is a *-to-many join
        query = query.from_self()

        # define to load from the inner query
        query = query.options(
            contains_eager(
                'labels', alias=label_alias
            ).load_only('id', 'name')
        )

        # contains_eager does not pick up the alias from the inner query
        import sqlalchemy.dialects.postgresql as postgresql
        print query.statement.compile(dialect=postgresql.dialect())

Comments (16)

  1. Mike Bayer repo owner

    so what's happening here is that once you do a "from_self", the subquery that's going to be wrapped is now "baked"; its structure won't change. But in this case, contains_eager() has the desire to add additional columns to select from, to the top of the SELECT, in order to populate the collection. When it does so, the inner subquery can't provide what it needs, because it doesn't supply the columns from Label needed on the outside.

    Now, this is very easy to fix, however note that I had no idea what to do with this for about ten minutes, so this is not at all intuitive until you see it, then it seems easy, but as far as users knowing to do this, I'd not expect anyone to come up with this without some serious thinking. So there is definitely a place here that we might want to add a documentation note to contains_eager(). Because if I don't even think it's possible at first, well, I wrote the thing so there you go :).

    Anyway - we just add those columns to the inner query so that they are available:

    query = query.add_columns(label_alias.id, label_alias.name)
    query = query.from_self()
    
    
    query = query.options(
        contains_eager(
            'labels', alias=label_alias
        ).load_only('id', 'name')
    )
    

    then we get the right statement:

    SELECT anon_1.label_alias_id, anon_1.label_alias_name, anon_1.venue_id, anon_1.venue_name 
    FROM (SELECT venue.id AS venue_id, venue.name AS venue_name, label_alias.id AS label_alias_id, label_alias.name AS label_alias_name 
    FROM venue JOIN venue_to_label AS venue_to_label_1 ON venue.id = venue_to_label_1.venue_id JOIN label AS label_alias ON label_alias.id = venue_to_label_1.label_id 
    WHERE label_alias.name IN (%(name_1)s)) AS anon_1
    

    if you can confirm that this actually works with data and returns the right results and all that, that would be great.

  2. Lukas Siemon reporter

    Ah, I tried something like that before but used joinedload instead of add_columns. This now works sort of (we're making progress!).

    Unfortunately now when I run "query.all()" I get a separate venue per label. So a venue with two labels would be returned as two venues. Is there a way to force the merge?

    Edit: I can provide a more elaborate test case if that helps. Please let me know.

    Edit: Just experimenting with manual de-duplication, will report here once I know more.

  3. Lukas Siemon reporter

    So something like:

    result = {r[0].id: r[0] for r in query.all()}.values()
    

    seems to do the job! Thank you for the help! Note that his doesn't preserve the order.

  4. Mike Bayer repo owner

    we can do better. Call from_self() like this:

    query = query.from_self(Venue)
    

    then you get a query that's just against Venue again, not those columns.

  5. Lukas Siemon reporter

    I'm now trying to in-cooperate this into the monolith class we're using. Might be back if I find more issues. Thank you again for your quick help!

  6. Lukas Siemon reporter

    So I gave up for now. Was running into trouble with a deferred column_property that has a bindparam and correlated_except in it. Might come back to this later.

    Here is my question: Other than using contains_eager is there a way to filter and order by "joinedload" table columns? Basically a "WHERE" and "ORDER BY" after the joinedload options have been applied. I'm willing to monkey patch.

  7. Mike Bayer repo owner

    filtering by "joinedload" is a little weird, at least if you're dealing with a related collection, because the collection would only contain what you filtered on, not everything that's in that collection. For a many-to-one relationship it makes more sense.

    anyway, the equation only works one way - either you want to eager load out of the SQL you've already written, or you want eager loading to be transparently bolted on to the query.

    the only resort after that is to load the two types of objects separately and construct the relationships manually afterward; you'd want to use set_committed_value() for this, see the example at https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading

  8. Lukas Siemon reporter

    Let me try to clarify my question with an example. The following sql statement is generated by the test case at the bottom of this post:

    SELECT
       anon_1.venue_id,
       anon_1.venue_name,
       label_1.id,
       label_1.name   
    FROM
       (SELECT
          venue.id AS venue_id,
          venue.name AS venue_name   
       FROM
          venue 
       JOIN
          venue_to_label AS venue_to_label_1 
             ON venue.id = venue_to_label_1.venue_id 
       JOIN
          label 
             ON label.id = venue_to_label_1.label_id   
       WHERE
          label.name IN (
             %(
                name_1
             )s
          )) AS anon_1 
    LEFT OUTER JOIN
       (
          venue_to_label AS venue_to_label_2 
       JOIN
          label AS label_1 
             ON label_1.id = venue_to_label_2.label_id
          ) 
             ON anon_1.venue_id = venue_to_label_2.venue_id
    

    My question now is, is there a way to append "WHERE" and "ORDER BY" to that query? Basically if I could append

    WHERE
          label_1.name IN (...)
    

    to the query, that would solve my issue.

    Here is the test case:

    import unittest
    from sqlalchemy import (
        Table, Column, Integer, ForeignKey, String)
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import (
        relationship, aliased, contains_eager, joinedload)
    from flask import Flask
    from flask.ext.sqlalchemy import SQLAlchemy
    
    
    # -- create all the database models
    app = Flask(__name__)
    app.config['SQLALCHEMY_DATABASE_URI'] = ("sqlite://")
    db = SQLAlchemy(app)
    
    Base = declarative_base()
    
    venue_to_label = Table(
        'venue_to_label', db.metadata,
        Column('venue_id', Integer, ForeignKey('venue.id'), primary_key=True),
        Column('label_id', Integer, ForeignKey('label.id'), primary_key=True)
    )
    
    
    class Label(db.Model):
        __tablename__ = 'label'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
    
    
    class Venue(db.Model):
        __tablename__ = 'venue'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(254))
        labels = relationship(Label, secondary="venue_to_label")
    
    db.drop_all()
    db.create_all()
    
    
    class TestContainsEager(unittest.TestCase):
    
        def test_joined_load(self):
            query = Venue.query
            query = query.join(Label, Venue.labels)
            query = query.filter(Label.name.in_(["label1"]))
    
            query = query.from_self()
    
            # define to load from the inner query
            query = query.options(
                joinedload('labels').load_only('id', 'name')
            )
    
            import sqlalchemy.dialects.postgresql as postgresql
            print query.statement.compile(dialect=postgresql.dialect())
    
  9. Mike Bayer repo owner

    if you truly want the two JOINs, one on the inside and another on the outside, after you do your from_self() you'd emit outerjoin() again on the Query, and then to make the new columns become part of the collection you'd use contains_eager.

  10. Lukas Siemon reporter

    Solving this will require more time than I can currently spend. I'll come back once I have implemented this or run into more trouble :) Thanks you very much for the help!

  11. Mike Bayer repo owner
    • write a real inline documentation section for from_self(), building on the example of query.join(). Ideally all the methods in Query would have this kind of thing. fixes #3544

    → <<cset 8fa62d652b8d>>

  12. Log in to comment