subqueryload query invokes ahead of parent loader init, can cause conflicts

Issue #2887 resolved
Mike Bayer repo owner created an issue
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'

    id = Column(Integer, primary_key=True)
    name = Column(String(20))

    children = relationship('Child1',
                        back_populates='parent',
                        lazy='noload'
                    )

class Child1(Base):
    __tablename__ = 'child1'

    id = Column(Integer, primary_key=True)
    name = Column(String(20))
    parent_id = Column(Integer, ForeignKey('parent.id'))

    parent = relationship('Parent', back_populates='children', lazy='joined')

engine = create_engine('sqlite:///:memory:', echo="debug")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

s = Session()
s.add(Parent(name='parent', children=[Child1(name='c1')](Child1(name='c1'))))
s.commit()

parent = s.query(Parent).options([subqueryload('children')](subqueryload('children'))).first()
print parent.children

in the above case, the load is first Parent->subquery(children). When the row hits, loading looks at "children" in order to invoke its loader - when it's a subqueryloader, it produces the query and invokes it, which is Child1->joinedload->Parent->noload(children). The query joinloads onto Parent, populates Parent.children, Parent now goes into the identity map and is an "Existing" load - it's basically done being loaded. Then we go back out to the original load of Parent which sees that Parent is already in the identity map, and is an "existingload" - so the subqueryloader for Parent.children, even though it created a row processor, never gets to use it.

at some point we added caching to the subqueryload row_processor, so that inheriting mappers don't keep re-invoking the subquery - this is #2480, but looking at that, this is not actually the issue - the subq loader is still being invoked ahead of time before that.

the solution would appear, crudely that the row processor needs to not invoke the subq query at all until it's time to populate:

diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
index 8226a0e..f053b0c 100644
--- a/lib/sqlalchemy/orm/strategies.py
+++ b/lib/sqlalchemy/orm/strategies.py
@@ -937,12 +937,19 @@ class SubqueryLoader(AbstractRelationshipLoader):
         # call upon create_row_processor again
         collections = path.get(context.attributes, "collections")
         if collections is None:
-            collections = dict(
-                    (k, [vv[0](vv[0) for vv in v])
-                    for k, v in itertools.groupby(
-                        subq,
-                        lambda x: x[1:](1:)
-                    ))
+            _data = dict()
+            class X(object):
+                def get(self, key, default):
+                    if not _data:
+                        _data.update(
+                            (k, [vv[0](vv[0) for vv in v])
+                            for k, v in itertools.groupby(
+                                subq,
+                                lambda x: x[1:](1:)
+                            )
+                        )
+                    return _data.get(key, default)
+            collections = X()
             path.set(context.attributes, 'collections', collections)

         if adapter:

however, this breaks a bunch of tests. adding subq = list(subq) to the above then fixes those tests, so this proves that current behavior is somehow relying upon the subq query being invoked early, at the moment I can't imagine how that is needed.

Comments (5)

  1. Mike Bayer reporter

    at least some of the tests that fail are because they rely upon subqueryload to emit its query and load further related objects, even when the lead object otherwise is not subject to any row processing at all:

            u1 = sess.query(User).filter_by(id=7).options(subqueryload("orders")).one()
    
            # id=7 is already loaded, no row processing for this User will take effect.
            # but we still want orders.items to get populated on the existing orders.
            sess.query(User).filter_by(id=7).options(subqueryload_all("orders.items")).first()
            assert 'items' in u1.orders[0](0).__dict__
    
  2. Mike Bayer reporter

    we can address that issue by adding a "make sure we run the subq loader" function like this:

    @@ -962,7 +969,13 @@ class SubqueryLoader(AbstractRelationshipLoader):
                 state.get_impl(self.key).\
                         set_committed_value(state, dict_, collection)
    
    -        return load_collection_from_subq, None, None
    +        def _make_sure_its_loaded(state, dict_, row):
    +                collections.get(
    +                tuple([row[col](row[col) for col in local_cols]),
    +                ()
    +            )
    +
    +        return load_collection_from_subq, None, None, _make_sure_its_loaded
    
         def _create_scalar_loader(self, collections, local_cols):
             def load_scalar_from_subq(state, dict_, row):
    @@ -980,7 +993,13 @@ class SubqueryLoader(AbstractRelationshipLoader):
                 state.get_impl(self.key).\
                         set_committed_value(state, dict_, scalar)
    
    -        return load_scalar_from_subq, None, None
    +        def _make_sure_its_loaded(state, dict_, row):
    +            collections.get(
    +                tuple([row[col](row[col) for col in local_cols]),
    +                ()
    +            )
    +
    +        return load_scalar_from_subq, None, None, _make_sure_its_loaded
    

    additional issues remain with some self-referential loading

  3. Mike Bayer reporter

    whew, no problem, that was just because we were testing _dict being empty, which it might be anyway, here's an even more crude "with a flag" thing, obviously need to clean this up:

    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index 8226a0e..2f20040 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -937,12 +937,22 @@ class SubqueryLoader(AbstractRelationshipLoader):
             # call upon create_row_processor again
             collections = path.get(context.attributes, "collections")
             if collections is None:
    -            collections = dict(
    -                    (k, [vv[0](vv[0) for vv in v])
    -                    for k, v in itertools.groupby(
    -                        subq,
    -                        lambda x: x[1:](1:)
    -                    ))
    +            flag = [0](0)
    +            _data = dict()
    +            #subq = list(subq)
    +            class X(object):
    +                def get(self, key, default):
    +                    if not flag[0](0):
    +                        flag[0](0) = 1
    +                        _data.update(
    +                            (k, [vv[0](vv[0) for vv in v])
    +                            for k, v in itertools.groupby(
    +                                subq,
    +                                lambda x: x[1:](1:)
    +                            )
    +                        )
    +                    return _data.get(key, default)
    +            collections = X()
                 path.set(context.attributes, 'collections', collections)
    
             if adapter:
    
  4. Log in to comment