subqueryload query invokes ahead of parent loader init, can cause conflicts

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'

    id = Column(Integer, primary_key=True)
    name = Column(String(20))

    children = relationship('Child1',
                        back_populates='parent',
                        lazy='noload'
                    )

class Child1(Base):
    __tablename__ = 'child1'

    id = Column(Integer, primary_key=True)
    name = Column(String(20))
    parent_id = Column(Integer, ForeignKey('parent.id'))

    parent = relationship('Parent', back_populates='children', lazy='joined')

engine = create_engine('sqlite:///:memory:', echo="debug")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

s = Session()
s.add(Parent(name='parent', children=[Child1(name='c1')]))
s.commit()

parent = s.query(Parent).options(subqueryload('children')).first()
print(parent.children)

in the above case, the load is first Parent->subqueryload(children). When the first row comes in, loading looks at "children" in order to set up its loader. Because that loader is a subqueryloader, it produces the subquery and invokes it right away; that query is Child1->joinedload->Parent->noload(children). The query joined-loads onto Parent and populates Parent.children (via noload, i.e. as an empty collection). Parent now goes into the identity map and counts as an "existing" load - it's basically done being loaded. Then we go back out to the original load of Parent, which sees that Parent is already in the identity map and treats it as an "existing" load - so the subqueryloader for Parent.children, even though it created a row processor, never gets to use it.
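The ordering problem can be shown with a deliberately simplified toy model. This is not SQLAlchemy's loading code: `identity_map`, `run_child_subquery`, `load_parent` and `demo` are hypothetical names invented for the sketch, and the "existing load skips the populators" rule is reduced to a single `if`.

```python
# Toy model only; every name here is hypothetical, not SQLAlchemy internals.
identity_map = {}


class Parent(object):
    def __init__(self, id):
        self.id = id
        self.children = None  # not populated yet


def run_child_subquery():
    # Stands in for Child1 -> joinedload -> Parent -> noload(children).
    # If the Parent is not in the identity map yet, the joined load creates
    # it and noload gives it an empty collection.
    if 1 not in identity_map:
        parent = Parent(1)
        parent.children = []  # effect of noload
        identity_map[1] = parent
    # Child rows grouped by parent id, as the subquery loader would use them.
    return {1: ["c1"]}


def load_parent(eager_subquery):
    if eager_subquery:
        # Behavior described above: the subquery runs while the row
        # processor is being created.
        collections = run_child_subquery()

        def populate(parent):
            parent.children = collections.get(parent.id, [])
    else:
        # Deferred variant: the subquery runs only when populating.
        def populate(parent):
            parent.children = run_child_subquery().get(parent.id, [])

    existing = identity_map.get(1)
    if existing is not None:
        # "Existing" load: the populator is never called.
        return existing

    parent = Parent(1)
    identity_map[1] = parent
    populate(parent)
    return parent


def demo(eager_subquery):
    identity_map.clear()
    return load_parent(eager_subquery).children
```

In this model `demo(True)` returns the empty collection left behind by noload, while `demo(False)` returns the child loaded by the subquery.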

at some point we added caching to the subqueryload row_processor, so that inheriting mappers don't keep re-invoking the subquery - this is #2480. But looking at that, it is not actually the issue - the subq loader was already being invoked ahead of time before that change.

the solution would appear to be, crudely, that the row processor should not invoke the subq query at all until it's time to populate:

diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
index 8226a0e..f053b0c 100644
--- a/lib/sqlalchemy/orm/strategies.py
+++ b/lib/sqlalchemy/orm/strategies.py
@@ -937,12 +937,19 @@ class SubqueryLoader(AbstractRelationshipLoader):
         # call upon create_row_processor again
         collections = path.get(context.attributes, "collections")
         if collections is None:
-            collections = dict(
-                    (k, [vv[0] for vv in v])
-                    for k, v in itertools.groupby(
-                        subq,
-                        lambda x: x[1:]
-                    ))
+            _data = dict()
+            class X(object):
+                def get(self, key, default):
+                    if not _data:
+                        _data.update(
+                            (k, [vv[0] for vv in v])
+                            for k, v in itertools.groupby(
+                                subq,
+                                lambda x: x[1:]
+                            )
+                        )
+                    return _data.get(key, default)
+            collections = X()
             path.set(context.attributes, 'collections', collections)

         if adapter:

however, this breaks a bunch of tests. Adding subq = list(subq) to the above then fixes those tests, which shows that current behavior somehow relies upon the subq query being invoked early. At the moment I can't imagine why that would be needed.
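The difference between the deferred grouping in the diff and the `subq = list(subq)` variant can be sketched outside SQLAlchemy. This is only an illustration: `LazyCollections` is a hypothetical stand-in for the `X` class in the diff, and `fake_subquery` is a plain generator assumed to play the role of the subquery, logging the moment it actually starts running.

```python
import itertools


class LazyCollections(object):
    """Hypothetical stand-in for X: groups the rows on the first get()."""

    def __init__(self, rows):
        self._rows = rows
        self._data = None

    def get(self, key, default=None):
        if self._data is None:
            # Same grouping as the diff: key is everything after the first
            # column, value is the list of first-column entries.
            self._data = dict(
                (k, [row[0] for row in group])
                for k, group in itertools.groupby(
                    self._rows, lambda row: row[1:])
            )
        return self._data.get(key, default)


def fake_subquery(log):
    # Generator standing in for the subquery; nothing runs until iterated.
    log.append("subquery executed")
    for row in [("c1", 1), ("c2", 1), ("c3", 2)]:
        yield row


# Deferred, as in the diff: the rows are consumed on the first get().
lazy_log = []
lazy = LazyCollections(fake_subquery(lazy_log))
lazy_log.append("row processor created")
lazy_children = lazy.get((1,), [])

# With list(subq): the rows are consumed before the wrapper is even built.
eager_log = []
eager = LazyCollections(list(fake_subquery(eager_log)))
eager_log.append("row processor created")
eager_children = eager.get((1,), [])
```

Both variants return the same grouped rows; only the point at which the rows are fetched moves, which is the ordering the failing tests appear to depend on.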
