subqueryload is not executed when a member of the same collection is loaded via joinedload

Issue #3854 resolved
Kent Rakip created an issue

When a joinedload path causes any entity in a one-to-many relationship to be eagerly loaded, any subqueryload path for the same relationship will not be emitted, even if it might return additional results.

Here is a minimal repro case:

from sqlalchemy import Column, ForeignKey, Integer, create_engine, inspect
from sqlalchemy.orm import relationship, sessionmaker, joinedload
from sqlalchemy.ext.declarative import declarative_base


Base = declarative_base()


class Thing(Base):
    __tablename__ = 'Thing'

    id = Column(Integer, primary_key=True, autoincrement=True)
    another_thing_id = Column(ForeignKey('AnotherThing.id'))
    another_thing = relationship('AnotherThing')

    yet_another_thing_id = Column(ForeignKey('YetAnotherThing.id'))
    yet_another_thing = relationship('YetAnotherThing')


class AnotherThing(Base):
    __tablename__ = 'AnotherThing'

    id = Column(Integer, primary_key=True, autoincrement=True)


class YetAnotherThing(Base):
    __tablename__ = 'YetAnotherThing'

    id = Column(Integer, primary_key=True, autoincrement=True)

    another_thing_id = Column(ForeignKey('AnotherThing.id'))
    another_thing = relationship('AnotherThing', backref='yet_another_things')
    last_thing_id = Column(ForeignKey('LastThing.id'))
    last_thing = relationship('LastThing')


class LastThing(Base):
    __tablename__ = 'LastThing'

    id = Column(Integer, primary_key=True, autoincrement=True)


engine = create_engine('sqlite:///', echo=True)
Base.metadata.create_all(engine)
session_factory = sessionmaker(bind=engine)

session = session_factory()

thing = Thing()
thing.another_thing = AnotherThing()
thing.another_thing.yet_another_things = [YetAnotherThing(), YetAnotherThing()]
thing.yet_another_thing = thing.another_thing.yet_another_things[0]
thing.another_thing.yet_another_things[0].last_thing = LastThing()
thing.another_thing.yet_another_things[1].last_thing = LastThing()

session.add(thing)
session.commit()

session2 = session_factory()

t = (
    session2.query(Thing)
        .options(
            joinedload(Thing.another_thing)
                .subqueryload(AnotherThing.yet_another_things)
                .joinedload(YetAnotherThing.last_thing),
            joinedload(Thing.yet_another_thing)
                .joinedload(YetAnotherThing.another_thing),
        )
        .get(1)
)

# Should print 'False'; actually prints 'True' on 1.1.4
print('yet_another_things' in inspect(t.another_thing).unloaded)

The query that should load all of the values for AnotherThing.yet_another_things is never emitted, and as a result last_thing is not loaded for any of the YetAnotherThings either, since that occurs later on the same path. Commenting out the second joinedload path causes the subqueryload query to be emitted correctly.

This seems to be a regression in 1.1; I'm unable to reproduce it on 1.0.16, although it is possible that the behavior is nondeterministic and I never got lucky.

Comments (6)

  1. Mike Bayer repo owner

    After much experimentation and observation of a totally unrelated change seeming to be the thing that changes this, dictionary/set ordering becomes very suspect as the difference in behavior.

    I can reproduce this in 1.0.16 if I run python with -R and reverse the order of the relationships:

    class Thing(Base):
        __tablename__ = 'Thing'
    
        id = Column(Integer, primary_key=True, autoincrement=True)
    
        another_thing_id = Column(ForeignKey('AnotherThing.id'))
        another_thing = relationship('AnotherThing')
    
        yet_another_thing_id = Column(ForeignKey('YetAnotherThing.id'))
        yet_another_thing = relationship('YetAnotherThing')
    

    vs.

    class Thing(Base):
        __tablename__ = 'Thing'
    
        id = Column(Integer, primary_key=True, autoincrement=True)
    
        yet_another_thing_id = Column(ForeignKey('YetAnotherThing.id'))
        yet_another_thing = relationship('YetAnotherThing')
    
        another_thing_id = Column(ForeignKey('AnotherThing.id'))
        another_thing = relationship('AnotherThing')
    

    the nature of this kind of issue is one where there are overlapping paths that both account for the same object. Depending on the order in which the attributes are serviced determines which loader path applies to the child object:

        # Thing -> AnotherThing -> YetAnotherThing -> LastThing
    

    vs

        # Thing -> YetAnotherThing -> AnotherThing
    

    these paths conflict and only one of them takes effect for that "AnotherThing" object. There have been several fixes recently to improve this kind of thing such as #3822, #3431, #3811. I've yet to dig into this one to see if the decisionmaking can be improved but I'm observing this is not a new behavior.

  2. Mike Bayer repo owner

    OK I think we can do this, there's logic in joinedloader that isn't present here, basic idea is like:

    diff --git a/lib/sqlalchemy/orm/loading.py b/lib/sqlalchemy/orm/loading.py
    index f749cdd..e5d6ca7 100644
    --- a/lib/sqlalchemy/orm/loading.py
    +++ b/lib/sqlalchemy/orm/loading.py
    @@ -364,7 +364,6 @@ def _instance_processor(
             is_not_primary_key = _none_set.intersection
    
         def _instance(row):
    -
             # determine the state that we'll be populating
             if refresh_identity_key:
                 # fixed state that we're refreshing
    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index ace8287..f991ae1 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -1079,7 +1079,7 @@ class SubqueryLoader(AbstractRelationshipLoader):
    
         def _create_collection_loader(
                 self, context, collections, local_cols, populators):
    -        def load_collection_from_subq(state, dict_, row):
    +        def load_collection_from_subq_newrow(state, dict_, row):
                 collection = collections.get(
                     tuple([row[col] for col in local_cols]),
                     ()
    @@ -1087,13 +1087,21 @@ class SubqueryLoader(AbstractRelationshipLoader):
                 state.get_impl(self.key).\
                     set_committed_value(state, dict_, collection)
    
    -        populators["new"].append((self.key, load_collection_from_subq))
    +        def load_collection_from_subq_existingrow(state, dict_, row):
    +            if self.key not in dict_:
    +                load_collection_from_subq_newrow(state, dict_, row)
    +
    +        populators["new"].append(
    +            (self.key, load_collection_from_subq_newrow))
    +        populators["existing"].append(
    +            (self.key, load_collection_from_subq_existingrow))
    +
             if context.invoke_all_eagers:
                 populators["eager"].append((self.key, collections.loader))
    
         def _create_scalar_loader(
                 self, context, collections, local_cols, populators):
    -        def load_scalar_from_subq(state, dict_, row):
    +        def load_scalar_from_subq_newrow(state, dict_, row):
                 collection = collections.get(
                     tuple([row[col] for col in local_cols]),
                     (None,)
    @@ -1108,7 +1116,14 @@ class SubqueryLoader(AbstractRelationshipLoader):
                 state.get_impl(self.key).\
                     set_committed_value(state, dict_, scalar)
    
    -        populators["new"].append((self.key, load_scalar_from_subq))
    +        def load_scalar_from_subq_existingrow(state, dict_, row):
    +            if self.key not in dict_:
    +                load_scalar_from_subq_newrow(state, dict_, row)
    +
    +        populators["new"].append(
    +            (self.key, load_scalar_from_subq_newrow))
    +        populators["existing"].append(
    +            (self.key, load_scalar_from_subq_existingrow))
             if context.invoke_all_eagers:
                 populators["eager"].append((self.key, collections.loader))
    

    I'll see if this doesn't do anything that might incur unwanted overhead and hopefully can put this in 1.1.x

  3. Mike Bayer repo owner

    Add "existing" populators for subqueryload

    Fixed bug in subquery loading where an object encountered as an "existing" row, e.g. already loaded from a different path in the same query, would not invoke subquery loaders for unloaded attributes that specified this loading. This issue is in the same area as that of 🎫3431, 🎫3811 which involved similar issues with joined loading.

    Change-Id: If111a76b0812010905b0ac811276a816779d297f Fixes: #3854

    → <<cset 3aefc6e8a4a0>>

  4. Log in to comment