evaulate lazy loader's "reset" of a given collection; usually not necessary, plus can step on an eager loader

Issue #3145 resolved
Dobes Vandermeer created an issue

Under certain circumstances the above error can occur. It's a bit hard to explain, but I will try.

In our case:

  1. An entity joins to itself in a parent/child relation
  2. An instance has its parent relation set to itself
  3. A collection in the parent entity is eagerly loaded as part of a query
  4. The properties of the entity are ordered such that the collection is loaded after the parent

This triggers a sequence of events in which:

  1. The child is loaded
  2. The parent is loaded as part of loading properties, the parent is set as the same object as the child because they are the same entity (same id)
  3. The parent initializes the collection for eager loading, this updates different things in a few places
  4. The child continues setting up its properties, setting the same collection up for lazy loading, but leaving an indirect weak reference to the collection in context.attributes that the eager loader set up
  5. The collection set up in the parent to collect the eagerly loaded entities is garbage collected because it only has a weak reference to it at this point
  6. When a member of the collection is loaded, there is a crash because the underlying collection was garbage collected

The stack trace is as follows:

Traceback (most recent call last):
  File "/home/dobes/sqlalchemy/test/orm/test_temp.py", line 64, in test_bug
    session.query(A).join(A.b).join(b_parent, b_parent.b_id == B.parent_id).join(b_parent.z).filter(BC.value>0).options(joinedload('b').joinedload('parent').joinedload('z')).all()
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/query.py", line 2293, in all
    return list(self)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 72, in instances
    rows = [process[0](row, None) for row in fetch]
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 452, in _instance
    populate_state(state, dict_, row, isnew, only_load_props)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 305, in populate_state
    populator(state, dict_, row)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/strategies.py", line 1419, in load_scalar_from_joined_existing_row
    existing = _instance(row, None)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 481, in _instance
    populate_state(state, dict_, row, isnew, attrs)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 309, in populate_state
    populator(state, dict_, row)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/strategies.py", line 1419, in load_scalar_from_joined_existing_row
    existing = _instance(row, None)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 481, in _instance
    populate_state(state, dict_, row, isnew, attrs)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 309, in populate_state
    populator(state, dict_, row)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/strategies.py", line 1401, in load_collection_from_joined_existing_row
    _instance(row, result_list)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 500, in _instance
    result.append(instance)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/util/_collections.py", line 753, in append
    self._data_appender(item)
  File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/collections.py", line 655, in append_without_event
    self._data()._sa_appender(item, _sa_initiator=False)
AttributeError: 'NoneType' object has no attribute '_sa_appender'

I've attached a test case that reproduces the issue. Note that because the order of properties is significant, I did a bit of a hack to sort the properties.

Workaround:

What seems to be working for me is to specify eager loading for both collections - the one on the parent and the one on the child. For example to modify the query options in the attached test case like:

        res = (session.query(A)
               .join(A.b)
               .join(b_parent, b_parent.b_id == B.parent_id)
               .join(b_parent.z).filter(BC.value>0)
               .options(joinedload('b').joinedload('z'))
               .options(joinedload('b').joinedload('parent').joinedload('z')).all()
        )

will not crash, it seems.

Comments (14)

  1. Mike Bayer repo owner

    oh. what's this?

    b.parent_id = b.b_id
    

    yeah, that is going to create problems. we have no test coverage for this case and there's probably a lot of issues with it.

  2. Mike Bayer repo owner

    here's the minimal test case

    from sqlalchemy import Column, Integer, ForeignKey, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relation, joinedload, backref, \
        Session, configure_mappers
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'A'
    
        a_id = Column(Integer, primary_key=True)
        b_id = Column(Integer, ForeignKey('B.b_id'))
        b = relation('B')
    
    
    class B(Base):
        __tablename__ = 'B'
    
        b_id = Column(Integer, primary_key=True)
        parent_id = Column(Integer, ForeignKey('B.b_id'))
        z = relation('BC')
    
        parent = relation("B", remote_side='B.b_id')
    
    
    class BC(Base):
        __tablename__ = 'BC'
        bc_id = Column(Integer, primary_key=True)
        b_id = Column(Integer, ForeignKey('B.b_id'))
    
    bm = B.__mapper__
    bmp = bm._props
    configure_mappers()
    # Bug is order-dependent, must sort the "z" property to the end
    bmp.sort()
    
    e = create_engine('sqlite://', echo=True)
    
    Base.metadata.create_all(e)
    session = Session(e)
    b = B()
    session.add(b)
    session.flush()
    
    b.parent_id = b.b_id
    
    b.z.append(BC())
    b.z.append(BC())
    session.commit()
    
    # If the bug is here, the next line throws an exception
    session.query(B).options(joinedload('parent').joinedload('z')).all()
    
  3. Dobes Vandermeer reporter

    @zzzeek

    Yes, indeed - the bug is triggered by a self-referential object being loaded, so the same object is being eagerly loaded twice, and with different eager loading options on its own properties.

    Our use case is that every node in a certain directed graph has a reference to the root node - including the root node itself. We want to eagerly load some information off the root node regardless of where we are in the tree. Thus, I thought I could just set the "root" reference of the root node to itself and eagerly load that information through that reference, even though the node we're eagerly loading through might in fact be the root node. Often it is not.

    The workaround is probably OK for us, but this issue may be useful documentation for that workaround for those who happen to get the same error and go searching for answers.

    Or possibly the bug can be fixed, I'm not sure what would be the right fix, though, it's a bit of a complex piece of logic there.

  4. Mike Bayer repo owner

    yeah so to jot down what this is for me, we load B, then go to load B.parent, load B.parent, then go to load B.parent.zs, then put a BC in there, then go to load B.zs and oh, that's not in eager loading so lets zap B.zs, which breaks B.parent.zs, there's the conflict.

  5. Mike Bayer repo owner

    Here are thoughts on how this might be addressed.

    diff --git a/lib/sqlalchemy/orm/collections.py b/lib/sqlalchemy/orm/collections.py
    index 698677a..46367b0 100644
    --- a/lib/sqlalchemy/orm/collections.py
    +++ b/lib/sqlalchemy/orm/collections.py
    @@ -575,6 +575,13 @@ class CollectionAdapter(object):
             self._key = attr.key
             self._data = weakref.ref(data)
             self.owner_state = owner_state
    +
    +        # this would be an easy way to link a collection to the
    +        # operation in which it is loading.  but this wont
    +        # be meaningful within a populate_existing, so this may
    +        # be useless.
    +        # self.runid = owner_state.runid
    +
             self.link_to_self(data)
    
         def _warn_invalidated(self):
    diff --git a/lib/sqlalchemy/orm/state.py b/lib/sqlalchemy/orm/state.py
    index a9024b4..1d24297 100644
    --- a/lib/sqlalchemy/orm/state.py
    +++ b/lib/sqlalchemy/orm/state.py
    @@ -326,12 +326,24 @@ class InstanceState(interfaces._InspectionAttr):
    
             self.manager.get_impl(key).initialize(self, self.dict)
    
    -    def _reset(self, dict_, key):
    +    def _reset(self, dict_, key): #, check_runid=False):
             """Remove the given attribute and any
                callables associated with it."""
    
             old = dict_.pop(key, None)
             if old is not None and self.manager[key].impl.collection:
    +            # it is highly possible that this codepath only normally
    +            # occurs in a populate_existing scenario.  If the collection
    +            # is present here otherwise, then this object were already
    +            # loaded and loading.py is only going to hit that which is
    +            # "unloaded".
    +
    +            # here's one way to just see if the collection there
    +            # is the one we're loading right now.
    +            #if check_runid and old._sa_adapter.runid == self.runid:
    +            #    dict_[key] = old
    +            #    return
    +
                 self.manager[key].impl._invalidate_collection(old)
             self.callables.pop(key, None)
    
    @@ -356,6 +368,9 @@ class InstanceState(interfaces._InspectionAttr):
                 def _set_callable(state, dict_, row):
                     old = dict_.pop(key, None)
                     if old is not None:
    +                    # is this a populate_existing only path as well?
    +                    # this occurs for the non-class-level lazy
    +                    # loader.
                         impl._invalidate_collection(old)
                     state.callables[key] = fn
             else:
    diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py
    index 1e8020d..29b760a 100644
    --- a/lib/sqlalchemy/orm/strategies.py
    +++ b/lib/sqlalchemy/orm/strategies.py
    @@ -620,6 +620,7 @@ class LazyLoader(AbstractRelationshipLoader):
                 self, context, path, loadopt,
                 mapper, row, adapter):
             key = self.key
    +
             if not self.is_class_level:
                 # we are not the primary manager for this attribute
                 # on this class - set up a
    @@ -635,6 +636,15 @@ class LazyLoader(AbstractRelationshipLoader):
    
                 return set_lazy_callable, None, None
             else:
    +            # this might be the best approach, as populate_existing
    +            # / always_refresh
    +            # are really super rare, just don't bother here if we aren't
    +            # in that kind of operation.
    +            populate_existing = context.populate_existing or \
    +                mapper.always_refresh
    +            if not populate_existing:
    +                return None, None, None
    +
                 def reset_for_lazy_callable(state, dict_, row):
                     # we are the primary manager for this attribute on
                     # this class - reset its
    @@ -644,7 +654,10 @@ class LazyLoader(AbstractRelationshipLoader):
                     # this is needed in
                     # populate_existing() types of scenarios to reset
                     # any existing state.
    -                state._reset(dict_, key)
    +
    +                # this is probably a very wasteful call when populate
    +                # existing is not set!!
    +                state._reset(dict_, key)  # , check_runid=True)
    
                 return reset_for_lazy_callable, None, None
    
  6. Mike Bayer repo owner
    • Made a small adjustment to the mechanics of lazy loading, such that it has less chance of interfering with a joinload() in the very rare circumstance that an object points to itself; in this scenario, the object refers to itself while loading its attributes which can cause a mixup between loaders. The use case of "object points to itself" is not fully supported, but the fix also removes some overhead so for now is part of testing. fixes #3145

    → <<cset fa5522547150>>

  7. Юрий Пайков

    It seems to me that a variation of this case still rises an error . In my case I have the following Architecture:

              A <--------(child-parent)- A <---(A.bs/B.as)--------> B -----(B.cs)-----> C
               |______________________(A.bs)_______________________^
    

    So the query is likes this:

     AWithChildren=aliased(A, name='second')
     query = session.query(
                B
            ).filter_by(
                id=B_id_variable
            ).join(
                A
            ).outerjoin(
                AWithChildren,
                A.self_referential_realtionship,
            ).join(
                AWithChildren.bs, aliased=True
            ).options(
                contains_eager(B.as).
                contains_eager(A.self_referential_relationship).
                joinedload(A.bs).
                joinedload(B.cs)
            )
    

    My initial goal was to eliminate excessive queries on C when I manipulate query results and - given some b - try to access b.As.self_referential_relationship.bs.cs In this way, B references itself through A and I anyway get 'NoneType' object has no attribute '_sa_appender'

  8. Log in to comment