- edited description
evaulate lazy loader's "reset" of a given collection; usually not necessary, plus can step on an eager loader
Under certain circumstances the above error can occur. It's a bit hard to explain, but I will try.
In our case:
- An entity joins to itself in a parent/child relation
- An instance has its parent relation set to itself
- A collection in the parent entity is eagerly loaded as part of a query
- The properties of the entity are ordered such that the collection is loaded after the parent
This triggers a sequence of events in which:
- The child is loaded
- The parent is loaded as part of loading properties, the parent is set as the same object as the child because they are the same entity (same id)
- The parent initializes the collection for eager loading, this updates different things in a few places
- The child continues setting up its properties, setting the same collection up for lazy loading, but leaving an indirect weak reference to the collection in context.attributes that the eager loader set up
- The collection set up in the parent to collect the eagerly loaded entities is garbage collected because it only has a weak reference to it at this point
- When a member of the collection is loaded, there is a crash because the underlying collection was garbage collected
The stack trace is as follows:
Traceback (most recent call last):
File "/home/dobes/sqlalchemy/test/orm/test_temp.py", line 64, in test_bug
session.query(A).join(A.b).join(b_parent, b_parent.b_id == B.parent_id).join(b_parent.z).filter(BC.value>0).options(joinedload('b').joinedload('parent').joinedload('z')).all()
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/query.py", line 2293, in all
return list(self)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 72, in instances
rows = [process[0](row, None) for row in fetch]
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 452, in _instance
populate_state(state, dict_, row, isnew, only_load_props)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 305, in populate_state
populator(state, dict_, row)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/strategies.py", line 1419, in load_scalar_from_joined_existing_row
existing = _instance(row, None)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 481, in _instance
populate_state(state, dict_, row, isnew, attrs)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 309, in populate_state
populator(state, dict_, row)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/strategies.py", line 1419, in load_scalar_from_joined_existing_row
existing = _instance(row, None)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 481, in _instance
populate_state(state, dict_, row, isnew, attrs)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 309, in populate_state
populator(state, dict_, row)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/strategies.py", line 1401, in load_collection_from_joined_existing_row
_instance(row, result_list)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/loading.py", line 500, in _instance
result.append(instance)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/util/_collections.py", line 753, in append
self._data_appender(item)
File "/home/dobes/sqlalchemy/test/../lib/sqlalchemy/orm/collections.py", line 655, in append_without_event
self._data()._sa_appender(item, _sa_initiator=False)
AttributeError: 'NoneType' object has no attribute '_sa_appender'
I've attached a test case that reproduces the issue. Note that because the order of properties is significant, I did a bit of a hack to sort the properties.
Workaround:
What seems to be working for me is to specify eager loading for both collections - the one on the parent and the one on the child. For example to modify the query options in the attached test case like:
res = (session.query(A)
.join(A.b)
.join(b_parent, b_parent.b_id == B.parent_id)
.join(b_parent.z).filter(BC.value>0)
.options(joinedload('b').joinedload('z'))
.options(joinedload('b').joinedload('parent').joinedload('z')).all()
)
will not crash, it seems.
Comments (14)
-
reporter -
reporter - attached test_temp.py
Cleaned up the file a bit, removed unnecessary class C.
-
reporter - edited description
-
reporter - edited description
-
repo owner oh. what's this?
b.parent_id = b.b_id
yeah, that is going to create problems. we have no test coverage for this case and there's probably a lot of issues with it.
-
reporter - edited description
-
repo owner here's the minimal test case
from sqlalchemy import Column, Integer, ForeignKey, create_engine from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import relation, joinedload, backref, \ Session, configure_mappers Base = declarative_base() class A(Base): __tablename__ = 'A' a_id = Column(Integer, primary_key=True) b_id = Column(Integer, ForeignKey('B.b_id')) b = relation('B') class B(Base): __tablename__ = 'B' b_id = Column(Integer, primary_key=True) parent_id = Column(Integer, ForeignKey('B.b_id')) z = relation('BC') parent = relation("B", remote_side='B.b_id') class BC(Base): __tablename__ = 'BC' bc_id = Column(Integer, primary_key=True) b_id = Column(Integer, ForeignKey('B.b_id')) bm = B.__mapper__ bmp = bm._props configure_mappers() # Bug is order-dependent, must sort the "z" property to the end bmp.sort() e = create_engine('sqlite://', echo=True) Base.metadata.create_all(e) session = Session(e) b = B() session.add(b) session.flush() b.parent_id = b.b_id b.z.append(BC()) b.z.append(BC()) session.commit() # If the bug is here, the next line throws an exception session.query(B).options(joinedload('parent').joinedload('z')).all()
-
reporter @zzzeek
Yes, indeed - the bug is triggered by a self-referential object being loaded, so the same object is being eagerly loaded twice, and with different eager loading options on its own properties.
Our use case is that every node in a certain directed graph has a reference to the root node - including the root node itself. We want to eagerly load some information off the root node regardless of where we are in the tree. Thus, I thought I could just set the "root" reference of the root node to itself and eagerly load that information through that reference, even though the node we're eagerly loading through might in fact be the root node. Often it is not.
The workaround is probably OK for us, but this issue may be useful documentation for that workaround for those who happen to get the same error and go searching for answers.
Or possibly the bug can be fixed, I'm not sure what would be the right fix, though, it's a bit of a complex piece of logic there.
-
repo owner yeah so to jot down what this is for me, we load B, then go to load B.parent, load B.parent, then go to load B.parent.zs, then put a BC in there, then go to load B.zs and oh, that's not in eager loading so lets zap B.zs, which breaks B.parent.zs, there's the conflict.
-
repo owner Here are thoughts on how this might be addressed.
diff --git a/lib/sqlalchemy/orm/collections.py b/lib/sqlalchemy/orm/collections.py index 698677a..46367b0 100644 --- a/lib/sqlalchemy/orm/collections.py +++ b/lib/sqlalchemy/orm/collections.py @@ -575,6 +575,13 @@ class CollectionAdapter(object): self._key = attr.key self._data = weakref.ref(data) self.owner_state = owner_state + + # this would be an easy way to link a collection to the + # operation in which it is loading. but this wont + # be meaningful within a populate_existing, so this may + # be useless. + # self.runid = owner_state.runid + self.link_to_self(data) def _warn_invalidated(self): diff --git a/lib/sqlalchemy/orm/state.py b/lib/sqlalchemy/orm/state.py index a9024b4..1d24297 100644 --- a/lib/sqlalchemy/orm/state.py +++ b/lib/sqlalchemy/orm/state.py @@ -326,12 +326,24 @@ class InstanceState(interfaces._InspectionAttr): self.manager.get_impl(key).initialize(self, self.dict) - def _reset(self, dict_, key): + def _reset(self, dict_, key): #, check_runid=False): """Remove the given attribute and any callables associated with it.""" old = dict_.pop(key, None) if old is not None and self.manager[key].impl.collection: + # it is highly possible that this codepath only normally + # occurs in a populate_existing scenario. If the collection + # is present here otherwise, then this object were already + # loaded and loading.py is only going to hit that which is + # "unloaded". + + # here's one way to just see if the collection there + # is the one we're loading right now. + #if check_runid and old._sa_adapter.runid == self.runid: + # dict_[key] = old + # return + self.manager[key].impl._invalidate_collection(old) self.callables.pop(key, None) @@ -356,6 +368,9 @@ class InstanceState(interfaces._InspectionAttr): def _set_callable(state, dict_, row): old = dict_.pop(key, None) if old is not None: + # is this a populate_existing only path as well? + # this occurs for the non-class-level lazy + # loader. impl._invalidate_collection(old) state.callables[key] = fn else: diff --git a/lib/sqlalchemy/orm/strategies.py b/lib/sqlalchemy/orm/strategies.py index 1e8020d..29b760a 100644 --- a/lib/sqlalchemy/orm/strategies.py +++ b/lib/sqlalchemy/orm/strategies.py @@ -620,6 +620,7 @@ class LazyLoader(AbstractRelationshipLoader): self, context, path, loadopt, mapper, row, adapter): key = self.key + if not self.is_class_level: # we are not the primary manager for this attribute # on this class - set up a @@ -635,6 +636,15 @@ class LazyLoader(AbstractRelationshipLoader): return set_lazy_callable, None, None else: + # this might be the best approach, as populate_existing + # / always_refresh + # are really super rare, just don't bother here if we aren't + # in that kind of operation. + populate_existing = context.populate_existing or \ + mapper.always_refresh + if not populate_existing: + return None, None, None + def reset_for_lazy_callable(state, dict_, row): # we are the primary manager for this attribute on # this class - reset its @@ -644,7 +654,10 @@ class LazyLoader(AbstractRelationshipLoader): # this is needed in # populate_existing() types of scenarios to reset # any existing state. - state._reset(dict_, key) + + # this is probably a very wasteful call when populate + # existing is not set!! + state._reset(dict_, key) # , check_runid=True) return reset_for_lazy_callable, None, None
-
repo owner - changed title to evaulate lazy loader's "reset" of a given collection; usually not necessary, plus can step on an eager loader
- changed milestone to 1.0
- marked as major
0.9 is way too late for a deep change like this so this has to be a 1.0 thing.
-
repo owner - changed status to resolved
- Made a small adjustment to the mechanics of lazy loading,
such that it has less chance of interfering with a joinload() in the
very rare circumstance that an object points to itself; in this
scenario, the object refers to itself while loading its attributes
which can cause a mixup between loaders. The use case of
"object points to itself" is not fully supported, but the fix also
removes some overhead so for now is part of testing.
fixes
#3145
→ <<cset fa5522547150>>
-
It seems to me that a variation of this case still rises an error . In my case I have the following Architecture:
A <--------(child-parent)- A <---(A.bs/B.as)--------> B -----(B.cs)-----> C |______________________(A.bs)_______________________^
So the query is likes this:
AWithChildren=aliased(A, name='second') query = session.query( B ).filter_by( id=B_id_variable ).join( A ).outerjoin( AWithChildren, A.self_referential_realtionship, ).join( AWithChildren.bs, aliased=True ).options( contains_eager(B.as). contains_eager(A.self_referential_relationship). joinedload(A.bs). joinedload(B.cs) )
My initial goal was to eliminate excessive queries on
C
when I manipulate query results and - given someb
- try to accessb.As.self_referential_relationship.bs.cs
In this way,B
references itself throughA
and I anyway get'NoneType' object has no attribute '_sa_appender'
-
repo owner Can you please open a new issue with a complete test case? thanks.
- Log in to comment