- edited description
Related objects are not reloaded when using contains_eager
I have a query like this (where Author
can have many Book
s):
query = session.query(Author)
query = query.join(Author.books)
query = query.filter(Book._id == book1._id)
query = query.options(contains_eager(Author.books))
author = query.one()
It works as expected, I get the author
object with one related book (book1
).
Now, if I run same query, but filter it by book2._id
, I get the author
object which still has book1
in the author.books
.
Tracing the issue through the SQLAlchemy code, I found a workaround - I can use populate_existing()
and then it works. But it doesn't look like a good solution, I feel like it should work without it (or that there is a better workaround - is there a way to instruct SQLAlchemy that specific relation need to be reloaded?).
The complete example:
from sqlalchemy import __version__
from sqlalchemy import create_engine, ForeignKey, Column, Integer, String
from sqlalchemy.orm import relationship, backref, contains_eager, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
ModelBase = declarative_base()
class Author(ModelBase):
__tablename__ = 'authors'
_id = Column(Integer, primary_key=True, nullable=False)
name = Column(String(255))
class Book(ModelBase):
__tablename__ = 'books'
_id = Column(Integer, primary_key=True)
name = Column(String)
author_id = Column(Integer, ForeignKey('authors._id'))
author = relationship(
'Author', backref=backref('books'))
if __name__ == "__main__":
print('SQLAlchemy version', __version__)
engine = create_engine("sqlite://")
ModelBase.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
author = Author(name="author1")
session.add(author)
session.commit()
book1 = Book(name="book_a1_1", author_id=author._id)
session.add(book1)
book2 = Book(name="book_a1_2", author_id=author._id)
session.add(book2)
session.commit()
print('book1', book1._id)
print('book2', book2._id)
print()
query = session.query(Author)
query = query.join(Author.books)
query = query.filter(Book._id == book1._id)
query = query.options(contains_eager(Author.books))
print('Expected [book1], got', [x._id for x in query.one().books])
assert query.one().books[0]._id == book1._id
query = session.query(Author)
# it works with populate_existing()
# query = session.query(Author).populate_existing()
query = query.join(Author.books)
query = query.filter(Book._id == book2._id)
query = query.options(contains_eager(Author.books))
print('Expected [book2], got', [x._id for x in query.one().books])
assert query.one().books[0]._id == book2._id
And the output:
SQLAlchemy version 1.1.11
book1 1
book2 2
Expected [book1], got [1]
Expected [book2], got [1]
Traceback (most recent call last):
File "contains_eager.py", line 58, in <module>
assert query.one().books[0]._id == book2._id
AssertionError
The second assert fails - there is still book1
in the related books
property.
I reproduced this initially with version 1.0.11
, tested on 1.1.11
- it works the same.
Comments (5)
-
reporter -
repo owner SQLAlchemy has your author in the identity map , so once present, doesn't overwrite attributes until the object is expired, you use populate_existing(), or the object disappears.
give your example as is, the most fine-grained way to expire that one field is:
query = session.query(Author) # it works with populate_existing() # query = session.query(Author).populate_existing() for author in query: session.expire(author, ["books"]) query = query.join(Author.books) query = query.filter(Book._id == book2._id) query = query.options(contains_eager(Author.books)) print('Expected [book2], got', [x._id for x in query.one().books]) assert query.one().books[0]._id == book2._id
or you could say session.expire_all(), or use populate_existing(), etc., the point is the object is in there because it was not garbage collected.
this is all expected behavior no bug illustrated here
-
reporter Thanks, I see your point, although as a library user I would expect it to work after the second query - the query itself is an explicit instruction to get "author" object with "book2" and the result of this query is incorrect (the relation contains book1). In a real application it is much less obvious to find the bug like this, as it is hard to keep in mind the order of all the queries or to predict this situation analytically.
So probably it means that potentially every query with
contains_eager
also needsexpire
/populate_existing
or something else to reset the previous state that could have been created by other queries. Which makes me think that it could happen automatically in the library.I traced the execution up to this point
to_load = _populate_partial( context, row, state, dict_, isnew, load_path, unloaded, populators)
Here I see that the SQL was executed and correct data (author and book2) is fetched,
row = (1, 'author1', 2, 'book2', 1)
. Then, in _populate_partial there is this part:for key, populator in populators["eager"]: if key not in unloaded: populator(state, dict_, row)
The the
populator
is executed forbooks
relation and then I get here:def load_collection_from_joined_exec(state, dict_, row): _instance(row)
Inside the
_instance(row)
it actually looks like thebook2
instance is updated, but it doesn't get back toauthor
object. Not sure, maybe this doesn't make sense, but I feel like there could be some simple change to detect the situation withcontains_eager
and update the relation.Anyway, please feel free to close this issue, I hope the example and your answer can be still useful as a reference if anyone hits into the same issue. And thank you for your amazing work!
-
repo owner Thanks, I see your point, although as a library user I would expect it to work after the second query - the query itself is an explicit instruction to get "author" object with "book2" and the result of this query is incorrect (the relation contains book1). In a real application it is much less obvious to find the bug like this, as it is hard to keep in mind the order of all the queries or to predict this situation analytically.
The SQLAlchemy ORM does not erase any data that's loaded in the session unless explicitly told to do so. The general rationale for this is described at http://docs.sqlalchemy.org/en/latest/faq/sessions.html#i-m-re-loading-data-with-my-session-but-it-isn-t-seeing-changes-that-i-committed-elsewhere.
The use of "contains_eager" to modify what a relationship() normally refers to doesn't quite fit into that model since the query is changing dynamically, but that is one of many reasons why this particular pattern, while nominally supported, is recommended against. See the red warning note here.
There has been a small amount of work in the area of designing a more fluent form of "populate_existing()" that can overwrite data that wasn't changed. It is theoretically possible but brings up other potential edge cases and unintuitive scenarios - it's extremely high risk to modify the basic assumptions of a system that has essentially worked exactly the same way for almost 12 years. Usually when people are trying to work around these kinds of issues they usually find an easier/better way to do what they wanted without overruling the default contract of the Session.
-
repo owner - changed status to closed
- Log in to comment