Related objects are not reloaded when using contains_eager

Issue #4021 closed
Boris Serebrov
created an issue

I have a query like this (where Author can have many Books):

query = session.query(Author)
query = query.join(Author.books)
query = query.filter(Book._id == book1._id)
query = query.options(contains_eager(Author.books))
author = query.one()

It works as expected, I get the author object with one related book (book1).

Now, if I run the same query but filter by book2._id, I get an author object that still has book1 in author.books.

Tracing the issue through the SQLAlchemy code, I found a workaround: I can use populate_existing() and then it works. But it doesn't look like a good solution; I feel it should work without it (or that there is a better workaround - is there a way to instruct SQLAlchemy that a specific relation needs to be reloaded?).

The complete example:

from sqlalchemy import __version__
from sqlalchemy import create_engine, ForeignKey, Column, Integer, String
from sqlalchemy.orm import relationship, backref, contains_eager, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

ModelBase = declarative_base()


class Author(ModelBase):
    __tablename__ = 'authors'
    _id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(255))


class Book(ModelBase):
    __tablename__ = 'books'
    _id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey('authors._id'))
    author = relationship(
        'Author', backref=backref('books'))


if __name__ == "__main__":
    print('SQLAlchemy version', __version__)
    engine = create_engine("sqlite://")
    ModelBase.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()

    author = Author(name="author1")
    session.add(author)
    session.commit()

    book1 = Book(name="book_a1_1", author_id=author._id)
    session.add(book1)
    book2 = Book(name="book_a1_2", author_id=author._id)
    session.add(book2)
    session.commit()
    print('book1', book1._id)
    print('book2', book2._id)
    print()

    query = session.query(Author)
    query = query.join(Author.books)
    query = query.filter(Book._id == book1._id)
    query = query.options(contains_eager(Author.books))
    print('Expected [book1], got', [x._id for x in query.one().books])
    assert query.one().books[0]._id == book1._id

    query = session.query(Author)
    # it works with populate_existing()
    # query = session.query(Author).populate_existing()
    query = query.join(Author.books)
    query = query.filter(Book._id == book2._id)
    query = query.options(contains_eager(Author.books))
    print('Expected [book2], got', [x._id for x in query.one().books])
    assert query.one().books[0]._id == book2._id

And the output:

SQLAlchemy version 1.1.11
book1 1
book2 2

Expected [book1], got [1]
Expected [book2], got [1]
Traceback (most recent call last):
  File "contains_eager.py", line 58, in <module>
    assert query.one().books[0]._id == book2._id
AssertionError

The second assert fails - book1 is still in the related books collection. I initially reproduced this with version 1.0.11 and also tested on 1.1.11 - it behaves the same.

Comments (5)

  1. Michael Bayer repo owner

    SQLAlchemy has your author in the identity map, so once present, its attributes aren't overwritten until the object is expired, you use populate_existing(), or the object is garbage collected.
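
    To illustrate the identity map point, here is a minimal sketch (the model here is illustrative, not the one from the report; the try/except import is only for compatibility across SQLAlchemy versions): two queries that match the same row hand back the very same Python object, which is why attributes already loaded on it are not silently overwritten.

    ```python
    # Minimal identity-map demo (illustrative model): two separate queries
    # for the same row return the same in-memory object, not a fresh copy.
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import sessionmaker
    try:  # declarative_base moved to sqlalchemy.orm in 1.4
        from sqlalchemy.orm import declarative_base
    except ImportError:
        from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Author(Base):
        __tablename__ = 'authors'
        _id = Column(Integer, primary_key=True)
        name = Column(String(255))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    session.add(Author(_id=1, name="author1"))
    session.commit()

    # Two different queries, same row: the Session returns the object
    # already held in its identity map.
    a1 = session.query(Author).filter(Author._id == 1).one()
    a2 = session.query(Author).filter(Author.name == "author1").one()
    assert a1 is a2  # same identity-map entry, not a fresh copy
    ```

    Because the second query hands back the already-present object, any attributes loaded on it (including eagerly-loaded collections) stay as they were.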

    Given your example as-is, the most fine-grained way to expire that one field is:

        query = session.query(Author)
        # it works with populate_existing()
        # query = session.query(Author).populate_existing()
        for author in query:
            session.expire(author, ["books"])
        query = query.join(Author.books)
        query = query.filter(Book._id == book2._id)
        query = query.options(contains_eager(Author.books))
        print('Expected [book2], got', [x._id for x in query.one().books])
        assert query.one().books[0]._id == book2._id
    

    Or you could call session.expire_all(), use populate_existing(), etc. The point is the object is still in the identity map because it was not garbage collected.
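
    As a self-contained sketch of the expire_all() variant (models mirror the ones in the original report; this is one of several equivalent fixes, not the only one):

    ```python
    # Sketch: session.expire_all() before the second query makes
    # contains_eager() repopulate the collection with the new filter's rows.
    from sqlalchemy import create_engine, ForeignKey, Column, Integer, String
    from sqlalchemy.orm import (relationship, backref, contains_eager,
                                sessionmaker)
    try:  # declarative_base moved to sqlalchemy.orm in 1.4
        from sqlalchemy.orm import declarative_base
    except ImportError:
        from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Author(Base):
        __tablename__ = 'authors'
        _id = Column(Integer, primary_key=True)
        name = Column(String(255))

    class Book(Base):
        __tablename__ = 'books'
        _id = Column(Integer, primary_key=True)
        name = Column(String)
        author_id = Column(Integer, ForeignKey('authors._id'))
        author = relationship('Author', backref=backref('books'))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    author = Author(name="author1")
    session.add(author)
    session.commit()
    book1 = Book(name="book1", author_id=author._id)
    book2 = Book(name="book2", author_id=author._id)
    session.add_all([book1, book2])
    session.commit()

    def author_with_book(book_id):
        q = (session.query(Author)
             .join(Author.books)
             .filter(Book._id == book_id)
             .options(contains_eager(Author.books)))
        return q.one()

    first = author_with_book(book1._id)
    assert [b._id for b in first.books] == [book1._id]

    # Without this, the next query would still show [book1] (the issue).
    session.expire_all()

    second = author_with_book(book2._id)
    assert [b._id for b in second.books] == [book2._id]
    ```

    expire_all() is the bluntest option; session.expire(author, ["books"]) targets just the one collection, and populate_existing() overwrites attributes during the query itself.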

    This is all expected behavior; no bug is illustrated here.

  2. Boris Serebrov reporter

    Thanks, I see your point, although as a library user I would expect it to work after the second query - the query itself is an explicit instruction to get "author" object with "book2" and the result of this query is incorrect (the relation contains book1). In a real application it is much less obvious to find the bug like this, as it is hard to keep in mind the order of all the queries or to predict this situation analytically.

    So probably it means that potentially every query with contains_eager also needs expire / populate_existing or something else to reset the previous state that could have been created by other queries. Which makes me think that it could happen automatically in the library.

    I traced the execution up to this point

        to_load = _populate_partial(
            context, row, state, dict_, isnew, load_path,
            unloaded, populators)
    

    Here I see that the SQL was executed and correct data (author and book2) is fetched, row = (1, 'author1', 2, 'book2', 1). Then, in _populate_partial there is this part:

        for key, populator in populators["eager"]:
            if key not in unloaded:
                populator(state, dict_, row)
    

    Then the populator is executed for the books relation and I get here:

        def load_collection_from_joined_exec(state, dict_, row):
            _instance(row)
    

    Inside _instance(row) it looks like the book2 instance is actually updated, but the change doesn't make it back to the author object. Not sure if this makes sense, but I feel there could be some simple change to detect the contains_eager situation and update the relation.

    Anyway, please feel free to close this issue. I hope the example and your answer can still be useful as a reference if anyone runs into the same issue. And thank you for your amazing work!

  3. Michael Bayer repo owner

    > Thanks, I see your point, although as a library user I would expect it to work after the second query - the query itself is an explicit instruction to get "author" object with "book2" and the result of this query is incorrect (the relation contains book1). In a real application it is much less obvious to find the bug like this, as it is hard to keep in mind the order of all the queries or to predict this situation analytically.

    The SQLAlchemy ORM does not erase any data that's loaded in the session unless explicitly told to do so. The general rationale for this is described at http://docs.sqlalchemy.org/en/latest/faq/sessions.html#i-m-re-loading-data-with-my-session-but-it-isn-t-seeing-changes-that-i-committed-elsewhere.
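
    The FAQ's point can be sketched in a few lines (illustrative model, not the one from this issue; a raw UPDATE stands in for data "committed elsewhere"): re-querying an object whose attributes are already loaded does not refresh them until the object is expired.

    ```python
    # Sketch: the Session does not overwrite loaded attributes on re-query;
    # a raw UPDATE simulates a change made outside the ORM's unit of work.
    from sqlalchemy import create_engine, Column, Integer, String, text
    from sqlalchemy.orm import sessionmaker
    try:  # declarative_base moved to sqlalchemy.orm in 1.4
        from sqlalchemy.orm import declarative_base
    except ImportError:
        from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Author(Base):
        __tablename__ = 'authors'
        _id = Column(Integer, primary_key=True)
        name = Column(String(255))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    session.add(Author(_id=1, name="old"))
    session.commit()

    author = session.query(Author).filter(Author._id == 1).one()
    assert author.name == "old"

    # Change the row with raw SQL, bypassing the ORM's change tracking.
    session.execute(text("UPDATE authors SET name = 'new' WHERE _id = 1"))

    # Re-querying returns the same object and does NOT overwrite the
    # already-loaded attribute...
    again = session.query(Author).filter(Author._id == 1).one()
    assert again is author
    assert author.name == "old"

    # ...until the object (or attribute) is explicitly expired.
    session.expire(author)
    assert author.name == "new"
    ```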

    The use of "contains_eager" to modify what a relationship() normally refers to doesn't quite fit into that model, since the query is changing dynamically; that is one of many reasons why this particular pattern, while nominally supported, is recommended against. See the red warning note here.

    There has been a small amount of work in the area of designing a more fluent form of "populate_existing()" that can overwrite data that wasn't changed. It is theoretically possible but brings up other potential edge cases and unintuitive scenarios - it's extremely high risk to modify the basic assumptions of a system that has essentially worked exactly the same way for almost 12 years. Usually, when people try to work around these kinds of issues, they find an easier/better way to do what they wanted without overruling the default contract of the Session.
