propagate populate_existing to subqueryloader; add note to docs for populate_existing regarding autoflush

Issue #2497 resolved
Former user created an issue

In this script, populate_existing() is used to demonstrate that sometimes the order_by will be honored (and sometimes not) after collections that already exist in the session are requeried. May be a problem only with polymorphic inheritance?

populate_existing() may be unrelated since there is still an issue even without populate_existing(), but it manifests as ''neither'' collection being reordered (when the objects already existed in the session).

''I'm not even 100% sure you'd expect this to behave as I expect it to here, but since populate_existing() generally does work this out, I assumed it is intended to work always.''

Comments (15)

  1. Mike Bayer repo owner

    there's sort of a bug related here but not the one you think. Also the docs might be considered a bug as they gave you the wrong idea.

    Populate existing was designed under the assumption that you'd like to discard the values of existing attributes, including pending changes. As such, it disables autoflush. The feature is used within the get() function and assumes this behavior as well, so if I change that, as always, the test that fails is one for the load_on_pending feature that was made just for you and you're the only person using ! :) (I'm not really sure why this is, I'd have to spend an hour tracing through it as always).

    You'll note that if you change your options in any way for that last query, like run it with no options, or with joinedload, the whole thing fails, because Role(C) is being removed from its parent and its autoflush fails due to FK null.

    So my first suggestion would be A. dont use populate_existing(), this method was added in like, version 0.1 or 0.2 eons before we knew how to expire things, second is B. just flush() beforehand.

    The fix needed here is to propagate populate existing within the subqueryloader:

    diff -r cd676cd8dfbddf1440849f96fa0cdabd4fd72497 lib/sqlalchemy/orm/strategies.py
    --- a/lib/sqlalchemy/orm/strategies.py  Fri Jun 01 16:31:00 2012 -0400
    +++ b/lib/sqlalchemy/orm/strategies.py  Wed Jun 06 11:41:10 2012 -0400
    @@ -897,6 +897,8 @@
             # these will fire relative to subq_path.
             q = q._with_current_path(subq_path)
             q = q._conditional_options(*orig_query._with_options)
    +        if orig_query._populate_existing:
    +            q._populate_existing = orig_query._populate_existing
             return q
    
         def _setup_outermost_orderby(self, q):
    

    with that change, let's see what we get, noting a few extra print statements in your test:

        print "Updated Manager roles:\n" + "\n".join(r.role for r in e.roles)
        print "Updated Engineer roles:\n" + "\n".join(r.role for r in e.staff[0](0).roles)
        print "session new:", session.new
        print "session dirty:", session.dirty
    

    and we get:

    Updated Manager roles:
    role A
    role F
    Updated Engineer roles:
    role B
    role K
    session new: IdentitySet([object at 0x1014ec310>, <__main__.Role object at 0x1014ec450>](<__main__.Role))
    session dirty: IdentitySet([])
    

    perfecto ! then a flush:

    sqlalchemy.exc.IntegrityError: (IntegrityError) roles.employee_id may not be NULL u'INSERT INTO roles (role) VALUES (?)' ('role C',)
    

    excellente ! so flush first, but also try not to need this feature, it's another one I'd love to zap in 0.8.

  2. Former user Account Deleted

    It still seems to me there is a problem because if you take my original script, add a session.flush() and then completely remove the populate_existing() line, the assertion of order_by still fails. Shouldn't it reorder collections? (it only works if I expunge_all() first).

    My problem isn't actually related to the flush() because in my real app, I am issuing a flush() (I just overlooked adding it to the script.) (In fact, my subclassed query's version of populate_existing() always issues a flush. My use case for populate_existing() is that I need to retain the existing objects because they have extra information added to them that isn't in the database which needs to be serialized back the client. Another one of my use cases for populate_existing() we discussed many moons ago is a secondary query().with_lockmode('update'). In other words, after some business logic, you realize you need to refresh an object's data and lock for update at the same time. populate_existing() fits well there. Is there another way around using populate_existing() for these use cases?)

    P.S. How am I doing so far in my quest to make sure absolutely every feature of sqlalchemy works in tandem? (Just kidding, but I imagine you think that's my quest.)

  3. Mike Bayer repo owner

    Replying to kentbower:

    It still seems to me there is a problem because if you take my original script, add a session.flush() and then completely remove the populate_existing() line, the assertion of order_by still fails. Shouldn't it reorder collections? (it only works if I expunge_all() first).

    not at all - flush() just emits SQL to the database. Doesn't expire anything. would be a performance nightmare if it did.

    My problem isn't actually related to the flush() because in my real app, I am issuing a flush() (I just overlooked adding it to the script.) (In fact, my subclassed query's version of populate_existing() always issues a flush. My use case for populate_existing() is that I need to retain the existing objects because they have extra information added to them that isn't in the database which needs to be serialized back the client. Another one of my use cases for populate_existing() we discussed many moons ago is a secondary query().with_lockmode('update'). In other words, after some business logic, you realize you need to refresh an object's data and lock for update at the same time. populate_existing() fits well there. Is there another way around using populate_existing() for these use cases?)

    use orderinglist, it maintains the order of things in memory. since that's what you're looking for here anyway.

  4. Former user Account Deleted

    Replying to zzzeek:

    Replying to kentbower:

    It still seems to me there is a problem because if you take my original script, add a session.flush() and then completely remove the populate_existing() line, the assertion of order_by still fails. Shouldn't it reorder collections? (it only works if I expunge_all() first).

    not at all - flush() just emits SQL to the database. Doesn't expire anything. would be a performance nightmare if it did.

    I didn't mean flush(), I meant the query that follows:

        print "Manager roles:\n" + "\n".join(r.role for r in e.roles)
        print "Engineer roles:\n" + "\n".join(r.role for r in e.staff[0](0).roles)
    
        session.flush()
    
        e = session.query(Employee)\
            .options(
                    subqueryload(Employee.staff),
                    subqueryload(Employee.roles),
                    subqueryload(Employee.staff,Employee.roles)
                ).filter_by(employee_id=1).one()
    
        print "Updated Manager roles:\n" + "\n".join(r.role for r in e.roles)
        print "Updated Engineer roles:\n" + "\n".join(r.role for r in e.staff[0](0).roles)
    

    Wouldn't you expect that query (with populate_existing() removed) to order the collections correctly, or would that be only if you expired the collection?

  5. Mike Bayer repo owner

    Replying to kentbower:

    {{{ #!python print "Manager roles:\n" + "\n".join(r.role for r in e.roles) print "Engineer roles:\n" + "\n".join(r.role for r in e.staff0.roles)

    session.flush()
    
    e = session.query(Employee)\
        .options(
                subqueryload(Employee.staff),
                subqueryload(Employee.roles),
                subqueryload(Employee.staff,Employee.roles)
            ).filter_by(employee_id=1).one()
    
    print "Updated Manager roles:\n" + "\n".join(r.role for r in e.roles)
    print "Updated Engineer roles:\n" + "\n".join(r.role for r in e.staff[0](0).roles)
    

    }}}

    Wouldn't you expect that query (with populate_existing() removed) to order the collections correctly,

    not at all, the session is always pushing changes in memory back out to the database, and assumes the transaction is isolated. So there is no need to take the vastly expensive and complex step of rebuilding all collections. Especially if autoflush is off, and the collection contains pending data that wasn't flushed, how would that even work ?

    as I said, orderinglist is provided for this use case.

  6. Former user Account Deleted

    Replying to zzzeek:

    Especially if autoflush is off, and the collection contains pending data that wasn't flushed, how would that even work ?

    Yeah, at one point roughly that thought went through my head. Sounds good. Thanks for the help with this. (And thanks for pointing to orderinglist, which may come in useful at some point.)

    Where in the test source would you like a test case, and is the script provided a good basis for a test case, even though it only checks a ''symptom'' of populate_existing not being propagated?

    (I'm not sure how to explicitly check that populate_existing is being propagated to subquery loader.)

  7. Former user Account Deleted

    There is likely a more direct test that populate_existing is propagated, but this test catches it indirectly while testing a bunch of other functionality

  8. Mike Bayer repo owner

    great..just some thoughts - does the test case really need to use inheritance and multiple relationships ? I would think we could test this with a simple Parent->Child scenario.

  9. Former user Account Deleted

    Replying to zzzeek:

    great..just some thoughts - does the test case really need to use inheritance and multiple relationships ? I would think we could test this with a simple Parent->Child scenario.

    Depends on your philosophy: * unit tests should be strictly centered around only one single aspect (at the expense of needing many many more tests for the same coverage) or * this type of coverage is to be encouraged because it may cover a case of polymorphic subquery loading that would be hit by no other currently written test case

    If it were my project, I'd lean towards the later cuz I'd want as rich and as broad base of tests as possible, even though it would make it a little more cumbersome if I ever wanted to completely rip out polymorphic inheritance. But if you lean towards the former, let me know if you would like me to refactor that as a simple parent child case. (When I first wrote the script, I wasn't sure where the problem was yet.)

  10. Mike Bayer repo owner

    Well looking through our tests you can see that for a long time we went with approach B, but now that Ive been writing tests in Python for many years I put a lot more effort into A. Tests that are in this form are ultimately much more useful. When a code change causes tests to fail, finely-focused tests are informative to a specific degree what's failing. A heavy multi-function test OTOH is more difficult to debug when something makes it fail.

    Ultimately the tests are attempting to define SQLAlchemy's contract of usage and building it up from a large number of specific points is a lot more informative than from coarse-grained usage examples. Of course we have a lot of those too, though.

  11. Log in to comment