narrow scope around "get committed" attributes

Issue #2350 resolved
Mike Bayer repo owner created an issue

another flag which frames this just around "history" operations, as in the attached patch, maintains normal ORM behavior within flush extensions. Pick a better name than "magic".

diff -r 6ed0d7e3339dd14daad612f9ac6dd2f5e90ef517 lib/sqlalchemy/orm/session.py
--- a/lib/sqlalchemy/orm/session.py Thu Dec 15 11:42:50 2011 -0500
+++ b/lib/sqlalchemy/orm/session.py Thu Dec 15 19:14:04 2011 -0500
@@ -525,6 +525,7 @@
         self.bind = bind
         self.__binds = {}
         self._flushing = False
+        self._magic = False
         self.transaction = None
         self.hash_key = _new_sessionid()
         self.autoflush = autoflush
diff -r 6ed0d7e3339dd14daad612f9ac6dd2f5e90ef517 lib/sqlalchemy/orm/strategies.py
--- a/lib/sqlalchemy/orm/strategies.py  Thu Dec 15 11:42:50 2011 -0500
+++ b/lib/sqlalchemy/orm/strategies.py  Thu Dec 15 19:14:04 2011 -0500
@@ -410,7 +410,7 @@
         # for this state.

         sess = sessionlib._state_session(state)
-        if sess is not None and sess._flushing:
+        if sess is not None and sess._magic:
             def visit_bindparam(bindparam):
                 if bindparam._identifying_key in bind_to_col:
                     bindparam.callable = \
@@ -488,7 +488,7 @@
         # if we have a simple primary key load, check the 
         # identity map without generating a Query at all
         if self.use_get:
-            if session._flushing:
+            if session._magic:
                 get_attr = instance_mapper._get_committed_state_attr_by_column
             else:
                 get_attr = instance_mapper._get_state_attr_by_column
diff -r 6ed0d7e3339dd14daad612f9ac6dd2f5e90ef517 lib/sqlalchemy/orm/unitofwork.py
--- a/lib/sqlalchemy/orm/unitofwork.py  Thu Dec 15 11:42:50 2011 -0500
+++ b/lib/sqlalchemy/orm/unitofwork.py  Thu Dec 15 19:14:04 2011 -0500
@@ -159,31 +159,34 @@
         # prevents newly loaded objects from being dereferenced during the
         # flush process

-        if hashkey in self.attributes:
-            history, state_history, cached_passive = self.attributes[hashkey](hashkey)
-            # if the cached lookup was "passive" and now 
-            # we want non-passive, do a non-passive lookup and re-cache
-            if cached_passive is not attributes.PASSIVE_OFF \
-                and passive is attributes.PASSIVE_OFF:
+        self.session._magic = True
+        try:
+            if hashkey in self.attributes:
+                history, state_history, cached_passive = self.attributes[hashkey](hashkey)
+                # if the cached lookup was "passive" and now 
+                # we want non-passive, do a non-passive lookup and re-cache
+                if cached_passive is not attributes.PASSIVE_OFF \
+                    and passive is attributes.PASSIVE_OFF:
+                    impl = state.manager[key](key).impl
+                    history = impl.get_history(state, state.dict, 
+                                        attributes.PASSIVE_OFF)
+                    if history and impl.uses_objects:
+                        state_history = history.as_state()
+                    else:
+                        state_history = history
+                    self.attributes[hashkey](hashkey) = (history, state_history, passive)
+            else:
                 impl = state.manager[key](key).impl
-                history = impl.get_history(state, state.dict, 
-                                    attributes.PASSIVE_OFF)
+                # TODO: store the history as (state, object) tuples
+                # so we don't have to keep converting here
+                history = impl.get_history(state, state.dict, passive)
                 if history and impl.uses_objects:
                     state_history = history.as_state()
                 else:
                     state_history = history
                 self.attributes[hashkey](hashkey) = (history, state_history, passive)
-        else:
-            impl = state.manager[key](key).impl
-            # TODO: store the history as (state, object) tuples
-            # so we don't have to keep converting here
-            history = impl.get_history(state, state.dict, passive)
-            if history and impl.uses_objects:
-                state_history = history.as_state()
-            else:
-                state_history = history
-            self.attributes[hashkey](hashkey) = (history, state_history, passive)
-
+        finally:
+            self.session._magic = False
         return state_history

     def has_dep(self, processor):

another way to go would be to elaborate upon the "PASSIVE" flag system. There's six of those now and I wonder if more of a bitmask approach would make it more flexible and easy to understand.

Comments (19)

  1. Former user Account Deleted

    How about "_get_committed" instead of "_magic"?

    Are we ''always'' flushing from get_attribute_history() or should the line you added:

    self.session._magic = True
    

    instead be:

    self.session._magic = self.session._flushing
    

    The passive_updates=False example makes it clear why the uow needs the history, thanks. For my understanding, do we need the persistent relationship value for anything ''besides'' history?

    My concern now that you've shown me the one to many example and a patch how to fix my problem is this: Can one make an argument that one really ''is'' interested in the committed relationship because they know that passive_updates is True and the database isn't going to update the foreign keys? What about many to many?

    Take this familiar example:

    u1 = User(username='jack', fullname='jack')
    u1.addresses = [Address(email='jack@gmail.com')](Address(email='jack@yahoo.com'),)
    sess.add(u1)
    sess.commit()
    
    u1.username = 'ed'
    

    Here is the philosophical question:[BR] From within before_update(), what do we now expect when we reference instance.addresses?

    Do we expect `[Address(email='jack@gmail.com')](Address(email='jack@yahoo.com'),)` or `[ Does the answer change for many to many?  Does the answer change depending upon 'addresses' `passive_updates` value?  If `passive_updates=True` and we know the database isn't going to change the fk, won't we expect [](]`?) after the flush?
    

    I think in the end, before_update() needs to act the same as sqlalchemy does when outside a flush, but I wanted to present these thoughts to you to digest before we break anything. In the end, I feel like, if the user wants the committed value, he should be using attributes.get_history instead of relying on how this currently works, agreed?

    I also feel that above all else, when we reference the relationship from within flush we should get the same answer as we will after the flush, agreed? (That's why I'm wondering about many to many, and passive_updates=True, etc.)

  2. Mike Bayer reporter

    Replying to guest:

    How about "_get_committed" instead of "_magic"?

    yeah something like that, not critical

    Are we ''always'' flushing from get_attribute_history() or should the line you added:

    yes that is a "history" function in unitofwork.py that wraps/caches around the regular attributes.get_history(). The entirety of unitofwork.py is only used within flush().

    The passive_updates=False example makes it clear why the uow needs the history, thanks. For my understanding, do we need the persistent relationship value for anything ''besides'' history?

    well that's what I wanted to test here and I think its really limited to dependency.py looking at relationships to figure out what should be flushed, and it looks at history for that. all the tests pass.

    My concern now that you've shown me the one to many example and a patch how to fix my problem is this: Can one make an argument that one really ''is'' interested in the committed relationship because they know that passive_updates is True and the database isn't going to update the foreign keys? What about many to many?

    we have some tests for many-to-many with primary key values changing....they pass. its all about demonstrating cases where the patch above is not enough. we could have every dependency.py operation frame itself in a flag like this if the scope needs to be expanded.

    Take this familiar example: {{{ u1 = User(username='jack', fullname='jack') u1.addresses = Address(email='jack@gmail.com'),) sess.add(u1) sess.commit()

    u1.username = 'ed' }}}

    Here is the philosophical question:[BR] From within before_update(), what do we now expect when we reference instance.addresses?

    you'd see those two Address objects. consider if you didn't flush the session at all and just looked at instance.addresses.

    unless you mean after u1.username = 'ed', which would let's say detach those two Address objects, and then another flush. Well yeah, what do we expect, within the flush itself - it's ambiguous. i think an extension should just see what it would see outside of the flush to as much a degree as possible. it can't be expected that foreign key/primary key manipulations would be reflected in collections within the flush. the collection above wouldn't change until expired anyway.

    Do we expect `[Address(email='jack@gmail.com')](Address(email='jack@yahoo.com'),)` or `[ Does the answer change for many to many?  Does the answer change depending upon 'addresses' `passive_updates` value?  If `passive_updates=True` and we know the database isn't going to change the fk, won't we expect [](]`?) after the flush?
    

    within the flush,none of these collections change at all. the flush just looks at them and what was there before. i dont know at the moment that we need to second guess this much.

    I think in the end, before_update() needs to act the same as sqlalchemy does when outside a flush, but I wanted to present these thoughts to you to digest before we break anything. In the end, I feel like, if the user wants the committed value, he should be using attributes.get_history instead of relying on how this currently works, agreed?

    i think if the user wanted the committed value the best thing would be to pass an option in like attributes.get_history(.., ..., GET_COMMITTED). All those PASSIVE_ flags would become like some kind of bitflag thing - (GET_COMMITTED | NO_FETCH | ...). Having six of those PASSIVE_ flags is disturbing to me.

    I also feel that above all else, when we reference the relationship from within flush we should get the same answer as we will after the flush, agreed?

    Well I think you would, because the flush doesn't change any data except for foreign key attributes. collections and object references are not modified in any way.

  3. Former user Account Deleted

    i think if the user wanted the committed value the best thing would be to pass an option in like attributes.get_history(.., ..., GET_COMMITTED). All those PASSIVE_ flags would become like some kind of bitflag thing - (GET_COMMITTED | NO_FETCH | ...). Having six of those PASSIVE_ flags is disturbing to me.

    Agree completely here. I've never tried really hard to understand them and possibly for the very reason that there were a bunch of them and I couldn't keep them straight. On the other hand, I suspect less than 1% of users know they exist in the first place, so certainly shouldn't be a high priority...

  4. Former user Account Deleted

    I'm not sure I've got the tests where you'd want them (or how, since I've got a lazy import, etc), but 2 of the 3 test cases I added fail until the rest of the patch is applied.

    Curious, this ticket is marked 0.8.0, but is there any reason not to include that sooner?

    If you don't have time at the moment to control these changes or don't intend to include until 0.8.0, is there any chance you could "approve" this and #2351 so I can patch them as my own sqlalchemy version? I'm hoping to get back to my project as soon as possible, but I don't want to patch in something that you don't end up adding or it will break later when I upgrade. (Ideally, I'd wait for you to commit the changes, but I don't want to pressure you to do that if you don't have time, so just a verbal "yes, they'll make it in there" would be very helpful.)

  5. Mike Bayer reporter

    Replying to kentbower:

    I'm not sure I've got the tests where you'd want them (or how, since I've got a lazy import, etc), but 2 of the 3 test cases I added fail until the rest of the patch is applied.

    the test here looks very nice

    Curious, this ticket is marked 0.8.0, but is there any reason not to include that sooner?

    my concern is if any existing extensions are relying upon the current 'get committed' behavior. would mean this patch is backwards incompatible. If there's some way I could not have to worry about that (i.e. nobody's doing it, wouldn't break because "?") that would help.

    If you don't have time at the moment to control these changes or don't intend to include until 0.8.0, is there any chance you could "approve" this and #2351 so I can patch them as my own sqlalchemy version? I'm hoping to get back to my project as soon as possible, but I don't want to patch in something that you don't end up adding or it will break later when I upgrade.

    The implementation for #2350 here will probably change such that we're doing the "flags" approach within the UOW method, i.e. relying on #2358. But the overall behavioral contract of "maintain lazyloader behvior within extensions" is most likely good to go. Especially with #2358 this becomes a nicely consistent system. #2358 is backwards compatible in 0.7 but is a bit destabilizing nonetheless, since we're up to 0.7.4 already it's looking more appropriate for an 0.8.

  6. Former user Account Deleted

    Replying to zzzeek:

    Curious, this ticket is marked 0.8.0, but is there any reason not to include that sooner?

    my concern is if any existing extensions are relying upon the current 'get committed' behavior. would mean this patch is backwards incompatible. If there's some way I could not have to worry about that (i.e. nobody's doing it, wouldn't break because "?") that would help.

    That's fine. I'll be careful about upgrading before 0.8.0.

  7. Mike Bayer reporter

    those are great tests, the cover both if you don't set the flag and if you don't un-set the flag. very nice. The latest patch uses a context manager since we'll be on 2.5.

  8. Former user Account Deleted

    Replying to zzzeek:

    those are great tests, the cover both if you don't set the flag and if you don't un-set the flag. very nice. The latest patch uses a context manager since we'll be on 2.5.

    [BR]

    Thanks, I typically feel "if it's worth doing at all, it's worth doing well."

    The last part of the patch (in the tests) would be even better like this:

            try:
                sess.flush()
            except AvoidReferencialError:
                pass
            else:
                raise Exception("Apparently event never invoked before_update()")
    

    That makes sure the event was properly listened to.

  9. Mike Bayer reporter

    let's get rid of the use of _flushing as well as _get_committed and just send a new bitflag LOAD_AGAINST_COMMITTED when we call upon history in get_attribute_history().

  10. Mike Bayer reporter

    much better. the same tests should still be fine, and now anybody can load history using "committed" attributes.

  11. Former user Account Deleted

    Replying to zzzeek:

    much better. the same tests should still be fine, and now anybody can load history using "committed" attributes.

    Very good, thanks for all your work.

  12. Log in to comment