evaluate "dont call backrefs during merge" behavior

Issue #2221 (resolved)
Mike Bayer (repo owner) created an issue

The test below will pass if the given patch is applied. Would like to see if there is a way to make this behavior available, either in all cases or somehow controllable. The check being removed is probably intended to save on backref loads; a test which illustrates that this actually happens should be devised.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    bs = relationship("B",backref="a")

class B(Base):
    __tablename__ = 'b'

    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey('a.id'))

e = create_engine('sqlite://', echo=True)
Base.metadata.create_all(e)

s = Session(e)

a = A(id=1, bs=[B(id=1), B(id=2)])
s.add(a)
s.commit()

s = Session(e)
a = A(id=1, bs=[B(id=1), B(id=2)])
a2 = s.merge(a)

# comment out lines 737-741 of orm/properties.py
# to have these pass
assert 'a' in a2.__dict__['bs'][0].__dict__
assert 'a' in a2.__dict__['bs'][1].__dict__




#!diff
diff -r 7bb2adfff90f989d45cbcb6f306d8d9d82298816 lib/sqlalchemy/orm/properties.py
--- a/lib/sqlalchemy/orm/properties.py  Tue Jul 12 19:34:25 2011 -0400
+++ b/lib/sqlalchemy/orm/properties.py  Thu Jul 14 18:32:43 2011 -0400
@@ -733,11 +733,12 @@
                     dest_state,
                     dest_dict, 
                     load, _recursive):
-        if load:
+
+#        if load:
             # TODO: no test coverage for recursive check
-            for r in self._reverse_property:
-                if (source_state, r) in _recursive:
-                    return
+#            for r in self._reverse_property:
+#                if (source_state, r) in _recursive:
+#                    return

         if not "merge" in self.cascade:
             return

Comments (13)

  1. Mike Bayer reporter

    f5f49f50c8d3668e50771f4200e6a92cd3c35af3 adds a profiling test intended to illustrate the rationale for the "don't merge the reverse" check. Removing it adds over 10% method call overhead as well as 25% more SELECT statements (doing a grep for SELECT when this test runs yields 196 vs. 245 with the check removed). The check is only enabled in the "load" case, i.e. when we're willing to do SQL, so the "load from an offline cache" use case where load=False is also not affected here. So I'm pretty -1 on changing anything here, merge() isn't a "data population" routine when load=True, it's meant to merge changes into an existing Session.
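The SELECT counts above came from grepping the echo output of the profiling test. A rough, self-contained way to reproduce that kind of measurement programmatically (a sketch using the mapping from this issue, not the actual committed test) is to count SELECT statements with a `before_cursor_execute` event listener:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    bs = relationship("B", backref="a")

class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey('a.id'))

e = create_engine('sqlite://')
Base.metadata.create_all(e)

select_count = [0]

@event.listens_for(e, "before_cursor_execute")
def count_selects(conn, cursor, statement, parameters, context, executemany):
    # tally every SELECT the engine emits
    if statement.lstrip().upper().startswith("SELECT"):
        select_count[0] += 1

s = Session(e)
s.add(A(id=1, bs=[B(id=1), B(id=2)]))
s.commit()

# reset the counter and measure only the merge
select_count[0] = 0
s = Session(e)
a2 = s.merge(A(id=1, bs=[B(id=1), B(id=2)]))
print("SELECTs during merge:", select_count[0])
```

With the reverse-property check in place, fewer SELECTs should show up here than with the check removed, which is what the 196-vs-245 grep numbers reflect.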

    Will leave it open for a bit if any more discussion is warranted!

  2. Former user Account Deleted

    Those are interesting measurements. Thanks for looking at that. It was only causing grief for me with cases where the primary join is strange, meaning that "get()" won't be used for the backref population once it is referenced. (Another layer of complexity caused me to manually deal with this relationship anyway.)

    It would have been nice as a general case utility, but given the extra overhead, my vote is also leave it alone.

    ...The possible enhancement (and the case that prompted my original appeal) is for the ''specific case when the backref of the property we are merging is the "ONE" side of a "... TO ONE" relationship.'' In that specific case, there is no need for a SELECT statement at all because we already have the object that needs to be set as the backref (in fact, we are in the process of merging that object)... so we could just set that backref with basically zero overhead.

    Thoughts about that? (I don't necessarily even need that in this particular case but it wouldn't be wrong and may be beneficial in certain cases...)

  3. Mike Bayer reporter
    • changed milestone to 0.7.4

    perhaps the change would be more along the lines of, if the relationship is in the "we've already seen it" cache, we hit it with the "passive" flags so that no SQL is emitted. haven't tried it, just a thought.
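One way to experiment with that idea from user code, without patching merge() itself, is to populate the many-to-one side by hand after the merge using `set_committed_value()`, which writes an attribute value with no SQL emitted and no history generated. This is a sketch, not the proposed change:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.attributes import set_committed_value

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    bs = relationship("B", backref="a")

class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey('a.id'))

e = create_engine('sqlite://')
Base.metadata.create_all(e)

s = Session(e)
s.add(A(id=1, bs=[B(id=1), B(id=2)]))
s.commit()

s = Session(e)
a2 = s.merge(A(id=1, bs=[B(id=1), B(id=2)]))

# set the backref directly; we already have the object in hand,
# so no SELECT is needed and no "dirty" state is created
for b in a2.bs:
    set_committed_value(b, 'a', a2)

assert all('a' in b.__dict__ for b in a2.bs)
```

This achieves the "zero overhead" population described in comment 2 for the to-one side, at the cost of doing it explicitly in application code.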

    going to push this to 0.7.4 for the moment.

  4. Former user Account Deleted

    This needn't be 'high' priority... it is rare that this would cause an issue, and I've already worked around my specific case where it was causing one. I think it would be very nice, but don't think it is high priority. (My bigger concern than performance was my case where autoflush was off, so I was losing data changes.)

  5. Former user Account Deleted

    Replying to comment from #2264:

    1. When merging a one to many relationship, since its backref is a single object and we already know what it is, is that automatically set now? (Or still concerned about circular references when serializing?)

    that sounds like #2221 to me, which is just an optimization - to copy relationships in both directions rather than waiting for a SQL load later on. As it de-optimizes other cases, it remains to be seen how that one will pan out. Depends on which case is more common as well as if the de-optimizing case can be ameliorated. ''snip...''

    I think I'd also vote against performance decreases... However, I'm still not clear how this single case (merging a one-to-many relationship with a backref) could possibly decrease performance... it seems like a clear win/win: namely, if the relationship being merged is one-to-many or one-to-one and has a backref, then we should just set that backref to the object we know it is (i.e. the instance we are currently merging).

    Does the performance hit have to do with looking up history?

  6. Mike Bayer reporter

    The optimization basically cuts off traversal into the "other side" of any relationship that's already been merged in one direction. So for all those cases where the "other side" is not a many-to-one pull from the identity map, skipping the optimization will emit SQL that otherwise would not.

    I was playing with the above patch trying to figure out ways to make it "smart" about when to traverse the "other" direction but didn't get anything definitive, also didn't yet get test cases that enumerate each of the potential behaviors - so generally the mechanics of the traversal need to be better understood here.

  7. Mike Bayer reporter

    Can you remind me why this behavior is needed at all? Took a look, and we're dealing with many-to-ones; these load from the identity map anyway, so what's the issue exactly? Below, we blow away the database after the merge. The collections and the many-to-ones are all there, and no SQL is emitted:

    s = Session(e)
    a = A(id=1, bs=[B(id=1), B(id=2)])
    a2 = s.merge(a)
    
    e.dispose()
    
    assert a2.bs[0].a is a2
    assert a2.bs[1].a is a2
    
  8. Former user Account Deleted

    Replying to zzzeek:

    Can you remind me why this behavior is needed at all? Took a look, and we're dealing with many-to-ones; these load from the identity map anyway, so what's the issue exactly? Below, we blow away the database after the merge. The collections and the many-to-ones are all there, no SQL is emitted: ...

    I've worked around my specific problem programmatically, so I technically don't need this ticket... Having said that there may be future cases where someone's (or my) project would benefit from this ticket without a workaround. In summary, this was the reason for my specific case:

    ''...95% of the time it doesn't matter because, as you point out, the object is in the identity map. The problem I ran into is one where the primary join is goofy and therefore get() could not be utilized for grabbing the backref... instead it had to refetch it from the database (and it seemed silly to me since it knew the backref at merge() time). (Plus, it was worse in my case because there are a few points where I need to transiently turn off autoflush, and this was during one of them, so it was losing data changes I think when it looked up the backref object.)''

    You added this ticket in response to a groups thread. There you explained your thoughts about opening this ticket...

  9. Mike Bayer reporter

    OK so take a look at the tests in 81ba28d4222b469a9b714fb8f8c0b2165137ee63 . I'm trying pretty hard to find a case where removing that check makes things faster. At the moment, I can find none. Removing the check from merge() makes these new tests perform either the same or worse - none improve.

    If you can figure out a test to add to that suite which illustrates less SQL being emitted, then I can start to think about this again - perhaps the reverse check can detect many-to-ones, set "passive loading" flags on so that the "set" does not emit SQL, etc., but so far I'm not seeing how that complexity and destabilization helps vs. just skipping "reverse" directions we've already done. In particular, just the "set" event on the collection side as one would expect "sets" the backref for you already, at least that's what test_pending_access_one seems to say and what my pdb's seemed to indicate. It's an intricate series of steps so my focus is perhaps only at 85% today, in the absence of an actual example of a performance increase.
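    The "set event on the collection side" behavior mentioned above can be seen with plain pending objects, no Session or SQL involved. A minimal sketch with the same A/B mapping as the original test:

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    bs = relationship("B", backref="a")

class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey('a.id'))

a = A()
b = B()

# appending to the collection fires the backref event, which
# assigns b.a immediately; no database round trip occurs
a.bs.append(b)

assert b.a is a
assert 'a' in b.__dict__
```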

    So closing this for now but we can reopen if better data becomes available.
