reentrant attribute access wtihin validators/AttributeExtension affects history accounting

Issue #1526 resolved
Former user created an issue

If we have a self-referential, many-to-many relation (e.g. network of nodes), when we define a validator on one side of the symmetrical relationship (namely children), append works or doesn't work correctly depending on whether the data is commited or not. Works fine if there's no validator or changes are commited at the very end of operation.

An obvious workaround is to change the append to append to the other side of the relationship (ie. parent.children.append instead of children.parents.append), so I've set the ticket priority to low. However, something is definitely wrong with the code and this can be a manifestation of some nastier bug, ergo the ticket.

Self-contained test case is attached. My setup: SQLAlchemy 0.5.5, Python 2.5 on win32

Comments (10)

  1. Mike Bayer repo owner

    backrefs don't fire off for unloaded collections, and it would be seriously inefficient if they did for many-to-many relations. every append() of an item would potentially result in a full load of a new collection of arbitrary length !

    I think this is more of a documentation issue - validators only apply to in-python operations, not "de facto" collection modifications that occur due to database operations resulting from a related collection in the other direction.

    the solution would be to validate on each side where you expect an append() operation in python to occur.

    the only thing bugging me here is that yes, if both sides are loaded, then both event handlers fire off. so this may be a behavioral change I'd want to set up for 0.6.0 - that @validator doesn't fire off for a backref operation (or, better yet, lets just add a flag to @validator("foo", enable_backrefs=False))

  2. Former user Account Deleted

    Now I'm confused. You say that backrefs don't fire off for unloaded collections. However, a validator defined for 'children' will fire (and work correctly) when doing a node.parents.append(). The test case demonstrates this (eg. trying to add a duplicate node will raise an exception).

    What doesn't work is that append() silently fails, and only when intermidiate commits() are made. In pseudocode:

    This works:

    create node1
    create node2 
    create node3
    node2.parents.append(node1)
    node3.parents.append(node2)
    commit
    #node1.children would now be [node3](node2,)
    

    But this doesn't:

    create node1
    commit
    create node2
    commit
    create node3
    commit
    node2.parents.append(node1)
    node3.parents.append(node2)
    commit
    #node1.children would now be [node3](node3)
    #node2 silently failed to be appended!
    

    But this works again:

    create node1
    commit
    create node2
    commit
    create node3
    commit
    #Accessing node1.children magically makes code work again
    assert node1.children == []
    node2.parents.append(node1)
    node3.parents.append(node2)
    commit
    
  3. Mike Bayer repo owner

    sorry, I read "doesn't work", saw that you were expecting it to work primarily from the reverse end, and assumed it meant your validator wasn't working as expected. But in fact you're hitting "self.children" which is loading the collection unconditionally. That is actually the source of the issue here, changing the state of the collection within the enclosing operation that is trying to do the same thing is resetting the "dirty" flag on the item and the collection isn't being flushed.

    While the use case pictured in this test case is a lot easier to achieve by just using a set(), it is a bug though I won't know if it's resolvable until I more closely pinpoint the mechanics of this condition.

  4. Former user Account Deleted

    The actual use case is a bit more complicated. The validator is used to prevent circulairty in the graph and therefore must traverse parents/children in a non-trivial way.

    However, as there is a workaround (in this test case, using {{{node1.children.append(node2)}}} instead of {{{node2.parents.append(node1)}}}), the issue is not too critical. I just wanted to point out the bug, to make sure its not a manifestation of some deeper problem.

  5. Mike Bayer repo owner

    Here's a patch that addresses at least trapping where the bookkeeping goes wrong:

    Index: lib/sqlalchemy/orm/attributes.py
    ===================================================================
    --- lib/sqlalchemy/orm/attributes.py    (revision 6320)
    +++ lib/sqlalchemy/orm/attributes.py    (working copy)
    @@ -700,6 +700,7 @@
             collection = self.get_collection(state, dict_, passive=passive)
             if collection is PASSIVE_NO_RESULT:
                 value = self.fire_append_event(state, dict_, value, initiator)
    +            assert self.key not in dict_, "Collection was loaded during event handling."
                 state.get_pending(self.key).append(value)
             else:
                 collection.append_with_event(value, initiator)
    @@ -711,6 +712,7 @@
             collection = self.get_collection(state, state.dict, passive=passive)
             if collection is PASSIVE_NO_RESULT:
                 self.fire_remove_event(state, dict_, value, initiator)
    +            assert self.key not in dict_, "Collection was loaded during event handling."
                 state.get_pending(self.key).remove(value)
             else:
                 collection.remove_with_event(value, initiator)
    

    I think this + the aforementioned enable_backrefs=False flag to avoid having the error be raised is at least the way to go for now. there are ways the bookkeeping can try to "adjust" inside the block there but the sequence of events gets a little more intricate.

  6. Former user Account Deleted

    I confirm that the changes throw an assert where append() used to silently fail.

    I think that it's better for validators not to fire for the backref side of the relation, since it's redundant. Then the programmer would explicitly need to use {{{enable_backrefs=True}}} if he really knows what he's doing (and the funcionality has been fully implemented, which at the moment it isn't for self-referential relations).

    Since this is a lower-priority issue, maybe the following course of action would be the best:

    1. Disble firing validators on the backref side of a relation.
    2. Explicitly note this in the section on validators in the documentation.
    3. One day, when a clear use case is found and the funcionality has been implemented correctly, add enable_backrefs flag, False by default. Setting it to True would cause the backref validator to be fired also.

    Attached: a test case that fails when backref validators are fired and passes when only the correct validator is fired.

  7. Log in to comment