`AttributeEvents.remove` has inconsistent timing

Issue #4134 new
Jack Zhou
created an issue

AttributeEvents.set and AttributeEvents.append both fire before the underlying collection has been changed, but the behavior for AttributeEvents.remove is not consistent, both between different methods of a single collection, as well as between the same methods of different collection types.

Demo:

class Department(Base):
    __tablename__ = "departments"
    id = Column(Integer, primary_key=True)
    employees = relationship(lambda: Employee,
                             cascade="all, delete-orphan")
    executives = relationship(lambda: Executive,
                              collection_class=attribute_mapped_collection("type"),
                              cascade="all, delete-orphan")
    contractors = relationship(lambda: Contractor, collection_class=set,
                               cascade="all, delete-orphan")


class Executive(Base):
    __tablename__ = "executives"
    id = Column(Integer, primary_key=True)
    type = Column(String, nullable=False)
    department_id = Column(Integer, ForeignKey(Department.id), nullable=False)


class Employee(Base):
    __tablename__ = "employees"
    id = Column(Integer, primary_key=True)

    department_id = Column(Integer, ForeignKey(Department.id), nullable=False)


class Contractor(Base):
    __tablename__ = "contractors"
    id = Column(Integer, primary_key=True)

    department_id = Column(Integer, ForeignKey(Department.id), nullable=False)


def _set_up_event(attr, name):
    @event.listens_for(attr, name, raw=True)
    def _check_if_modified(target, value, initiator):
        print(attr, name, target.attrs[initiator.key].history.has_changes())


session = Session()
emp1 = Employee()
emp2 = Employee()
exec1 = Executive(type="finance")
exec2 = Executive(type="operations")
con1 = Contractor()
con2 = Contractor()
dept = Department(employees=[emp1, emp2],
                  executives={"finance": exec1, "operations": exec2},
                  contractors={con1, con2})
session.add(dept)
session.commit()

for attr in [Department.employees, Department.contractors,
             Department.executives]:
    for name in ["remove", "append"]:
        _set_up_event(attr, name)

# list
print("remove()")
dept.employees.remove(emp1)
session.rollback()

print("pop()")
dept.employees.pop()
session.rollback()

print("clear()")
dept.employees.clear()
session.rollback()

emp3 = Employee()
print("[0] = ...")
dept.employees[0] = emp3
session.rollback()

print("del [0]")
del dept.employees[0]
session.rollback()

print("del [0:2]")
del dept.employees[0:2]
session.rollback()

print("[0:2] = ...")
dept.employees[0:2] = [emp3]
session.rollback()

# set
print("discard()")
dept.contractors.discard(con1)
session.rollback()

print("remove()")
dept.contractors.remove(con1)
session.rollback()

print("pop()")
dept.contractors.pop()
session.rollback()

print("clear()")
dept.contractors.clear()
session.rollback()

print("-=")
dept.contractors -= {con1, con2}
session.rollback()

con3 = Contractor()
print("&=")
dept.contractors &= {con3}
session.rollback()

print("^=")
dept.contractors ^= {con1, con2}
session.rollback()

# dict
print("pop()")
dept.executives.pop("finance")
session.rollback()

print("popitem()")
dept.executives.popitem()
session.rollback()

print("clear()")
dept.executives.clear()
session.rollback()

exec3 = Executive(type="finance")
print('["finance"] = ...')
dept.executives["finance"] = exec3
session.rollback()

print('del ["finance"]')
del dept.executives["finance"]
session.rollback()

exec4 = Executive(type="operations")
print("update()")
dept.executives.update({"finance": exec3, "operations": exec4})
session.rollback()

The following table summarizes the current behavior:

method list set dict
remove after before -
clear before before* before
pop/popitem after after after
dict.pop - - before
discard - before -
__setitem__ before - before
__setitem__ (slice) before* - -
__delitem__ before - before
__delitem__ (slice) before - -
__isub__ - before* -
__iand__ - before* -
__ixor__ - before* -
update - - before*

* before* means the event fires before the removal of the current item, but after the removal of the previous item

Ideally they should all behave like either "before" or "before*" (but not both, for consistency). Technically it's not really a bug because the documentation doesn't prescribe any one particular behavior, but the inconsistency is annoying. For me personally, the motivating use case is being able to intercept all changes to a particular instance and act before any changes are made; currently my use case is always okay for append/set, but only sometimes for remove.

Comments (10)

  1. Michael Bayer repo owner

    so first observation is that those specific hooks that fire for "after" have the events broken up into a "before_remove" and a "remove" event, surrounding the removal action, and then the other hooks just call upon "remove" before the action. The mechanism of these hooks were written long before we had any idea we were going to be building a public API for them, and they were only intended to supply backrefs.

    Trying to move them, the reason they are like in the case of set.pop() and dict.popitem() is obvious; these methods remove a random item so we can't fire remove first because we don't have the object yet.

    For the pop() method on a list, it looks like the test suite includes user-defined classes that don't have any __getitem__ support, so there's no quick way to get the zeroth-item for the event. This could be a use case we drop, but since pop() is impossible anyway that might not be worth it.

    that leaves only the remove() method as doable:

    diff --git a/lib/sqlalchemy/orm/collections.py b/lib/sqlalchemy/orm/collections.py
    index 364cd8246..2dc7cf685 100644
    --- a/lib/sqlalchemy/orm/collections.py
    +++ b/lib/sqlalchemy/orm/collections.py
    @@ -1077,10 +1077,9 @@ def _list_decorators():
    
         def remove(fn):
             def remove(self, value, _sa_initiator=None):
    -            __before_delete(self, _sa_initiator)
    +            __del(self, value, _sa_initiator)
                 # testlib.pragma exempt:__eq__
                 fn(self, value)
    -            __del(self, value, _sa_initiator)
             _tidy(remove)
             return remove
    

    two assumptions in the tests need to change (for the better in both cases):

    diff --git a/test/orm/test_attributes.py b/test/orm/test_attributes.py
    index 80d8cdc04..45b54031d 100644
    --- a/test/orm/test_attributes.py
    +++ b/test/orm/test_attributes.py
    @@ -479,7 +479,7 @@ class AttributesTest(fixtures.ORMTest):
                 f1 = Foo()
                 hist('bars', True, f1.bars.append, b3)
                 hist('bars', True, f1.bars.append, b4)
    -            hist('bars', False, f1.bars.remove, b2)
    +            hist('bars', True, f1.bars.remove, b2)
                 hist('bar', True, setattr, f1, 'bar', b3)
                 hist('bar', True, setattr, f1, 'bar', None)
                 hist('bar', True, setattr, f1, 'bar', b4)
    diff --git a/test/orm/test_hasparent.py b/test/orm/test_hasparent.py
    index fd246b527..b910921a4 100644
    --- a/test/orm/test_hasparent.py
    +++ b/test/orm/test_hasparent.py
    @@ -135,9 +135,9 @@ class ParentRemovalTest(fixtures.MappedTest):
                 u1.addresses.remove, a1
             )
    
    -        # unfortunately, u1.addresses was impacted
    -        # here
    -        assert u1.addresses == []
    +        # u1.addresses no longer impacted here
    +        # after #4134
    +        assert u1.addresses == [a1]
    
             # expire all and we can continue
             s.expire_all()
    

    so...we can do remove(). For pop(), I don't know how, unless we a. invent our own "pop()" simulator or b. change how the event is called, e.g. send the value as DONTKNOW or something like that, but not sure how that is useful. Any suggestions?

  2. Michael Bayer repo owner

    before means the event fires before the removal of the current item, but after the removal of the previous item / Ideally they should all behave like either "before" or "before" (but not both, for consistency).

    what do you mean "not both"? I'm not seeing the inconsistency here. when we do something like __iand__, well here it is you can see:

            def __iand__(self, other):
                want, have = self.intersection(other), set(self)
                remove, add = have - want, want - have
    
                for item in remove:
                    self.remove(item)
                for item in add:
                    self.add(item)
                return self
    

    remove() and add() fire off the events. I don't see how "remove*" applies to the removal-only operations.

  3. Jack Zhou reporter

    What I mean by "not both" is that all methods should fire its set of append/remove events in one of the following ways:

    1. all at once, before the underlying collection is mutated at all, or
    2. in sequence, each after the previous mutation has occurred

    The fact that some methods use # 1 and some methods use # 2 is the inconsistency. The append event also suffers from the same issue, but to a lesser extent (based on my limited tests, it always uses # 2 across all collection types and methods, except list.__setslice__ on Python 2).

    I think the pop/popitem issue can be solved by just "simulating" the effects:

    foo = {1, 2, 3}
    foo.remove(next(iter(foo)))
    bar = {1: "a", 2: "b"}
    del bar[next(iter(bar))]
    

    I think this is okay because some methods like extend is already implemented in terms of append; we'd just be doing the same thing for a couple more methods.

  4. Michael Bayer repo owner

    The problem with all of these changes is that they kill performance, and only to suit this extremely rare use case. No longer being able to remove a huge set of items at once, rewriting pop to spin up iterators.

    There would need to be some event flag strict=True that invokes alternative collection methods that perform as slowly and complex as they have to to get completely perfect behavior.

  5. Michael Bayer repo owner

    What would be reasonable is, the event fires before the item is acted upon. But your requirement that inter-item events also act a certain way suggests you are digging into the collection for each event, and you probably should use a custom collection class instead.

  6. Jack Zhou reporter

    My specific use case only requires that remove events fire before mutation takes place; I don't need inter-item events to act a certain way and I'd be fine with the DONTKNOW solution.

    I was trying to explore what an "ideal" semantics would look like, but of course if performance is a concern then sacrifices have to be made; it's all about tradeoffs. :)

  7. Michael Bayer repo owner
    • changed milestone to 1.3

    definitely we can do remove(). I'm trying to find a clean way to do the set methods like intersection_update such that the event is called beforehand for all elements and that is proving to be difficult. Additionally, approaches for pop(), popitem() need to be researched.

    remove() could be a 1.2 thing as it does really stick out as an odd inconsistency, however all the rest of it needs more careful work and tests so I am pushing this out, as 1.2 needs to move to release soon.

  8. Log in to comment