add detach / delete events, ensure full lifecycle / state tracking is possible

Issue #2677 resolved
Denis Otkidach created an issue

According to documentation after_attach event should be generated for add(), merge(), and delete() calls. But the last doesn't work. Here is a code sample to reproduce the problem:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, create_engine, event
from sqlalchemy.orm import sessionmaker, Session

Base = declarative_base()

class Obj(Base):
    __tablename__ = 'objs'
    id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

class EventHandlerIsCalled(Exception): pass

obj = Obj()

session.add(obj)
session.commit()

@event.listens_for(Session, 'after_attach')
def after_attach(session, instance):
    raise EventHandlerIsCalled()

try:
    session.delete(obj)
except EventHandlerIsCalled:
    pass
else:
    assert False, 'after_attach event is not generated'

Comments (16)

  1. Mike Bayer repo owner

    ah. well actually, in that example the object is already attached, so the fact that the method isn't being called is not a bug. The documentation should instead be clarified. If you expunge(obj) first, then delete(), the method is called.

    It's clear that people really want just plain old "on mark as delete" so I'd rather add that separately. the event would probably benefit from an additional argument that states how the object got marked, as this can happen in several places (pre-flush cascade, during-flush cascade, etc).

  2. Denis Otkidach reporter

    Some more info that might be useful for your decision. The problem I'm solving is in-app denormalization with quite complex computational logic that is hard to do with SQL. To solve it we have to handle attribute changes, object addition and deletion.

    Changes: We can use custom descriptors. We wrap all relevant attributes (InstrumentedAttribute instances) with proxy descriptor class that does additional job when attribute is set. The set event is useless (or inconvenient at least) here since it's generated ''before'' attribute is set, so I can't run existing methods, but have to duplicate (rewrite) each of them for each attribute they depend on. Also I see no simple way to use my own subclass of InstrumentedAttribute.

    Addition: after_attach could be used here, but in real life all needed objects we are interested in have relation attribute set. So we just use the same descriptor wrapping described above.

    Removal: Right now I see no way to catch that Obj.o2m_rel looses some object and use it in some denormalization code: remove event is generated for one way of modifying only and before_delete and after_delete are too late to modify objects that are not in dirty state (parent object).

    It looks weird to me that I can't use events at all for the task that I believe should be rather solved with events.

  3. Mike Bayer repo owner

    so you don't actually care about attach or delete, you want attribute "after" events, is that all ?

    Right now I see no way to catch that Obj.o2m_rel looses some object and use it in some denormalization code: remove event is generated for one way of modifying only

    can you be more specific here? what's "one way" and what would be the "other way" ?

  4. Denis Otkidach reporter

    so you don't actually care about attach or delete, you want attribute "after" events, is that all ?

    Right now I need some way to catch Obj.o2m_rel changes (only through ORM methods, I don't care about direct SQL) to update denormalization counter field. Some obvious direct way is preferred, but it's ok to listen for some delete event on child class. Some sort of "detach" event would useful anyway.

    Also I'd like to use "after_set" in some ''other'' case, but I can live with hack of descriptor proxying I use right now. Actually it already breaks some code that automatically gathers info about fields (checks for isinstance(attr, InstrumentedAttribute)), but I believe I'll be able to fix it.

    Right now I see no way to catch that Obj.o2m_rel looses some object and use it in some denormalization code: remove event is generated for one way of modifying only

    can you be more specific here? what's "one way" and what would be the "other way" ?

    1. Obj.o2m_rel.remove(child) - I haven't checked this case, but assume we have "remove" event here - that's "one way" I referred to
    2. Obj.o2m_rel = [list of children...](...new) - no "remove" event, no "set" event, can be handled with custom descriptor
    3. child.parent = other_parent, child.parent = None - no "remove" event, can be handled with custom descriptor
    4. session.delete(child) - I failed to find any solution to handle this except "before_delete" or "after_delete" + delayed (unfortunately I see no way to do this with handling something like "after_flush_postexec", only session redefinition with some registry of delayed actions) + redefined flush() to do actual flush multiple times while dirty objects exist.
  5. Mike Bayer repo owner

    Replying to ods:

    can you be more specific here? what's "one way" and what would be the "other way" ?

    1. Obj.o2m_rel.remove(child) - I haven't checked this case, but assume we have "remove" event here - that's "one way" I referred to

    this fires a remove event.

    1. Obj.o2m_rel = [list of children...](...new) - no "remove" event, no "set" event, can be handled with custom descriptor

    not correct - this operation fires both remove and append events (but yes, before the fact)

    1. child.parent = other_parent, child.parent = None - no "remove" event, can be handled with custom descriptor

    this fires off the "set" event, and the old value is given.

    1. session.delete(child)

    adding an event for delete() is fine.

    do this with handling something like "after_flush_postexec", only session redefinition with some registry of delayed actions) + redefined flush() to do actual flush multiple times while dirty objects exist.

    this is what before_flush() is for, and it's where I usually do the kinds of things you're talking about. The problem with a "event after set" for the purpose of running a business method is that the event isn't after every aspect of the set operation. That is, if you did this:

    myobject.somecollection.append(otherobject)
    

    if that fires off an event that has the effect of appending/setting/removing another object somewhere else, then that operation has an event also. So if you had an "after set" handler that's triggered by a backref (which is "before set"), then your business methods are still not seeing the finished object structure - you're still ultimately inside of a "before set".

    Really, the traditional approach to running complex business methods when something happens is to define those use cases explicitly either in a service layer or within real methods on the object. The synonym() construct in particular is provided to allow a plain Python descriptor to wrap a database-mapped attribute, though in the spirit of KISS/explicit is better than implicit I'll really just do things the old fashioned way:

    class MyBusinessObject(Base):
       # ...
    
       def do_something_complicated(self, arg):
           # ...
    

    short of that, before_flush() is where you can get at objects in their "complete" state, and you can use history to see everything that's changed, not to mention look inside of session.delete and such.

  6. Denis Otkidach reporter
    1. Obj.o2m_rel = [list of children...](...new) - no "remove" event, no "set" event, can be handled with custom descriptor

    not correct - this operation fires both remove and append events (but yes, before the fact)

    It was my mistake, both remove and append work as advertised.

    short of that, before_flush() is where you can get at objects in their "complete" state, and you can use history to see everything that's changed, not to mention look inside of session.delete and such.

    Thank you very much for suggestion, but I failed to use "before_flush" event because objects ''are not'' in final state yet, that is Parent.children contain Child objects that are deleted and don't contain new Child objects yet. But this is already different story not related to the issue.

  7. Mike Bayer repo owner

    so what are we doing here? "after_XYZ()" events are not something I can just casually add, especially for collections. It would add lots of overhead that all users would be impacted with - note that we're talking about adding a bunch more boxes to the upper right of this: http://techspot.zzzeek.org/files/2010/sqla_070b1_large.png - the image illustrates that "attribute set/append" operations already take nearly 25% of runtime overhead for a mutation-heavy application.

    Right now you should be able to make a subclass of InstrumentedAttribute and install it using the "attribute_instrument" event (replacing the existing InstrumentedAttribute), or a custom InstrumentationManager (a little awkward to set up, but doable). Using "proxies" around descriptors shouldn't be needed. For collections, yes you need proxies right now.

    I'd like to think of some system where the entire way that orm/collections.py instruments a class is based on some open ended and easily modified scheme. This would be close to a rewrite of the whole system. But I don't have a nice vision of how that would look anyway right now, at least one that wouldn't add crushing method call overhead, which IMHO is already too much - I'd like to remove calls from collections.py and further simplify the codepaths, not add any.

  8. Denis Otkidach reporter

    The only clear to me for now is that SA missed "detach" event and it would be nice to clarify documentation (about "after_attach" and probably "before_flush"). The rest is not so obvious, your points are quite reasonable.

    I solved my task using "set", "append", "remove" to store a delayed action into a global (damn!) queue and executing them in the "before_flush" handler. This required reconstructing one-to-many relations before calculation manually with respect to session.new and session.deleted. And I had to attach "before_flush" listener to base Session class, which looks dirty, but is ok for this project because it has only one session.

  9. Mike Bayer repo owner

    for here I would like to:

    1. add detach() event, however docs need to stress that this can't be called in the case where the Session is garbage collected without close/expunge first

    2. add delete() event

  10. Mike Bayer repo owner
    • The :class:.SessionEvents suite now includes events to allow unambiguous tracking of all object lifecycle state transitions in terms of the :class:.Session itself, e.g. pending, transient, persistent, detached. The state of the object within each event is also defined. fixes #2677
    • Added a new session lifecycle state :term:deleted. This new state represents an object that has been deleted from the :term:persistent state and will move to the :term:detached state once the transaction is committed. This resolves the long-standing issue that objects which were deleted existed in a gray area between persistent and detached. The :attr:.InstanceState.persistent accessor will no longer report on a deleted object as persistent; the :attr:.InstanceState.deleted accessor will instead be True for these objects, until they become detached.
    • The :paramref:.Session.weak_identity_map parameter is deprecated. See the new recipe at :ref:session_referencing_behavior for an event-based approach to maintaining strong identity map behavior. references #3517

    → <<cset 108c60f460c7>>

  11. Log in to comment