1. Michael Bayer
  2. sqlalchemy

Wiki

Clone wiki

sqlalchemy / UsageRecipes / ManyToManyOrphan

ManyToManyOrphan

SQLAlchemy's "delete-orphan" feature will delete objects in a collection when they are removed from their parent, or the parent is deleted. But this simple function requires that the orphan only be associated with a single parent. How do we delete "orphans" only after all their parents are deleted ?

The common case, which has already come up twice on StackOverflow in virtually the identical format(1)(2), is when you have Tag objects that you'd like deleted when all the parents have been. To do this, we use an after_flush event, and perform the operation using an aggregate DELETE command which is not dependent on the flush itself:

from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event

Base= declarative_base()

tagging = Table('tagging',Base.metadata,
    Column('tag_id', Integer, ForeignKey('tag.id', ondelete='cascade'), primary_key=True),
    Column('entry_id', Integer, ForeignKey('entry.id', ondelete='cascade'), primary_key=True)
)

class Tag(Base):

    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), unique=True, nullable=False)

    def __init__(self, name=None):
        self.name = name

class Entry(Base):
    __tablename__ = 'entry'

    id = Column(Integer, primary_key=True)
    tag_names = association_proxy('tags', 'name')

    tags = relationship('Tag',
                        secondary=tagging,
                        backref='entries')

@event.listens_for(Session, 'after_flush')
def delete_tag_orphans(session, ctx):
    session.query(Tag).\
        filter(~Tag.entries.any()).\
        delete(synchronize_session=False)

e = create_engine("sqlite://", echo=True)

Base.metadata.create_all(e)

s = Session(e)

r1 = Entry()
r2 = Entry()
r3 = Entry()
t1, t2, t3, t4 = Tag("t1"), Tag("t2"), Tag("t3"), Tag("t4")

r1.tags.extend([t1, t2])
r2.tags.extend([t2, t3])
r3.tags.extend([t4])
s.add_all([r1, r2, r3])

assert s.query(Tag).count() == 4

r2.tags.remove(t2)

assert s.query(Tag).count() == 4

r1.tags.remove(t2)

assert s.query(Tag).count() == 3

r1.tags.remove(t1)

assert s.query(Tag).count() == 2

Above, the event handler we've built will emit a DELETE on every non-empty flush. The DELETE will find no rows if there are no orphan Tags. However, if we want to try to prevent the DELETE from being emitted in the case that there are no Entry objects deleted or modified in the Session, we can scan the Session for them using this alternative form of the event handler:

@event.listens_for(Session, 'after_flush')
def delete_tag_orphans(session, ctx):
    # optional: look through Session state to see if we want
    # to emit a DELETE for orphan Tags
    flag = False

    for instance in session.dirty:
        if isinstance(instance, Entry) and \
            attributes.get_history(instance, 'tags').deleted:
            flag = True
            break
    for instance in session.deleted:
        if isinstance(instance, Entry):
            flag = True
            break

    # emit a DELETE for all orphan Tags.   This is safe to emit
    # regardless of "flag", if a less verbose approach is
    # desired.
    if flag:
        session.query(Tag).\
            filter(~Tag.entries.any()).\
            delete(synchronize_session=False)

Above, we scan for "dirty" entries and check for items "deleted" from the "tags" collection; just checking for "dirty" entries would work as well for a less specific but simpler check. We also check for entries deleted entirely. We only emit the DELETE if we've seen that either of these conditions has been observed. The tradeoff of the above approach is that we may emit fewer DELETE statements, however if we are flushing large collections of objects, we spend more time scanning through dirty and deleted objects checking for state. Another option would be to intercept Entry objects within the after_update() and after_delete ORM hooks, setting a flag as a result:

ctx = {}
@event.listens_for(Entry, 'after_update', raw=True)
def _upd_entry(mapper, connection, state):
    # 'state.attrs' is available in 0.8
    # in 0.7 use attributes.get_state_history(state, 'tags').deleted
    if state.attrs.tags.history.deleted:
        sess_key = state.session_id
        ctx[(sess_key, 'orphaned_tags')] = True

@event.listens_for(Entry, 'after_delete', raw=True)
def _del_entry(mapper, connection, state):
    sess_key = state.session_id
    ctx[(sess_key, 'orphaned_tags')] = True


@event.listens_for(Session, 'after_flush')
def delete_tag_orphans(session, context):
    if ctx.get((session.hash_key, 'orphaned_tags')):
        session.query(Tag).\
            filter(~Tag.entries.any()).\
            delete(synchronize_session=False)
        del ctx[(session.hash_key, 'orphaned_tags')]

SQLAlchemy-Utils contains a utility function for handling many-to-many orphan deletion easily.(3)

from sqlalchemy_utils import auto_delete_orphans

auto_delete_orphans(Entry.tags)

1: http://stackoverflow.com/questions/12653824/delete-children-after-parent-is-deleted-in-sqlalchemy/12731232

2: http://stackoverflow.com/questions/9234082/setting-delete-orphan-on-sqlalchemy-relationship-causes-assertionerror-this-att/9264556#9264556

3: http://sqlalchemy-utils.readthedocs.org/en/latest/listeners.html#many-to-many-orphan-deletion

Updated