the DELETE before INSERT problem

Issue #2501 new
Mike Bayer repo owner created an issue

this would be another semi-rewrite of the UOW. Basically we'd need to remove the hard dependency on "saves" before "deletes". We might even need to break "saves" into INSERT and UPDATE separately, and maybe that's a good idea anyway. We'd then need the UOW to do a better job of treating mapper-level and relationship-level nodes equally - while the "DependencyProcessor" acts as an "edge" between "SaveUpdate"/"Delete", if I screw around with things even trivially, a "DependencyProcessor" can easily find itself not between two nodes, then it gets dropped. So really DependencyProcessor should be treated somehow as a node itself, in addition to being something of an "edge".

The actual DELETE before INSERT would need to be expressed as a dependency between objects that are known to share either a primary key value or a unique constraint value, and the ORM would need some way of being told that objects should be unique along certain attribute combinations (which would need to be multiple).

Comments (27)

  1. Mike Bayer reporter

    Here's why the delete before insert problem is almost not solvable:

    p1 = Parent(id=1)
    c1 = Child()
    p1.children.append(c1)
    s.add(p1)
    s.commit()
    
    p2 = Parent(id=1)
    p2.children.append(c1)
    s.delete(p1)
    s.commit()
    

    what SQL do we emit for the second part ? can't INSERT the parent, need to DELETE the old one first. Can't DELETE the old one first, the Child still refers to it and first needs an UPDATE to the new row, which doesn't exist. Only if we could temporarily update Child to have parent_id=NULL can this work, which means two UPDATEs for the same row, like a post_update type of thing.

    There's all these combinations that keep looking like we're just doing two separate flushes - one for the delete, including all dependencies there as though there were no replacement row available, then another flush for the INSERT.

  2. Mike Bayer reporter

    a proof of concept is attached illustrating how to get #2765's test to work. Encouragingly, we can get an external rule to do the whole thing (in this simple case at least) without any code changes.

  3. Mike Bayer reporter

    fine, a patch is attached. it would be safe to release since it takes no effect unless a new mapper setting "maintain_unique_attributes" is used. As far as what breaks when one uses this attribute, hard to say, though i think mostly the breakage would be when the mapper has lots of other things dependent on it.

    Usage so far requires that one can detect a "dupe" before the UOW assigns foreign keys. So in the example like we got in #2765, it has to check for uniques on the relationship-bound attribute where the change has occurred. but you can also give it several pairs of things to check.

    from sqlalchemy import create_engine
    from sqlalchemy import Column, Integer, String, Float, UniqueConstraint, ForeignKey
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker, relationship, backref
    from sqlalchemy.orm.dependency import OneToManyDP, attributes
    
    Base = declarative_base()
    
    class ParentEntity(Base):
        __tablename__ = 'parent'
    
        id = Column(Integer, primary_key=True)
        name = Column(String)
    
    class ChildEntity(Base):
        __tablename__ = 'child'
    
        __table_args__ = (UniqueConstraint('parent_id', 'z', name='child_uk'),)
    
        __mapper_args__ = {
            "maintain_unique_attributes": ["z")](("parent",)
    
            # to detect conflicts for both relationship as well as on the integer
            # attribute
            # "maintain_unique_attributes": ["z"), ("parent_id", "z")](("parent",)
        }
    
        id = Column(Integer, primary_key=True)
        parent_id = Column(Integer, ForeignKey('parent.id'), nullable=False)
        z = Column(Float)
    
        parent = relationship("ParentEntity",
                              backref=backref("children",
                                              cascade="all, delete-orphan"))
    
    engine = create_engine('sqlite://', echo=True)
    Session = sessionmaker(bind=engine)
    
    sess = Session()
    Base.metadata.create_all(engine)
    
    p = ParentEntity(name='Datum')
    p.children = [ChildEntity(z=3.5), ChildEntity(z=5.5)](ChildEntity(z=2.5),)
    sess.add(p)
    sess.commit()
    
    # Remove all existing children and create new, one of which is
    # identical to a previously existing child
    p.children = [ChildEntity(z=5.5)](ChildEntity(z=4.0),)
    sess.commit()
    
    c5 = sess.query(ChildEntity).filter_by(z=5.5).one()
    sess.delete(c5)
    sess.add_all([   ChildEntity(z=5.5, parent=p)
    ](
    ))
    sess.commit()
    
  4. Mike Bayer reporter

    also test an update (new patch coming):

    # test with an update
    c4 = sess.query(ChildEntity).filter_by(z=4.0).one()
    c5 = sess.query(ChildEntity).filter_by(z=5.5).one()
    
    c4.z = 5.5
    sess.delete(c5)
    sess.commit()
    
  5. Mike Bayer reporter

    that example earlier with the Parent.children, yeah that kind of thing still might not work. but I don't think that's what folks are going for with this one in most cases.

  6. Mike Bayer reporter

    this is very ready to go if it has tests, but 0.9 is way behind sched at this point so pushing this to 0.9.xx since it has minimal impact on existing code

  7. Marc Schlaich

    I'm running in the same issue than #2765 but I have more complicated edge cases.

    My main functionality is to update an object from a dict, so e.g. d = {'children': [dict(id=1, z=2)](dict(z=1),) will create a new child for the first entry and will use the existing Child with id 1 from the database.

    There are a lot of cases to be considered: replace an existing with a new one, the other way round, swap two existing children and probably more.

    This gets interesting as soon as you have an ordered relationship with ordering_list and a delete-orphan and a unique constraint on (position, parent_id) in the child.

    Right now I have "solved" all issues: First by triggering cascade delete for all existing children and then "reviving" the children which are still needed by make_transient. Next, I fix possible ordering constraint issues by only adding previously existing objects to the relationship (because if child id=1 with position=2 should now be on position=1 and a new child should be on 2, you try to insert child=2 with position=2 first). At last, I set all objects to the relationship.

    Here is my code. get_related_obj is responsible for creating or querying the child object.

    old_ids = set([for o in getattr(obj, field)](o.id))
    value = [obj, v, field) for v in value](get_related_obj(session,)
    setattr(obj, field, list())
    session.flush()  # possibly trigger delete cascade
    for v in value:
        state = inspection.inspect(v)
        if state.deleted:
            orm.make_transient(v)  # put it back to life
    new_filtered = [for o in value if o.id and o.id in old_ids](o)
    setattr(obj, field, new_filtered)
    session.flush()  # trigger update of position
    # now set the correct list
    setattr(obj, field, value)
    

    I'm not sure if this is covering all edge cases, maybe you found something I missed?

    However, my main purpose of this comment is to give you some cases you should test for this patch. I'm pretty sure that not all of them are covered :)

  8. Marc Schlaich

    Oh no. The solution from above does not work predictably in the following case:

    • Replace the first existing child with a new one

    Sometimes the test passes, sometimes not. Error is:

    IntegrityError: (IntegrityError) columns parent_id, position are not unique u'INSERT INTO ...
    

    I assume that there is not determined order to update pending instances? Probably they are in a dict?

    Any idea how I can solve this issue?

  9. Marc Schlaich

    Looks like resetting all order_by properties of revived objects does the trick:

    def get_order_by_properties(model, relation_name):
        relation = getattr(model, relation_name)
        if relation.property.order_by:
            return [for c in relation.property.order_by](c.name)
        return list()
    
    
    value = [obj, v, field) for v in value](get_related_obj(session,)
    setattr(obj, field, list())
    session.flush()  # possibly trigger delete cascade
    for v in value:
        state = inspection.inspect(v)
        if state.deleted:
            orm.make_transient(v)  # put it back to life
    
            # reset order by fields
            for attr in get_order_by_properties(type(obj), field):
                if hasattr(v, attr):
                    setattr(v, attr, None)
    setattr(obj, field, value)
    
  10. Marc Schlaich

    Another update: there are some issues with association proxy handling and doing flush when the object is not yet fully created (similar to #2809). So here is a fixed version for the code above:

    def fix_order_by_constraint(session, obj, field, value):
        setattr(obj, field, list())
        session.flush()  # possibly trigger delete cascade
        for v in value:
            state = inspection.inspect(v)
            if state.deleted:
                orm.make_transient(v)  # put it back to life
    
                # reset order by fields
                for attr in get_order_by_properties(type(obj), field):
                    if hasattr(v, attr):
                        setattr(v, attr, None)
    
    
    value = [obj, v, field) for v in value](get_related_obj(session,)
    if obj.id:
        fix_order_by_constraint(session, obj, field, value)
    setattr(obj, field, value)
    
  11. Mike Bayer reporter

    hi can you please attach an actual code example of what it is you're trying to do ? thanks. note that this ticket involves specifically models with unique constraints where rows are being replaced.

  12. Marc Schlaich

    Just forget it :) I had unique constraints on my ordering column which is totally non-sense and which I should have realized even before reading the warning in the documentation...

  13. Mike Bayer reporter

    here's a test that illustrates the feature working against the primary key, a change to the patch coming will handle this:

    from sqlalchemy import create_engine
    from sqlalchemy import Column, Integer, String, Float, UniqueConstraint, ForeignKey
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Session, relationship, backref
    from sqlalchemy.orm.dependency import OneToManyDP, attributes
    
    Base = declarative_base()
    
    class ParentEntity(Base):
        __tablename__ = 'parent'
    
        id = Column(Integer, primary_key=True)
        name = Column(String)
    
        __mapper_args__ = {
            "maintain_unique_attributes": [("id",)](("id",))
        }
    
    engine = create_engine('sqlite://', echo=True)
    
    Base.metadata.create_all(engine)
    
    sess = Session(engine)
    
    p1 = ParentEntity(id=1, name='p1')
    sess.add(p1)
    sess.commit()
    
    p2 = ParentEntity(id=1, name='p2')
    sess.delete(p1)
    sess.add(p2)
    sess.commit()
    
  14. Mike Bayer reporter
    • marked as major
    • changed milestone to 1.1

    this so far isn't hitting the same level of urgency as a lot of other things in 1.0 so moving it out for the moment.

  15. Jimmy Jia

    I'm hitting this as well, and the linked proof of concept mostly works for me.

    However, one case where it doesn't work is when the underlying operation hits the "row switch" case, because the ORM attempts to emit an UPDATE instead of an INSERT for the instance.

    I'm attempting to work around this right now by discarding deleted instances from the instance map, which causes the row switch check to return false.

    Actually, thinking about this a bit more, that seems like a bad idea; I think I'll just stop using the unique constraint for now and enforce my unique constraint outside of the database anyway.

  16. Mike Bayer reporter
    • changed milestone to 1.4
    • edited description

    I don't get any call for this issue so keeping this low priority, but still milestoned for now.

  17. Log in to comment