KeyError when marking attribute as modified (add a flag_dirty function to force an object into flush events)

Issue #3753 resolved
Drachenfels created an issue

It's long story and it took me 2 days to pinpoint problem (literally 12-14h of work).

My story is as follows.

I have db with object Article, after Article is updated I do additional shenanigans using events (post_flush mostly), however Article has Slug (many slugs, in fact it's m2m). Whenever Slug is modified I need to trigger flush on Article. But because Slug is modified and not Article, dirty is Slug not Article. So I applied this answer:

http://stackoverflow.com/questions/29830229/force-object-to-be-dirty-in-sqlalchemy

After that Article is dirty, however what happens when we mark attribute dirty but do not modify it and in fact there is no value at that field? KeyError.

(...)

self.session.flush()
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2019, in flush
self._flush(objects)
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2137, in _flush
transaction.rollback(_capture_exception=True)
File "/core/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2101, in _flush
flush_context.execute()
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
rec.execute(self)
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
uow
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
mapper, table, update)
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 634, in _emit_update_statements
lambda rec: (
File "/core/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 454, in _collect_update_commands
value = state_dict[propkey]

Fully working test case with failure is here:

http://sprunge.us/ZeRF

Or here:

https://dl.dropboxusercontent.com/u/196245/sqlalchemy_bug.py

It's the same bug reported I noted first time on sqlalchemy google group:

https://groups.google.com/d/topic/sqlalchemy/A46fChoYO_w/discussion

Please note example is hardcoded with postgres db, not sure if can be reproduced in other db'ies.

Offending code is in this method, where I put extra comment on how to reproduce and how to solve error.

@slug.setter
def slug(self, value):

Comments (21)

  1. Drachenfels reporter

    I will add that in the previous discussion I proposed a patch as well.

    Thank you!

    Also I wanted to say, keep up with good work! SQLAlchemy rocks! :-)

  2. Mike Bayer repo owner

    there's no bug illustrated here, you're calling flag_modified() on an attribute that's not loaded. (well, unless we want that to raise an error).

    test passes if you either do:

                    setattr(self, prop.key, getattr(self, prop.key))
    

    or:

                    if prop.key in self.__dict__:
                        sqlalchemy.orm.attributes.flag_modified(self, prop.key)
    

    but also I don't see the purpose of all this complexity. if you're trying to "set" the slug, you'd want to do the mirror of the "get" for slug, something like this:

    @slug.setter
    def slug(self, value):
        if not self.slugs_connector:
            self.slugs_connector.append(SlugArticle(slug=Slug(slug=value)))
        else:
            self.slugs_connector[0].slug = value
    
  3. Drachenfels reporter

    Maybe my code could be written better, but don't you think that flagging attribute as changed should not throw exception? My original patch was suggesting to just continue loop in SQLAlchemy, no harm done, no exception thrown. Just a thought. I am pretty sure I am not the only one that is doing this mistake. ;-)

  4. Drachenfels reporter

    After second thought, I believe we have misunderstanding.

    I use this complexity not to set a slug, this one is pretty straight forward, I use it to mark article as dirty. To do that I need to set any attribute as modified (in my case I set all of them as such). So unless there is a way to mark object as dirty without marking any particular attribute that may or may not be set or there is a bug. In general I believe that as you said originally KeyError should not be thrown in here.

  5. Mike Bayer repo owner

    it should throw an error if anything. I don't understand what you're trying to do. It makes no sense to mark an attribute as "modified" that doesn't actually have a value. How would the "change" in this attribute be persisted ?

  6. Drachenfels reporter

    During a discussion last time l figured out that my example could be simplified more, and it's not exactly what I thought it is.

    So there is an update:

    https://dl.dropboxusercontent.com/u/196245/sqlalchemy_bug_v2.py

    or:

    http://sprunge.us/JUfE

    We mark on article instance, attribute 'meta_description' as modified. And now if we don't add any relationship it's just fine, object is saved, everyone is happy. But if we do add relationship object (tag in this case). Boom. KeyError.

    As you can see this time there are just two models Article that has many Tags.

    You wondered why to mark attribute as modified without changing it. Answer was in this Stackoverflow question:

    http://stackoverflow.com/questions/29830229/force-object-to-be-dirty-in-sqlalchemy

    As far as I know (maybe I am wrong) you cannot mark object as dirty other way than marking any single attribute as modified.

    Cheers.

  7. Mike Bayer repo owner

    i dont understand the purpose of an object that has no actual changes on it being marked as "dirty". What's important is what are the persistence operations which you'd like to take effect. If you can please show me a test case that illustrates normal ORM use here where the thing that you need to happen is not happening (e.g. don't try to manually set dirty), the use case will be clear. I am certain that there is a better way to do what you are trying to accomplish.

  8. Drachenfels reporter

    My post says that when you run:

    article = Article(title='Second article')
    
    sqlalchemy.orm.attributes.flag_modified(article, 'meta_description')
    
    Session.add(article)
    
    Session.commit()
    

    No error.

    But when you do:

    article = Article(title='Second article')
    
    for tag in tags:
         tag = Session.query(Tag).filter(Tag.uid == tag.uid).one()
    
         obj = ArticleTag(tag=tag)
    
         article.article_tags.append(obj)
    
    sqlalchemy.orm.attributes.flag_modified(article, 'meta_description')
    
    Session.add(article)
    
    Session.commit()
    

    KeyError happens.

    Is it consistent? In my opinion it is not.

    Is it a bug? I don't know.

    Is it fixable? Yes it is.

    If you wondering my use case why I need to mark object as dirty in the first place explanation below (however it's at the same time irrelevant).

    I have post_flush event that perform additional computation on database whenever Article is modified. In my case whenever m2m is changed and it's related to article. If it happens I want to mark Article as dirty so that my post_flush could detect it and do other stuff.

    Another use case is stackoverflow question.

    In any case I would say absolute minimum is to mention in docs that when you run flag_modified unless specific conditions are met, there will be KeyError.

  9. Mike Bayer repo owner

    If you wondering my use case why I need to mark object as dirty in the first place explanation below (however it's at the same time irrelevant).

    I'd like to make flag_modified() raise an InvalidRequestError immediately if the target attribute has no value, because the operation makes no sense and the user must be doing something wrong. Does that work for you ?

    In my case whenever m2m is changed and it's related to article. If it happens I want to mark Article as dirty so that my post_flush could detect it and do other stuff

    if an m2m is changed on a collection associated with the Article collection, it is already marked as dirty. Typically, business-related changes between objects (change object A means do a thing to object B) occur within events outside of the flush, that way things are very easy and you don't pollute the flush process where it is not as easy to manipulate things. If you have a pure database change or otherwise related to some associated collection that happens in post_flush(), you can find your Article based on what was actually changed in the post_flush.

    I appreciate that it's difficult and frustrating to explain your actual use case here however when I make changes to the project, they are to benefit the library and the project overall and the end-user that helps me to define the rationale for the change is contributing back to the project.

  10. Mike Bayer repo owner

    Also keep in mind, if you truly need an object with zero changes "dirty", wouldn't you rather have an API method to do that directly? that would make more sense and is very simple to implement.

  11. Mike Bayer repo owner
    diff --git a/lib/sqlalchemy/orm/attributes.py b/lib/sqlalchemy/orm/attributes.py
    index e01c135..a62a870 100644
    --- a/lib/sqlalchemy/orm/attributes.py
    +++ b/lib/sqlalchemy/orm/attributes.py
    @@ -1615,3 +1615,16 @@ def flag_modified(instance, key):
         state, dict_ = instance_state(instance), instance_dict(instance)
         impl = state.manager[key].impl
         state._modified_event(dict_, impl, NO_VALUE, force=True)
    +
    +
    +def flag_dirty(instance):
    +    """Mark an instance as 'dirty' without any specific attribute mentioned.
    +
    +    This is a special operation that will allow the object to travel through
    +    the flush process for interception by events; however, no SQL will be
    +    emitted as long as there are no actual changes on the object.
    +
    +    """
    +
    +    state, dict_ = instance_state(instance), instance_dict(instance)
    +    state._modified_event(dict_, None, NO_VALUE, force=True)
    diff --git a/lib/sqlalchemy/orm/state.py b/lib/sqlalchemy/orm/state.py
    index 2704367..4914544 100644
    --- a/lib/sqlalchemy/orm/state.py
    +++ b/lib/sqlalchemy/orm/state.py
    @@ -627,18 +627,19 @@ class InstanceState(interfaces.InspectionAttr):
    
         def _modified_event(
                 self, dict_, attr, previous, collection=False, force=False):
    -        if not attr.send_modified_events:
    -            return
    -        if attr.key not in self.committed_state or force:
    -            if collection:
    -                if previous is NEVER_SET:
    -                    if attr.key in dict_:
    -                        previous = dict_[attr.key]
    +        if attr:
    +            if not attr.send_modified_events:
    +                return
    +            if attr.key not in self.committed_state or force:
    +                if collection:
    +                    if previous is NEVER_SET:
    +                        if attr.key in dict_:
    +                            previous = dict_[attr.key]
    
    -                if previous not in (None, NO_VALUE, NEVER_SET):
    -                    previous = attr.copy(previous)
    +                    if previous not in (None, NO_VALUE, NEVER_SET):
    +                        previous = attr.copy(previous)
    
    -            self.committed_state[attr.key] = previous
    +                self.committed_state[attr.key] = previous
    
             # assert self._strong_obj is None or self.modified
    
    @@ -656,7 +657,7 @@ class InstanceState(interfaces.InspectionAttr):
                 if self.session_id:
                     self._strong_obj = inst
    
    -            if inst is None:
    +            if inst is None and attr:
                     raise orm_exc.ObjectDereferencedError(
                         "Can't emit change event for attribute '%s' - "
                         "parent object of type %s has been garbage "
    
  12. Drachenfels reporter

    Hi there,

    I'd like to make flag_modified() raise an InvalidRequestError immediately if the target attribute has no value, because the operation makes no sense and the user must be doing something wrong. Does that work for you ?

    Absolutely, the current system was/is problematic exactly because at times it was causing exception and other times working well. What is more KeyError was avoided if the operation was wrapped in no_autoflush however I have no idea why. That being said, it might be backward incompatible for some users.

    Also adding flag_dirty is a great thing. For all those special cases there is a clear way to do it.

    Also keep in mind, if you truly need an object with zero changes "dirty", wouldn't you rather have an API method to do that directly?

    API was better solution I agree, however I used chain of events that was more or less like that. Many models can have Slugs, slug field was a mixin. When object that have slug is created or updated, check title and update slug field if necessary it was triggered by event. Then mark instance that have slug as dirty, so another event could intercept it and update updated_at field. I am not exactly sure if that was the reason but I am pretty sure it was something among those lines. To be frank I have written this code 1 year 4 months ago and since then I a lot has changed. That is why I said my use case is not relevant, I just detected anomaly with exception being thrown at quite arbitrary occasions.

  13. Mike Bayer repo owner

    Oh, then I totally misunderstood. I thought you were asking, "please make this work" :).

  14. Mike Bayer repo owner
    • changed milestone to 1.2

    I'm assuming you've worked around this for now. new features can only go into 1.1 or later and 1.1 is due for release, so pushing this out to 1.2.

  15. Drachenfels reporter

    This 'feature' (bug?) I have observed almost a year ago, quickly enough I found that adding no_flush mitigates problem. However this was workaround and not a good solution, thus I have created this ticket to solve it once and for all.

    I am happy that future generation of developers will get clear error message in every scenario while not necessarily it will influence my everyday work. ;-)

    Thank you!

  16. Mike Bayer repo owner

    succinct test:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base
    
    Base = declarative_base()
    
    
    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        data = Column(String)
    
    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    
    s = Session(e)
    
    a1 = A(data='adf')
    s.add(a1)
    
    s.commit()
    
    from sqlalchemy.orm import attributes
    
    attributes.flag_modified(a1, 'data')
    s.flush()
    
  17. Mike Bayer repo owner

    Raise on flag_modified() for non-present attribute

    The :func:.attributes.flag_modified function now raises :class:.InvalidRequestError if the named attribute key is not present within the object, as this is assumed to be present in the flush process. To mark an object "dirty" for a flush without referring to any specific attribute, the :func:.attributes.flag_dirty function may be used.

    Change-Id: I6c64e4d253c239e38632f38c27bb16e68fe8dfbe Fixes: #3753

    → <<cset 711d29f8e4dc>>

  18. Log in to comment