Strange effect with association proxy and event listener

Issue #2809 resolved
Marc Schlaich created an issue

I'm getting this strange traceback in a proprietary codebase. job.programs is an association proxy to job.job_programs, and set_max_stat_program_added is an event handler listening for append events on job.job_programs.

Unfortunately I'm not able to reproduce this issue with a simple test case. But it looks to me as if the flush in the event callback is emitting an incorrect SQL statement related to the association proxy. I can investigate and debug this issue further if someone gives me instructions, but I'm not allowed to publish the code.

A workaround for this issue is to preload job.pallets with joinedload in the query.

Traceback (most recent call last):
  File "tests\test_resources.py", line 251, in test_check_enable
    job_res.update(job_id, {'programs': [prog_id], 'enabled': True})
  File "mes\resource.py", line 185, in update
    job.programs.append(prog)
  File "packages\sqlalchemy\ext\associationproxy.py", line 568, in append
    self.col.append(item)
  File "packages\sqlalchemy\orm\collections.py", line 1057, in append
    item = __set(self, item, _sa_initiator)
  File "packages\sqlalchemy\orm\collections.py", line 1029, in __set
    item = getattr(executor, 'fire_append_event')(item, _sa_initiator)
  File "packages\sqlalchemy\orm\collections.py", line 733, in fire_append_event
    item, initiator)
  File "packages\sqlalchemy\orm\attributes.py", line 910, in fire_append_event
    value = fn(state, value, initiator or self)
  File "packages\sqlalchemy\orm\events.py", line 1496, in wrap
    orig_fn(target, value, *arg)
  File "mes\model.py", line 81, in set_max_stat_program_added
    for pallet in job.pallets:
  File "packages\sqlalchemy\orm\attributes.py", line 316, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "packages\sqlalchemy\orm\attributes.py", line 613, in get
    value = self.callable_(state, passive)
  File "packages\sqlalchemy\orm\strategies.py", line 524, in _load_for_state
    return self._emit_lazyload(session, state, ident_key, passive)
  File "packages\sqlalchemy\orm\strategies.py", line 585, in _emit_lazyload
    result = q.all()
  File "packages\sqlalchemy\orm\query.py", line 2237, in all
    return list(self)
  File "packages\sqlalchemy\orm\query.py", line 2348, in __iter__
    self.session._autoflush()
  File "packages\sqlalchemy\orm\session.py", line 1139, in _autoflush
    self.flush()
  File "packages\sqlalchemy\orm\session.py", line 1818, in flush
    self._flush(objects)
  File "packages\sqlalchemy\orm\session.py", line 1936, in _flush
    transaction.rollback(_capture_exception=True)
  File "packages\sqlalchemy\util\langhelpers.py", line 58, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "packages\sqlalchemy\orm\session.py", line 1900, in _flush
    flush_context.execute()
  File "packages\sqlalchemy\orm\unitofwork.py", line 372, in execute
    rec.execute(self)
  File "packages\sqlalchemy\orm\unitofwork.py", line 525, in execute
    uow
  File "packages\sqlalchemy\orm\persistence.py", line 64, in save_obj
    table, insert)
  File "packages\sqlalchemy\orm\persistence.py", line 569, in _emit_insert_statements
    execute(statement, params)
  File "packages\sqlalchemy\engine\base.py", line 662, in execute
    params)
  File "packages\sqlalchemy\engine\base.py", line 761, in _execute_clauseelement
    compiled_sql, distilled_params
  File "packages\sqlalchemy\engine\base.py", line 874, in _execute_context
    context)
  File "packages\sqlalchemy\engine\base.py", line 1024, in _handle_dbapi_exception
    exc_info
  File "packages\sqlalchemy\util\compat.py", line 195, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "packages\sqlalchemy\engine\base.py", line 867, in _execute_context
    context)
  File "packages\sqlalchemy\engine\default.py", line 324, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (IntegrityError) job_programs.job_id may not be NULL u'INSERT INTO job_programs (program_id) VALUES (?)' (1,)

Comments (12)

  1. Marc Schlaich reporter

    However, the strangest thing to me is that there is any SQL execution at all at the point of proxy.append. Shouldn't SQL be emitted only on commit?

  2. Mike Bayer repo owner

    the traceback has "autoflush" inside of it, which occurs each time the Session needs to go to the database to get some data. In this traceback, as append() is called, the collection being appended to is unloaded, so it has to be loaded first, triggering autoflush.

    Issues where autoflush gets in the way in some object mutation scenarios are common. Use the no_autoflush context manager to temporarily turn it off.
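
    The autoflush-vs-no_autoflush mechanics can be sketched in isolation. This is a minimal, self-contained example (the Thing model and all names here are hypothetical, not code from this issue; SQLAlchemy 1.4+ imports): a pending object that would violate a NOT NULL constraint breaks any query via autoflush, while the same query inside no_autoflush leaves the pending object alone.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Thing(Base):
    __tablename__ = 'thing'
    id = Column(Integer, primary_key=True)
    name = Column(String(32), nullable=False)


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# a pending object that is not yet complete: flushing it violates NOT NULL
session.add(Thing())

# with autoflush (the default), any query first flushes the incomplete row
raised = False
try:
    session.query(Thing).count()
except IntegrityError:
    raised = True
    session.rollback()

# the same query inside no_autoflush leaves the pending object alone
session.add(Thing())
with session.no_autoflush:
    count = session.query(Thing).count()  # no flush occurs, so no error
session.rollback()
```

    In an event handler, the same with session.no_autoflush: block around the collection access avoids the premature flush in exactly this way.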

    Also, since there's no specific "bug" defined here (no test case), this is more of a mailing-list issue for the moment; can we please move it over there? Thanks.

  3. Marc Schlaich reporter
    • changed status to open
    • removed status

    Finally I was able to find a general test case for this issue. It looks like the combination of an association proxy and an event listener on the associated attribute leaves you in an inconsistent state while inside the event handler, which causes any flush to fail.

    from sqlalchemy import (create_engine, event, inspect, Column, Integer,
                            ForeignKey, String)
    from sqlalchemy.orm import relationship, sessionmaker
    
    from sqlalchemy.ext.associationproxy import association_proxy
    from sqlalchemy.ext.declarative import declarative_base
    
    
    Base = declarative_base()
    
    
    class User(Base):
        __tablename__ = 'user'
        id = Column(Integer, primary_key=True)
        name = Column(String(64))
        keywords = association_proxy('user_keywords', 'keyword')
    
        user_keywords = relationship('UserKeyword', cascade='all, delete-orphan',
                                     backref='user')
    
    
    @event.listens_for(User.user_keywords, 'append')
    def keywords_added(user, value, initiator):
        session = inspect(user).session
        if session:
            session.flush()
    
    
    class UserKeyword(Base):
        __tablename__ = 'user_keyword'
        user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
        keyword_id = Column(Integer, ForeignKey('keyword.id'), primary_key=True)
    
        def __init__(self, keyword=None, **kwargs):
            super(UserKeyword, self).__init__(keyword=keyword, **kwargs)
    
    
    class Keyword(Base):
        __tablename__ = 'keyword'
        id = Column(Integer, primary_key=True)
        keyword = Column(String(64))
    
        user_keywords = relationship(
            'UserKeyword', cascade='all, delete-orphan',
            backref='keyword')
    
    
    engine = create_engine('sqlite:///:memory:')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()
    
    user = User(name='xx')
    user.keywords.append(Keyword(keyword='x'))
    session.add(user)
    session.commit()
    
    session = Session()
    
    user = session.query(User).first()
    for kw in session.query(Keyword):
        user.keywords.append(kw)
    

    Result:

    $ python sqla_2809.py
    Traceback (most recent call last):
      File "sqla_2809.py", line 63, in <module>
        user.keywords.append(kw)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\ext\associationproxy.py", line 568, in append
        self.col.append(item)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\collections.py", line 1057, in append
        item = __set(self, item, _sa_initiator)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\collections.py", line 1029, in __set
        item = getattr(executor, 'fire_append_event')(item, _sa_initiator)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\collections.py", line 733, in fire_append_event
        item, initiator)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\attributes.py", line 910, in fire_append_event
        value = fn(state, value, initiator or self)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\events.py", line 1496, in wrap
        orig_fn(target, value, *arg)
      File "sqla_2809.py", line 27, in keywords_added
        session.flush()
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\session.py", line 1818, in flush
        self._flush(objects)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\session.py", line 1936, in _flush
        transaction.rollback(_capture_exception=True)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\util\langhelpers.py", line 58, in __exit__
        compat.reraise(exc_type, exc_value, exc_tb)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\session.py", line 1900, in _flush
        flush_context.execute()
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\unitofwork.py", line 372, in execute
        rec.execute(self)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\unitofwork.py", line 525, in execute
        uow
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\persistence.py", line 64, in save_obj
        table, insert)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\persistence.py", line 569, in _emit_insert_statements
        execute(statement, params)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 662, in execute
        params)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 761, in _execute_clauseelement
        compiled_sql, distilled_params
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 874, in _execute_context
        context)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 1024, in _handle_dbapi_exception
        exc_info
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\util\compat.py", line 195, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 867, in _execute_context
        context)
      File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\default.py", line 324, in do_execute
        cursor.execute(statement, parameters)
    sqlalchemy.exc.IntegrityError: (IntegrityError) user_keyword.user_id may not be NULL u'INSERT INTO user_keyword (keyword_id) VALUES (?)' (1,)
    
  4. Marc Schlaich reporter

    BTW, the test case works if Keyword doesn't have a relationship back to UserKeyword. This was the key to finding a reproducible test case.

  5. Mike Bayer repo owner

    OK, please, please, can we go to the mailing list? This is not a bug. You are calling flush() in the middle of an event which occurs before your object is ready to be flushed; if you remove the event handler you're adding, there's no stack trace. You've essentially illustrated with flush() what is already happening in your real app due to autoflush (note the original stack trace has _autoflush(), whereas this one has flush()). As I've already explained, if you have a block that can't be autoflushed, please use the "no_autoflush" helper; that's what it's for.
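
    For reference, here is how the no_autoflush helper fits the reproduction case (a sketch assuming the reporter's User/UserKeyword/Keyword models from the test case above, with SQLAlchemy 1.4+ imports; the handler's query stands in for the lazy load of job.pallets in the original report):

```python
from sqlalchemy import (create_engine, event, inspect, Column, Integer,
                        ForeignKey, String)
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
from sqlalchemy.ext.associationproxy import association_proxy

Base = declarative_base()


class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    keywords = association_proxy('user_keywords', 'keyword')
    user_keywords = relationship('UserKeyword', cascade='all, delete-orphan',
                                 backref='user')


class UserKeyword(Base):
    __tablename__ = 'user_keyword'
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    keyword_id = Column(Integer, ForeignKey('keyword.id'), primary_key=True)

    def __init__(self, keyword=None, **kwargs):
        super(UserKeyword, self).__init__(keyword=keyword, **kwargs)


class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    keyword = Column(String(64))
    user_keywords = relationship('UserKeyword', cascade='all, delete-orphan',
                                 backref='keyword')


@event.listens_for(User.user_keywords, 'append')
def keywords_added(user, value, initiator):
    session = inspect(user).session
    if session is not None:
        # the handler may query freely inside no_autoflush; the
        # half-constructed UserKeyword row is not flushed prematurely
        with session.no_autoflush:
            session.query(Keyword).count()


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

user = User(name='xx')
user.keywords.append(Keyword(keyword='x'))
session.add(user)
session.commit()

# append a second, already-persistent keyword to the persistent user;
# without no_autoflush in the handler, the query there could autoflush the
# incomplete user_keyword row and fail on the NOT NULL user_id
kw = Keyword(keyword='y')
session.add(kw)
session.commit()
user.keywords.append(kw)
session.commit()
print(sorted(k.keyword for k in user.keywords))  # ['x', 'y']
```

    With the handler's database access wrapped in no_autoflush, the incomplete user_keyword row is never flushed prematurely and the append completes normally.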

    Of course, if you tell the session to flush() at a point where your objects are not yet completely associated with each other, you'll get integrity violations due to NOT NULL constraints. SQLAlchemy can't magically undo what you're telling it to do in order to avoid an exception.

    Let's talk about where/how you can temporarily turn off autoflush on the mailing list; it's here: http://groups.google.com/group/sqlalchemy/ (though I'm off the grid for the next couple of days).

  6. Marc Schlaich reporter

    Apologies, somehow my response yesterday wound up as an edit here; not sure what happened there. This is the restored text.

    I'm still not convinced, sorry.

    I know that this is a constructed example, but any query to the database in the event handler will trigger the autoflush and therefore this exception (which makes it not such a contrived case after all). And I know that I can work around this issue by wrapping the event handler code in a with session.no_autoflush block.

    But I see this just as a workaround and not as the solution. I do not agree that this should be up to the user. IMO SQLAlchemy should detect that it is in an inconsistent state where a flush would (or could) fail and automatically disable autoflush until you are back in a consistent state.

  7. Mike Bayer repo owner

    Replying to schlamar:

    > You should even consider whether autoflush=False as the default wouldn't fit better with your "no magic" policy.

    That's how it was for years. Autoflush being on has solved a lot of problems. I'd ask that you consider this project has been going for eight years and has evolved tremendously. At this point, not a lot is by accident anymore.

    > The behavior above is really unexpected (and I couldn't find any clue about such an issue in the documentation), and I found similar issues regarding autoflush across the net, e.g. http://lists.okfn.org/pipermail/ckan-dev/2011-January/000189.html

    There were many issues with autoflush being turned off, too.

  8. Mike Bayer repo owner

    a quick poll on autoflush:

    Hibernate (main inspiration for SQLAlchemy) - autoflush by default: http://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch11.html#objectstate-flushing

    Storm ORM (at one time a key SQLAlchemy inspiration / competitor) - autoflush by default: https://storm.canonical.com/Tutorial#Flushing

    Doctrine ORM (the creator of Doctrine is now a Python programmer and uses SQLAlchemy), but OK, they don't autoflush (ever? though I seem to see it in their source): https://doctrine-orm.readthedocs.org/en/latest/reference/working-with-objects.html#effects-of-database-and-unitofwork-being-out-of-sync

    nhibernate (.NET version of hibernate) - autoflush by default: http://nhforge.org/doc/nh/en/index.html#manipulatingdata-flushing

    Now, we're always improving how and when we call autoflush, but for the moment we autoflush when you access an unloaded collection, because that collection's state can be composed of pending data.

    For years, I was completely uncomfortable with the idea of autoflush, and even though I emulated Hibernate, I thought the feature was nuts. But after dealing with the product directly and via user experience, we switched it to true. It is always an option that can be modified; I find it most effective left on most of the time and disabled just in those cases where it gets in the way. These cases are easy to identify by looking for "autoflush" in the stack trace.

    It makes lots of neat patterns possible. Originally the "dynamic relationship loader" was the main one: you'd append an item to the list, then access the list, and the item you just appended wouldn't be there if not for autoflush. This kind of pattern turned out to be a lot more prevalent after that point.
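
    The "just-appended item is visible" pattern described above can be sketched minimally (a self-contained example, not code from this issue; SQLAlchemy 1.4+ imports): with autoflush on, a pending object is flushed before a query runs, so it is already visible within the same transaction.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    keyword = Column(String(64))


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# pending object; no INSERT has been emitted yet
session.add(Keyword(keyword='z'))

# autoflush (the default) flushes the pending INSERT before this query
# executes, so the just-added object is already visible
n = session.query(Keyword).filter_by(keyword='z').count()
print(n)  # 1
```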

  9. Mike Bayer repo owner

    Replying to schlamar:

    > I know that this is a constructed example, but any query to the database in the event handler will trigger the autoflush and therefore this exception (which makes it not such a contrived case after all). And I know that I can work around this issue by wrapping the event handler code in a with session.no_autoflush block.

    OK, well, in most any other ORM, either there is no flush, you have to save() yourself anytime you want an INSERT or UPDATE, or there are no attribute events. Use any of those ORMs and this problem doesn't exist; but neither does the feature. There is no way this particular example could function, because attribute events fire off before the actual change occurs. It could only work if you manually set the foreign key value to that of user.id; do that if you'd like.

    > But I see this just as a workaround and not as the solution. I do not agree that this should be up to the user. IMO SQLAlchemy should detect that it is in an inconsistent state where a flush would (or could) fail and automatically disable it until you are back in a consistent state.

    SQLAlchemy would be making a significant guess here: that when it goes to autoflush on behalf of the job.pallets collection local to your event handler, it needs to be concerned about the state of what is, at this point, an entirely unrelated job.programs collection external to the event handler. If a user expects autoflush to occur, it would be pretty surprising and inconsistent if it turned itself off in certain hard-to-spot situations. It's much simpler to understand that autoflush occurred, an integrity error happened, and therefore this particular event handler needs no_autoflush on it. IMHO a consistent and predictable autoflush is far more straightforward than one that guesses, in obscure situations, when it might not want to proceed; it wouldn't be consistently effective anyway.
