Strange effect with association proxy and event listener
I'm getting this strange traceback in a proprietary codebase. job.programs is an association proxy to job.job_programs. set_max_stat_program_added is an event handler listening for append events on job.job_programs.
Unfortunately I'm not able to reproduce this issue with a simple test case, but it looks to me like the flush triggered inside the event callback emits a wrong SQL statement related to the association proxy. I can investigate and debug this further if someone gives me instructions, but I'm not allowed to publish the code.
A workaround for this issue is to preload job.pallets with joinedload in the query.
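The workaround described above can be sketched as follows. The Job and Pallet models here are invented stand-ins (the real schema is proprietary), so treat this as a hedged illustration of the joinedload option rather than the actual fix:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import (declarative_base, joinedload, relationship,
                            sessionmaker)

Base = declarative_base()

class Job(Base):
    __tablename__ = 'job'
    id = Column(Integer, primary_key=True)
    pallets = relationship('Pallet')

class Pallet(Base):
    __tablename__ = 'pallet'
    id = Column(Integer, primary_key=True)
    job_id = Column(Integer, ForeignKey('job.id'))

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Job(id=1, pallets=[Pallet(id=1)]))
session.commit()
session.close()

# joinedload fetches job.pallets in the same SELECT as the Job row, so a
# later iteration over job.pallets (e.g. inside an append event handler)
# triggers no lazy load and therefore no autoflush.
job = (session.query(Job)
       .options(joinedload(Job.pallets))
       .filter_by(id=1)
       .one())
assert 'pallets' in job.__dict__  # collection already populated
```

Because the collection is loaded up front, the event handler's loop over it never has to go to the database, which is what triggered the autoflush in the traceback below.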
Traceback (most recent call last):
File "tests\test_resources.py", line 251, in test_check_enable
job_res.update(job_id, {'programs': [prog_id], 'enabled': True})
File "mes\resource.py", line 185, in update
job.programs.append(prog)
File "packages\sqlalchemy\ext\associationproxy.py", line 568, in append
self.col.append(item)
File "packages\sqlalchemy\orm\collections.py", line 1057, in append
item = __set(self, item, _sa_initiator)
File "packages\sqlalchemy\orm\collections.py", line 1029, in __set
item = getattr(executor, 'fire_append_event')(item, _sa_initiator)
File "packages\sqlalchemy\orm\collections.py", line 733, in fire_append_event
item, initiator)
File "packages\sqlalchemy\orm\attributes.py", line 910, in fire_append_event
value = fn(state, value, initiator or self)
File "packages\sqlalchemy\orm\events.py", line 1496, in wrap
orig_fn(target, value, *arg)
File "mes\model.py", line 81, in set_max_stat_program_added
for pallet in job.pallets:
File "packages\sqlalchemy\orm\attributes.py", line 316, in __get__
return self.impl.get(instance_state(instance), dict_)
File "packages\sqlalchemy\orm\attributes.py", line 613, in get
value = self.callable_(state, passive)
File "packages\sqlalchemy\orm\strategies.py", line 524, in _load_for_state
return self._emit_lazyload(session, state, ident_key, passive)
File "packages\sqlalchemy\orm\strategies.py", line 585, in _emit_lazyload
result = q.all()
File "packages\sqlalchemy\orm\query.py", line 2237, in all
return list(self)
File "packages\sqlalchemy\orm\query.py", line 2348, in __iter__
self.session._autoflush()
File "packages\sqlalchemy\orm\session.py", line 1139, in _autoflush
self.flush()
File "packages\sqlalchemy\orm\session.py", line 1818, in flush
self._flush(objects)
File "packages\sqlalchemy\orm\session.py", line 1936, in _flush
transaction.rollback(_capture_exception=True)
File "packages\sqlalchemy\util\langhelpers.py", line 58, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "packages\sqlalchemy\orm\session.py", line 1900, in _flush
flush_context.execute()
File "packages\sqlalchemy\orm\unitofwork.py", line 372, in execute
rec.execute(self)
File "packages\sqlalchemy\orm\unitofwork.py", line 525, in execute
uow
File "packages\sqlalchemy\orm\persistence.py", line 64, in save_obj
table, insert)
File "packages\sqlalchemy\orm\persistence.py", line 569, in _emit_insert_statements
execute(statement, params)
File "packages\sqlalchemy\engine\base.py", line 662, in execute
params)
File "packages\sqlalchemy\engine\base.py", line 761, in _execute_clauseelement
compiled_sql, distilled_params
File "packages\sqlalchemy\engine\base.py", line 874, in _execute_context
context)
File "packages\sqlalchemy\engine\base.py", line 1024, in _handle_dbapi_exception
exc_info
File "packages\sqlalchemy\util\compat.py", line 195, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "packages\sqlalchemy\engine\base.py", line 867, in _execute_context
context)
File "packages\sqlalchemy\engine\default.py", line 324, in do_execute
cursor.execute(statement, parameters)
IntegrityError: (IntegrityError) job_programs.job_id may not be NULL u'INSERT INTO job_programs (program_id) VALUES (?)' (1,)
-
repo owner - changed status to wontfix
The traceback has "autoflush" inside of it, which occurs each time the Session needs to go to the database to get some data. In this traceback, as append() is called, the collection to which it is appending is unloaded, so it has to be loaded first, triggering autoflush.
Issues where autoflush gets in the way in some object mutation scenarios are common. Use the no_autoflush context manager to temporarily turn it off.
Also, since there's no specific "bug" defined here (no test case), this is more of a mailing-list issue for the moment; can we please move it over there? Thanks.
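A hedged sketch of that suggestion, using a condensed version of the User/UserKeyword/Keyword test case that appears later in this thread. The handler body (a counting query) is invented purely to stand in for "any SQL emitted inside the handler":

```python
from sqlalchemy import (Column, ForeignKey, Integer, create_engine, event,
                        inspect)
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    keywords = association_proxy('user_keywords', 'keyword')
    user_keywords = relationship('UserKeyword', cascade='all, delete-orphan',
                                 backref='user')

class UserKeyword(Base):
    __tablename__ = 'user_keyword'
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    keyword_id = Column(Integer, ForeignKey('keyword.id'), primary_key=True)

    def __init__(self, keyword=None, **kwargs):
        super(UserKeyword, self).__init__(keyword=keyword, **kwargs)

class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    user_keywords = relationship('UserKeyword', cascade='all, delete-orphan',
                                 backref='keyword')

@event.listens_for(User.user_keywords, 'append')
def keywords_added(user, value, initiator):
    session = inspect(user).session
    if session is not None:
        # On the 0.8 series discussed here, any query at this point would
        # autoflush the half-built pending UserKeyword (user_id still NULL)
        # and fail; no_autoflush suppresses that.
        with session.no_autoflush:
            session.query(Keyword).count()

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(User(id=1))
session.add(Keyword(id=1))
session.commit()

session = Session()
user = session.query(User).first()
for kw in session.query(Keyword).all():
    user.keywords.append(kw)  # handler runs SQL safely under no_autoflush
session.commit()
```

The flush that finally writes the user_keyword row happens at commit(), once the foreign keys have been wired up by the ORM.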
-
reporter - changed status to open
Finally I was able to find a reproducible test case for this issue. It looks like the combination of an association proxy and an event listener on the associated attribute leaves you in an inconsistent state while inside the event handler, which causes any flush to fail.
from sqlalchemy import (create_engine, event, inspect, Column, Integer,
                        ForeignKey, String)
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    keywords = association_proxy('user_keywords', 'keyword')
    user_keywords = relationship('UserKeyword', cascade='all, delete-orphan',
                                 backref='user')


@event.listens_for(User.user_keywords, 'append')
def keywords_added(user, value, initiator):
    session = inspect(user).session
    if session:
        session.flush()


class UserKeyword(Base):
    __tablename__ = 'user_keyword'
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    keyword_id = Column(Integer, ForeignKey('keyword.id'), primary_key=True)

    def __init__(self, keyword=None, **kwargs):
        super(UserKeyword, self).__init__(keyword=keyword, **kwargs)


class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    keyword = Column(String(64))
    user_keywords = relationship(
        'UserKeyword', cascade='all, delete-orphan', backref='keyword')


engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

user = User(name='xx')
user.keywords.append(Keyword(keyword='x'))
session.add(user)
session.commit()

session = Session()
user = session.query(User).first()
for kw in session.query(Keyword):
    user.keywords.append(kw)
Result:
$ python sqla_2809.py
Traceback (most recent call last):
File "sqla_2809.py", line 63, in <module>
user.keywords.append(kw)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\ext\associationproxy.py", line 568, in append
self.col.append(item)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\collections.py", line 1057, in append
item = __set(self, item, _sa_initiator)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\collections.py", line 1029, in __set
item = getattr(executor, 'fire_append_event')(item, _sa_initiator)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\collections.py", line 733, in fire_append_event
item, initiator)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\attributes.py", line 910, in fire_append_event
value = fn(state, value, initiator or self)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\events.py", line 1496, in wrap
orig_fn(target, value, *arg)
File "sqla_2809.py", line 27, in keywords_added
session.flush()
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\session.py", line 1818, in flush
self._flush(objects)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\session.py", line 1936, in _flush
transaction.rollback(_capture_exception=True)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\util\langhelpers.py", line 58, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\session.py", line 1900, in _flush
flush_context.execute()
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\unitofwork.py", line 372, in execute
rec.execute(self)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\unitofwork.py", line 525, in execute
uow
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\persistence.py", line 64, in save_obj
table, insert)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\orm\persistence.py", line 569, in _emit_insert_statements
execute(statement, params)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 662, in execute
params)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 761, in _execute_clauseelement
compiled_sql, distilled_params
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 874, in _execute_context
context)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 1024, in _handle_dbapi_exception
exc_info
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\util\compat.py", line 195, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\base.py", line 867, in _execute_context
context)
File "d:\Projekte\HACS\hacs\packages\sqlalchemy\engine\default.py", line 324, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (IntegrityError) user_keyword.user_id may not be NULL u'INSERT INTO user_keyword (keyword_id) VALUES (?)' (1,)
-
reporter BTW, the test case works if Keyword doesn't have a reference to UserKeyword. This was the key to finding a reproducible test case.
-
repo owner - changed status to invalid
OK, please, please, can we go to the mailing list? This is not a bug. You are calling flush() in the middle of an event which occurs before your object is ready to be flushed. If you remove the event handler you're adding, there's no stack trace. You've essentially illustrated with flush() what is already happening in your real app due to autoflush (note the original stack trace has _autoflush() vs. this one which has flush()), and I've already explained: if you have a block that can't be autoflushed, please use the no_autoflush helper, that's what it's for.
Of course if you tell the session to flush() at the point at which your objects are not completely associated with each other such that you'll get integrity violations due to null constraints, you'll get an integrity violation. SQLAlchemy can't magically undo what you're telling it to do in order to avoid an exception.
Let's talk about where/how you can temporarily turn off autoflush on the mailing list; it's here: http://groups.google.com/group/sqlalchemy/ (though I'm off the grid for the next couple of days).
-
reporter I'm still not convinced, sorry.
I know that this is a constructed example, but any query to the database in the event handler will cause the autoflush and therefore this exception (which makes it not such a constructed case). And I know that I can work around this issue by wrapping the event handler code in a with session.no_autoflush block.
But I see this just as a workaround and not as the solution. I do not agree that this should be up to the user. IMO SQLAlchemy should detect that it is in an inconsistent state where a flush would (or could) fail and automatically disable autoflush until you are back in a consistent state.
-
reporter You should even consider if autoflush=False as default doesn't fit better to your "no magic" policy.
The behavior above is really unexpected (and I couldn't find any clue about such an issue in the documentation), and I found similar issues regarding autoflush across the net, e.g. http://lists.okfn.org/pipermail/ckan-dev/2011-January/000189.html
-
repo owner Replying to schlamar:
You should even consider if autoflush=False as default doesn't fit better to your "no magic" policy.
That's how it was for years. Autoflush being on has solved a lot of problems. I'd ask that you consider that this project has been going for eight years and has evolved tremendously. At this point, not a lot is by accident anymore.
The behavior above is really unexpected (and I couldn't find any clue about such an issue in documentation) and I found similar issues regarding autoflush across the net, e.g. http://lists.okfn.org/pipermail/ckan-dev/2011-January/000189.html
There were many issues with autoflush being turned off, too.
-
repo owner a quick poll on autoflush:
Hibernate (main inspiration for SQLAlchemy) - autoflush by default: http://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch11.html#objectstate-flushing
Storm ORM (at one time a key SQLAlchemy inspiration / competitor) - autoflush by default: https://storm.canonical.com/Tutorial#Flushing
Doctrine ORM (the creator of Doctrine is now a Python programmer and uses SQLAlchemy), but OK, they don't autoflush (ever? though I seem to see it in their source): https://doctrine-orm.readthedocs.org/en/latest/reference/working-with-objects.html#effects-of-database-and-unitofwork-being-out-of-sync
nhibernate (.NET version of hibernate) - autoflush by default: http://nhforge.org/doc/nh/en/index.html#manipulatingdata-flushing
Now, we're always improving how and when we call autoflush, but for the moment we autoflush when you access an unloaded collection, because that collection's state can be composed of pending data.
For years, I was completely uncomfortable with the idea of autoflush, and even though I emulated Hibernate, I thought the feature was nuts. But after dealing with the product directly and via user experience, we switched it to true. It is always an option that can be modified; I find it most effective to leave it on most of the time and disable it just in those cases where it gets in the way. These cases are easy to identify by looking for "autoflush" in the stack trace. It makes lots of neat patterns possible. Originally the "dynamic relationship loader" was the main one: otherwise you'd append an item to the list, then access the list, but the item you just appended wouldn't be there if not for autoflush. This kind of pattern turned out to be a lot more prevalent after that point.
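The append-then-see-it pattern described above can be shown with a tiny self-contained example; the Item model is invented for illustration:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()  # autoflush=True is the default

session.add(Item(name='pending'))
# The SELECT autoflushes the pending INSERT first, so the new row is seen.
count = session.query(Item).count()
assert count == 1

with session.no_autoflush:
    session.add(Item(name='also pending'))
    # Inside the block the pending row is not flushed, so it is not seen.
    assert session.query(Item).count() == 1

# Outside the block autoflush resumes and the second row becomes visible.
assert session.query(Item).count() == 2
```

This is the trade-off being debated in this thread: the same automatic flush that makes pending data visible to queries is what fires inside the event handler above.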
-
repo owner Replying to schlamar:
I know that this is a constructed example, but any query to the database in the event handler will cause the auto flush and therefore cause this exception (which makes it not such a constructed case). And I know that I can workaround this issue by wrapping the event handler code into an with session.no_autoflush.
OK, well, if you use most any other ORM, there either is no flush, you have to save() yourself anytime you want an INSERT or UPDATE, or there are no attribute events. Use any of those ORMs and this problem doesn't exist, but neither does the feature. There is no way this particular example could function, because attribute events fire off before the actual change occurs. It could only work if you manually set the foreign key value to that of user.id; do that if you'd like.
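What "manually set the foreign key value" could look like, again as a hedged sketch on condensed stand-in models rather than the reporter's real code. Note it only helps once user.id is already known (i.e. the user row is persistent):

```python
from sqlalchemy import (Column, ForeignKey, Integer, create_engine, event,
                        inspect)
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    keywords = association_proxy('user_keywords', 'keyword')
    user_keywords = relationship('UserKeyword', backref='user')

class UserKeyword(Base):
    __tablename__ = 'user_keyword'
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    keyword_id = Column(Integer, ForeignKey('keyword.id'), primary_key=True)

    def __init__(self, keyword=None, **kwargs):
        super(UserKeyword, self).__init__(keyword=keyword, **kwargs)

class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    user_keywords = relationship('UserKeyword', backref='keyword')

@event.listens_for(User.user_keywords, 'append')
def keywords_added(user, value, initiator):
    session = inspect(user).session
    if session is not None and user.id is not None:
        # The ORM has not linked value to user yet at event time, so fill
        # in the foreign key by hand before any flush can run.
        value.user_id = user.id
        session.flush()

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(User(id=1))
session.add(Keyword(id=1))
session.commit()

session = Session()
user = session.query(User).first()
for kw in session.query(Keyword).all():
    user.keywords.append(kw)  # handler sees a fully-populated foreign key
session.commit()
```

This works around the NOT NULL violation by doing by hand, at event time, the wiring the unit of work would otherwise do at flush time.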
But I see this just as a workaround and not as the solution. I do not agree that this should be up to the user. IMO sqlalchemy should detect that it is in an inconsistent state where a flush would (or could) fail and automatically disable it until you are back in a consistent state
SQLAlchemy would be making a significant guess here: that when it goes to autoflush on behalf of the job.pallets collection that's local to your event handler, it needs to be concerned about the state of what is at this point an entirely unrelated collection, job.programs, that's external to the event handler. If a user expects autoflush to occur, it would be pretty surprising and inconsistent if it turned itself off in certain hard-to-spot situations. It's a lot simpler to understand that autoflush occurred, an integrity error happened, and therefore this particular event handler needs no_autoflush on it. IMHO, a consistent and predictable autoflush is a lot more straightforward than an autoflush that guesses in obscure situations as to whether it might want to proceed, as it wouldn't be consistently effective.
-
repo owner - removed milestone
Removing milestone: 0.8.xx (automated comment)
reporter However, the strangest thing for me is that there is SQL execution at all at the point of proxy.append. Shouldn't there be SQL execution only on commit?