deadlock ! (with mutable/cleanup/resurrect)

Issue #2087 resolved
Mike Bayer repo owner created an issue

not sure if we'll get a test case for this. here's what we have, SQLA version is 0.6.5:

for obj in session:
  do_something_cool(obj)

Yesterday this caused what looks like a deadlock in SQLAlchemy code.  Here is the stack I grabbed using gdb:
/python2.5/sqlalchemy/orm/session.py:1353 (Session.__iter__)
/python2.5/sqlalchemy/orm/identity.py:184 (WeakInstanceDict.values)
/python2.5/sqlalchemy/orm/identity.py:188 (WeakInstanceDict.itervalues) ** self._remove_mutex.acquire()
/python2.5/sqlalchemy/orm/state.py:501 (MutableAttrInstanceState.__resurrect)
/python2.5/sqlalchemy/orm/attributes.py:925 (Events.run)
/python2.5/sqlalchemy/orm/mapper.py:2424 (_event_on_resurrect)
/python2.5/sqlalchemy/util.py:953 (OrderedSet.__iter__)
/python2.5/sqlalchemy/orm/state.py:477 (MutableAttrInstanceState._cleanup)
/python2.5/sqlalchemy/orm/identity.py:139A (WeakInstanceDict.remove) ** self._remove_mutex.acquire()

Comments (4)

  1. Mike Bayer reporter
    • changed milestone to 0.7.0

    OK here's the deal with this. The trace above is probably why we added the mutex in the first place. the simple fact is that if we used a reentrant lock up there, then we'd get dictionary size changed during iteration. this are very few ways to win this problem. attached is a new patch that's got to be in 0.7 since its a different design, that does away with mutexes and just accepts that these remove calls can just happen anywhere, inside any iteration, whatever, and we need a way for them to complete, not change the dict while we're iterating, and not be ignored else we'd be leaking memory. need to design a threading test that actually gets the "resurrect" set to fill up with something - probably by having a worker thread randomly dereference objects.

  2. Mike Bayer reporter

    an attached patch for 0.6 squeezes the mutex call to just a single call to dict.values(self). Not sure if gc on unrelated objects is still going to trigger around dict.values(self) but it should be less likely. I would think that dict.values(self) is within a single GIL call....

  3. Mike Bayer reporter

    going to go with 77c8def8720db6ab2260e1e48b86d0642219e9ff in 0.6 and 403d78697003805879be2fbad4693830e6d8d4c6 in 0.7. The former tightens the mutexing to just the .values() call. If a GC can occur synchronously within .values(), I really wish there were some way to demonstrate that. So in 0.7, we take out the mutex altogether. We added the mutex at the same time we moved from iteration to values(), and this was probably premature - just calling .values() was probably enough, GIL- and otherwise, to prevent remove() from happening at the same time in 999999 out of a million cases (its also not clear to me if the "dict changed size during iteration" error is even possible with the native .values() call, or the list(self.values() in py3k). Also given that all of 0.5 and half of 0.6 had no mutex and used a long iteration, and there was only one reported issue, shows how rare this is in any case.

  4. Log in to comment