Very weird `KeyError` exception

Issue #4030 resolved
Dariusz Dzialak created an issue
Traceback (most recent call last):
  File "site-packages/cherrypy/_cprequest.py", line 670, in respond
  File "site-packages/cherrypy/lib/encoding.py", line 221, in __call__
  File "site-packages/cherrypy/_cpdispatch.py", line 60, in __call__
  File "sfutils/perf_stats.py", line 206, in wrapper
  File "sfutils/base_service.py", line 461, in wrapper
  File "sfutils/base_service.py", line 395, in call_f_and_return_json
  File "sfvolumes/service.py", line 260, in request_save_volume
  File "sfvolumes/service.py", line 229, in _save_volume
  File "sfutils/perf_stats.py", line 206, in wrapper
  File "sfvolumes/dao.py", line 206, in get_volume_by_name
  File "sfvolumes/dao.py", line 190, in _get_volume_by_name
  File "site-packages/sqlalchemy/orm/query.py", line 2814, in one
  File "site-packages/sqlalchemy/orm/query.py", line 2784, in one_or_none
  File "site-packages/sqlalchemy/orm/loading.py", line 90, in instances
  File "site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
  File "site-packages/sqlalchemy/orm/loading.py", line 75, in instances
  File "site-packages/sqlalchemy/orm/loading.py", line 385, in _instance
  File "site-packages/sqlalchemy/orm/identity.py", line 151, in get
KeyError: (<class 'sfutils.pg_api.alchemy_schema.Volume'>, (2L,))

It's very weird because code within orm/identity.py looks like:

[148]    def get(self, key, default=None):
[149]        if key not in self._dict:
[150]            return default
[151]        state = self._dict[key]

First time I've though that session is used in two threads but session is stored only within dao and dao is a local parameter so it's not possible that more than one thread is using the same session. The other strange thing is backtrace - can't explain why raise_from_cause (from util.py) invoked instances (from loading.py). I've saw that weakref is used so is it possible that default _identity_cls = identity.WeakInstanceDict from session caused that fired garbage collector freed some objects and removed some key from self._dict within identity_map? (just while thread was between identity.py:149 and identity.py:151)?

We use sqlalchemy very intensively and that problem happened three times since last four months (so it seems to be very rare case).

Versions: SQLAlchemy (1.1.10) psycopg2 (2.7.1) postgresql 9.6+184.pgdg14.04+1 python 2.7.12

Unfortunately can't create any Minimal, Complete and Verifiable Example as it seems to be very rare case.

Comments (12)

  1. Mike Bayer repo owner

    it's not so much threads as it is that gc just runs in the middle of virtually anything. Looking at WeakInstanceDict I'm a little surprised this pattern is present in quite a few methods, and even more that a couple of methods do guard against it.

    The reason there's a lot of "if key in dict" is because if the item is expected to often not be inside the dict, this is a cheaper check than catching KeyError. However, the catch for KeyError should still be there to accommodate for this race.

    Methods that have a defensive KeyError are:

    WeakInstanceDict.__contains__
    WeakInstanceDict.add
    

    methods that are checking "key in dict", then accessing dict[key] without anticipating KeyError (and also do not expect to throw KeyError) are:

    WeakInstanceDict.contains_state
    WeakInstanceDict.replace
    WeakInstanceDict.get
    WeakInstanceDict.safe_discard
    

    above those all need to catch KeyError.

  2. Dariusz Dzialak reporter

    indeed, raising exception is expensive, my favorite is: result = self._dict.get(key, default_value) and later usage of is operator to verify if result is default_value: ...)

  3. Mike Bayer repo owner

    Guard all indexed access in WeakInstanceDict

    Added KeyError checks to all methods within :class:.WeakInstanceDict where a check for key in dict is followed by indexed access to that key, to guard against a race against garbage collection that under load can remove the key from the dict after the code assumes its present, leading to very infrequent KeyError raises.

    Change-Id: I881cc2899f7961d29a0549f44149a2615ae7a4ea Fixes: #4030

    → <<cset 3d8049e32d63>>

  4. Mike Bayer repo owner

    Guard all indexed access in WeakInstanceDict

    Added KeyError checks to all methods within :class:.WeakInstanceDict where a check for key in dict is followed by indexed access to that key, to guard against a race against garbage collection that under load can remove the key from the dict after the code assumes its present, leading to very infrequent KeyError raises.

    Change-Id: I881cc2899f7961d29a0549f44149a2615ae7a4ea Fixes: #4030 (cherry picked from commit 1a75f676d75b5daf67455d6e85b7b4df6f2316dc)

    → <<cset 676c49b51de2>>

  5. David Feltell

    We believe we are (still) having this sporadic problem with SQLAlchemy==1.1.13. We merge a (fairly complex) object into a session from a cache with every request to our server. We get a handful of KeyErrors each day, out of dozens of requests per second, so it is very sporadic. The relevant parts of the stack trace are

       user = session.merge(cachedUser, load=False)
     File "/venv/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 157, in do
       return getattr(self.registry(), name)(*args, **kwargs)
     File "/venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1849, in merge
       _resolve_conflict_map=_resolve_conflict_map)
     File "/venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1875, in _merge
       merged = self.identity_map[key]
     File "/venv/lib/python2.7/site-packages/sqlalchemy/orm/identity.py", line 89, in __getitem__
       state = self._dict[key]
    KeyError: <snip>
    

    Does this look related?

  6. Mike Bayer repo owner

    hmmm, OK didn't miss it, because that method is supposed to raise a key error anyway.

  7. David Feltell

    ... except for the fact that it's not the explicit raise KeyError(key) that is triggering, it's the dictionary lookup before it.

  8. Log in to comment