Race condition in dialect initialization

Issue #2964 resolved
Jeff Dairiki created an issue

There is a race condition in DefaultEngineStrategy around line 161 where dialect.initialize() gets called from a listener attached to the pool first_connect event.

If there are multiple threads all rushing to query the database, however, the second thread to get a connection can start running queries before the first thread has finished initializing the dialect.

(In my case, dialect.returns_unicode_strings was intermittently not being initialized before the first queries were performed. This resulted in a UnicodeResultProcessor being used when it shouldn't have been in turn leading to UnicodeEncodeError.)

Not sure what the best fix is. My guess is that dialect.initialized should be hooked into the connect event instead of first_connect, with a wrapper which ensures that dialect.initialize only gets called once by the first thread to try — blocking subsequent threads until the initialization is complete. (The locking/blocking could perhaps be folded into util.only_once.)

Comments (15)

  1. Mike Bayer repo owner

    how about this?

    diff --git a/lib/sqlalchemy/util/langhelpers.py b/lib/sqlalchemy/util/langhelpers.py
    index 82e37ce..32ac389 100644
    --- a/lib/sqlalchemy/util/langhelpers.py
    +++ b/lib/sqlalchemy/util/langhelpers.py
    @@ -20,6 +20,7 @@ from .. import exc
     import hashlib
     from . import compat
     from . import _collections
    +import threading
    
     def md5_hex(x):
         if compat.py3k:
    @@ -1174,10 +1175,17 @@ def only_once(fn):
         once."""
    
         once = [fn]
    +    lock = threading.Lock()
    +
         def go(*arg, **kw):
             if once:
    -            once_fn = once.pop()
    -            return once_fn(*arg, **kw)
    +            with lock:
    +                if once:
    +                    try:
    +                        return once[0](*arg, **kw)
    +                    finally:
    +                        once.pop()
    +
    
         return update_wrapper(go, fn)
    

    not sure if I feel like getting a whole concurrency test for this one, any chance your issue is reproducible and you could test this?

  2. Mike Bayer repo owner

    oh right, its first_connect only, OK not sure if i want to move it to connect, let me think about that

  3. Mike Bayer repo owner

    different one, do it in the event, as this code is called every time (note I'm trying to make it so the mutex is out of the picture once the init is finished):

    diff --git a/lib/sqlalchemy/event/attr.py b/lib/sqlalchemy/event/attr.py
    index 3f89475..0cea965 100644
    --- a/lib/sqlalchemy/event/attr.py
    +++ b/lib/sqlalchemy/event/attr.py
    @@ -35,6 +35,7 @@ from . import registry
     from . import legacy
     from itertools import chain
     import weakref
    +import threading
    
     class RefCollection(object):
         @util.memoized_property
    @@ -230,13 +231,21 @@ class _EmptyListener(_HasParentDispatchDescriptor):
     class _CompoundListener(_HasParentDispatchDescriptor):
         _exec_once = False
    
    +    @util.memoized_property
    +    def _exec_once_mutex(self):
    +        return threading.RLock()
    +
         def exec_once(self, *args, **kw):
             """Execute this event, but only if it has not been
             executed already for this collection."""
    
             if not self._exec_once:
    -            self(*args, **kw)
    -            self._exec_once = True
    +            with self._exec_once_mutex:
    +                if not self._exec_once:
    +                    try:
    +                        self(*args, **kw)
    +                    finally:
    +                        self._exec_once = True
    
         def __call__(self, *args, **kw):
             """Execute this event."""
    
  4. Jeff Dairiki reporter

    Thank you for the quick response, Mike!

    I have a seems-to-be-reproducible-at-least-for-now test that involves configuring our complete (large) pyramid app and then making making two simultaneous requests for a specific URL. (Not sure exactly why that particular URL triggers the problem.)

    I've spent some time trying to come up with a small test case, but so far have not had success.


    Your second patch appears to work with my test case. (The first actually worked too, for me. Due to a race in _CompoundListener.exec_once — which your second patch fixes — both threads wound up calling the first_connect handler in my case.)

    Reminder: probably want sqlalchemy.util.threading instead of threading (for those environments without threads.)

    OCD: since there's a mutex, you can set self._exec_once before calling self(args, *kw), avoiding the try/finally. E.g.

     with self._exec_once_mutex:
         if not self._exec_once:
             self._exec_once = True
             self(*args, **kw)
    
  5. Jeff Dairiki reporter

    OCD: since there's a mutex, you can set self._exec_once before calling self(args, *kw), avoiding the try/finally.

    Sorry, I take that back. This would only be safe if all checks of self._exec_once were inside the mutex (which they are not).

  6. Jeff Dairiki reporter

    Nice test case!

    Did you mean to drop the outer if not self._exec_once that was in your original (second) patch? I don't know that it makes a big difference, but it's probably better to avoid acquiring the lock when it's not necessary.

  7. Mike Bayer repo owner

    once first_connect() is over i want to avoid acquiring the lock, yes, as that is much more expensive

  8. Jeff Dairiki reporter

    Agreed. To be clear, your original fix (above) had

    def exec_once(self, *args, **kw):
        if not self._exec_once:
             with self._exec_once_mutex:
                if not self._exec_once:
                     # ...
    

    but in dd32540 (and 4ac0558) there is only

    def exec_once(self, *args, **kw):
        with self._exec_once_mutex:
             if not self._exec_once:
                 # ...
    

    This acquires the lock on each call of exec_once, which (I'm pretty sure) gets called on every new connection.

  9. Log in to comment