SingletonThreadPool and threading.local() usage

Issue #3376 invalid
André Anjos created an issue

I'm a bit puzzled by the SingletonThreadPool implementation. As per the recommendation (see the discussion on this StackOverflow thread), the call to threading.local() should be global, whereas in sqlalchemy.pool it is local, made in the constructor of the SingletonThreadPool object.

As a result, if many sessions to the database are attempted in the same process, the constructor of the SingletonThreadPool object will be triggered many times, creating the false perception that the singleton does not exist and triggering the re-construction of a connection.

After some time, we start getting spurious messages:

Exception during reset or similar
Traceback (most recent call last):
  File "/idiap/project/beat/beat.env.develop/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 568, in _finalize_fairy
    fairy._reset(pool)
  File "/idiap/project/beat/beat.env.develop/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 702, in _reset
    pool._dialect.do_rollback(self)
  File "/idiap/project/beat/beat.env.develop/usr/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 412, in do_rollback
    dbapi_connection.rollback()
ProgrammingError: Cannot operate on a closed database.

We wonder whether this is the cause. Could somebody verify this, please?

Comments (6)

  1. Mike Bayer repo owner

    the call to threading.local() must be global, whereas in sqlalchemy.pool, it is rather local, at the constructor of the SingletonThreadPool object.

    this is a misinterpretation of what that SO answer is talking about. The threading.local() object itself is a shared resource; in that example, they are working with a module-level function, so the threading.local() is declared at the module level and consumed by the function. In SingletonThreadPool, we're a class. The threading.local() is declared at the instance level and consumed by the methods, which are always related to that instance.
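
    The scoping point can be sketched in plain Python. This is a hypothetical toy class (not SQLAlchemy's actual implementation): a threading.local() stored on the instance still gives each thread its own slot, and the constructor runs only once per pool.

```python
import threading


class PerInstancePool:
    """Toy sketch: a threading.local() held on the instance, as
    SingletonThreadPool does. Each (instance, thread) pair sees its
    own connection slot; the constructor runs once per pool object."""

    def __init__(self):
        # one local() per pool instance; its attributes are per-thread
        self._conn = threading.local()

    def connect(self):
        if not hasattr(self._conn, "current"):
            # simulate creating a new connection for this thread
            self._conn.current = object()
        return self._conn.current


pool = PerInstancePool()

c1 = pool.connect()
c2 = pool.connect()

results = {}


def worker():
    results["other"] = pool.connect()


t = threading.Thread(target=worker)
t.start()
t.join()

# same thread -> same connection; different thread -> a different one
assert c1 is c2
assert results["other"] is not c1
```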

    As a result, If many sessions are attempted to the database on the same process, the constructor of the SingletonThreadPool object will be triggered many times

    no, that doesn't happen at all. The constructor of your pool is the constructor. It is called exactly when you say, "pool = SingletonThreadPool()", and that is it.

    After sometime, we start getting spurious messages: "Cannot operate on a closed database."

    you will get this if you are using more threads than you have set pool_size to. This pool will cycle through its connections and just close them out if you are using more connections than you've told it you would.

    here is a demo:

    from sqlalchemy import create_engine, pool
    import threading
    import time
    import random
    
    e = create_engine(
        "sqlite:///foo.db",
        poolclass=pool.SingletonThreadPool,
        connect_args={"check_same_thread": False},
    
        # uncomment to allow it to work
        # pool_size=10
    )
    
    dont_segfault = threading.Lock()
    
    
    def do_thing():
        conn = e.connect()
        for i in range(5):
            with dont_segfault:
                conn.scalar("select 1")
            time.sleep(random.randint(1, 10) / 10)
    
    
    threads = [threading.Thread(target=do_thing) for i in range(10)]
    for t in threads:
        t.start()
        time.sleep(random.randint(1, 10) / 10)
    
    for t in threads:
        t.join()
    

    the SingletonThreadPool is really not a general use pool in any case. It's good for testing against a SQLite :memory: database and that's about it.

  2. Mike Bayer repo owner

    no issue is illustrated here and the behavior noted so far is expected under certain circumstances, so without specifics there's no evidence of an issue.

  3. André Anjos reporter

    Thanks for the quick feedback.

    The SingletonThreadPool is not selected explicitly in our code; it is a result of the way we connect to the database, by providing our own SQLite connector (which is in turn based on apsw). SQLAlchemy's internal check makes it choose this pool strategy instead of the default NullPool.

    Furthermore, our code has absolutely no threads! So, I don't understand what is happening...

    Question: I see that our sessions are created through an "exec" call, which is kept isolated from the current globals(). Do you think this could trigger this strange behaviour?

  4. Mike Bayer repo owner

    by providing our own SQLite connector (which is in turn based on apsw).

    that is a very likely place to look for issues.

    The internal test by SQLAlchemy makes it choose this Pool strategy instead of the default NullPool.

    what happens when you set it back to NullPool? Or just use StaticPool or any of the other pools?
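
    A minimal configuration sketch of forcing an alternate pool class (NullPool and StaticPool are real SQLAlchemy pool classes; the file path is illustrative):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool, StaticPool

# NullPool: opens and closes a real connection per checkout,
# so there is no pooled state to go stale.
engine = create_engine("sqlite:///foo.db", poolclass=NullPool)

# Or StaticPool: a single connection shared by all checkouts,
# useful for SQLite :memory: databases in tests.
# engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
```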

  5. André Anjos reporter

    When we set it back to the NullPool, it works just fine. Otherwise, it just throws exceptions as above.

    The fact that we create the session through an exec statement is, I think, the problem. For some reason, the threading.local() call is returning a new object every time Session() is called, which makes the Pool go bananas, even if the multiple calls are done in the context of the same thread. I'll try to create an example and post it here.

  6. Mike Bayer repo owner

    exec ! wow, yes that would be a significant detail if you're observing weird issues. make sure you are passing along important things like globals() and locals() correctly.
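
    The globals() point can be sketched with plain exec (a hypothetical snippet, not the reporter's actual code): module-level state created inside exec'd code, such as a threading.local(), lives in whatever dict is passed as globals, so a fresh dict on each exec means fresh state each time.

```python
import threading

code = "import threading\nlocal_obj = threading.local()"

# a brand-new globals dict per exec -> a brand-new threading.local each time
ns1 = {}
exec(code, ns1)
ns2 = {}
exec(code, ns2)
assert ns1["local_obj"] is not ns2["local_obj"]

# reusing the same globals dict preserves the object across exec calls
shared = {}
exec(code, shared)
first = shared["local_obj"]
exec("same = local_obj", shared)
assert shared["same"] is first
```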
