Error "File descriptor was closed in another greenlet" using v0.9.4 and gevent.monkey.patch_all

Issue #3043 resolved
Gaofeng Liang created an issue

I'm using gevent and SQLAlchemy in my Thrift (RPC) project. I upgraded SQLAlchemy from v0.9.3 to v0.9.4 a few days ago and then started seeing the error. Here is how I use SQLAlchemy and gevent:

  • scoped_session(thread-local)
  • use gevent.spawn to handle every request and session.close after finishing handling
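
The pattern in the bullets can be sketched with a minimal, stdlib-only registry (illustrative names, not SQLAlchemy's actual code): scoped_session keeps one session per "scope", keyed by the current thread ident, and gevent.monkey.patch_all() makes that ident the current greenlet's, so each spawned handler gets its own session.

```python
# Minimal sketch of the registry idea behind scoped_session (stdlib
# only, names illustrative). Under gevent.monkey.patch_all(),
# get_ident() identifies the current greenlet, so each spawned
# handler sees its own object.
import threading

class ScopedRegistry:
    def __init__(self, createfunc):
        self.createfunc = createfunc
        self.registry = {}

    def __call__(self):
        key = threading.get_ident()
        try:
            return self.registry[key]
        except KeyError:
            return self.registry.setdefault(key, self.createfunc())

    def remove(self):
        # Equivalent of Session.remove(): drop this scope's object so
        # the next call creates a fresh one.
        self.registry.pop(threading.get_ident(), None)

registry = ScopedRegistry(dict)
obj = registry()
assert registry() is obj       # same scope -> same object
registry.remove()
assert registry() is not obj   # after remove(), a fresh object
```

This is also why the close/remove step in the second bullet matters: without it, a later greenlet that happens to reuse the same ident would inherit the old session.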

The error is gone when SQLAlchemy is downgraded to v0.9.3.

I noticed that the v0.9.4 release notes say "Key fixes include an enhancement to the mechanics of connection pool recycling to be more efficient". Does this have anything to do with the error?

File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 233, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 577, in get
    value = callable_(state, passive)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 360, in __call__
    self.manager.deferred_scalar_loader(self, toload)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 606, in load_scalar_attributes

  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 230, in load_on_ident
    return q.one()
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2361, in one
    ret = list(self)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2404, in __iter__
    return self._execute_and_instances(context)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2419, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 720, in execute
    return meth(self, multiparams, params)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
    context)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1108, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 940, in _execute_context
    context)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
    cursor.execute(statement, parameters)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/cursors.py", line 132, in execute
    result = self._query(query)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/cursors.py", line 271, in _query
    conn.query(q)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 726, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 861, in _read_query_result
    result.read()
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 1064, in read
    first_packet = self.connection._read_packet()
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 825, in _read_packet
    packet = packet_type(self)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 242, in __init__
    self._recv_packet(connection)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 248, in _recv_packet
    packet_header = connection._read_bytes(4)
  File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 838, in _read_bytes
    2013, "Lost connection to MySQL server during query (%r)" % (e,))
OperationalError: (OperationalError) (2013, "Lost connection to MySQL server during query (error(9, 'File descriptor was closed in another greenlet'))")

Comments (29)

  1. Mike Bayer repo owner

    ok, let's go through the bullets:

    • the pool recycling fix only takes effect when the connection pool detects that a "disconnect" situation has occurred, such as when you restarted your database. It does look like this file descriptor error is related to a "disconnect" condition, though.

    • the fix is also a simplification of how that used to work. There used to be a dispose() step where we'd actively close out every connection in the pool on that event; now we just rely on the "timed invalidation" logic, which otherwise hasn't changed.

    • the pool can probably be ruled out if using NullPool doesn't make the issue go away.

    • a thread-local scoped session works with a gevent greenlet? Are we sure about that? Did you set this up correctly?

    • the obvious one: a test script? Without a test script (simple, single file, runs and shows the error), it's unlikely any steps can be taken here.
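
    As a concrete way to try the NullPool suggestion above, here is a hedged sketch (the sqlite URL is used only to keep the example self-contained; the thread's setup uses mysql+pymysql):

```python
# NullPool opens a fresh DBAPI connection on each checkout and closes
# it on checkin, bypassing QueuePool's recycling logic entirely. If
# the error disappears under NullPool, the pool is implicated; if it
# persists, the pool can probably be ruled out.
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# sqlite URL only so this sketch runs standalone; substitute the real
# mysql+pymysql URL when testing against the actual application.
engine = create_engine("sqlite://", poolclass=NullPool)
with engine.connect() as conn:
    result = conn.execute(text("select 1")).scalar()
```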

  2. Gaofeng Liang reporter

    I couldn't reproduce the error in my development environment. I added more logging code to my project and deployed it to the production server, on which I re-upgraded SQLAlchemy to v0.9.4, but the error has not been raised again so far. It's really tricky.

    Anyway, many thanks for your reply, and sorry for the trouble.

  3. Mike Bayer repo owner

    OK, it does look like something went wrong. I have very little experience running SQLA with gevent, and there are most likely issues like these, especially with the MySQL drivers, so if you get closer to a reproducing condition please let me know.

  4. davidkhess

    I'm also running into errors along these lines using 0.9.4 with gevent. It started with this exception, and it does appear to be related to the MySQL server going away:

    [08/05/2014 15:24:12.848|24104(47836048)|  ERROR|sqlalchemy.pool.QueuePool     |pool.py:249 _close_connection      ]: Exception closing connection None
    Traceback (most recent call last):
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 244, in _close_connection
        self._dialect.do_close(connection)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 411, in do_close
        dbapi_connection.close()
    AttributeError: 'NoneType' object has no attribute 'close'
    

    Quickly followed by:

    Traceback (most recent call last):
      File "/home/therenow/flask/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
        result = self._run(*self.args, **self.kwargs)
      File "backend/monitor.py", line 70, in event_monitor
        gevent.sleep(60)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/flask/ctx.py", line 186, in __exit__
        self.pop(exc_value)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/flask/ctx.py", line 175, in pop
        self.app.do_teardown_appcontext(exc)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/flask/app.py", line 1727, in do_teardown_appcontext
        func(exc)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/flask_sqlalchemy/__init__.py", line 716, in shutdown_session
        self.session.remove()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 86, in remove
        self.registry().close()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 999, in close
        transaction.close()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 437, in close
        connection.close()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 640, in close
        del self.__connection
    AttributeError: _Connection__connection
    <Greenlet at 0x2d9eb90: event_monitor> failed with AttributeError
    

    What's most interesting, and a little concerning, is that the exact same greenlet id gets reused as respawning occurs, and more of these types of exceptions occur.

    However, this all cleared up after restarting the process. So it's as if the server dropping the connection started it, and it couldn't get out of that state until the process was restarted.

  5. davidkhess

    Just FYI, I am using (via Flask):

    SQLALCHEMY_POOL_RECYCLE = 5 * 60
    

    Is it possible there's a race condition between SQLAlchemy choosing to recycle a connection at the same time that the server drops it?
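
    For reference, pool_recycle works at checkout time: a connection whose age exceeds the recycle threshold is closed and replaced when it is next checked out. A stdlib-only sketch of that timing (illustrative, not SQLAlchemy's actual code):

```python
import time

class RecycleSketch:
    """Sketch of a pooled-connection record with pool_recycle semantics."""
    def __init__(self, recycle):
        self.recycle = recycle       # max age in seconds
        self.starttime = time.time()
        self.generation = 0          # bumps when the connection is replaced

    def checkout(self):
        if time.time() - self.starttime > self.recycle:
            # too old: close the stale connection, open a fresh one
            self.generation += 1
            self.starttime = time.time()
        return self.generation

rec = RecycleSketch(recycle=0.01)
g0 = rec.checkout()                  # young enough, reused as-is
time.sleep(0.02)
g1 = rec.checkout()                  # past the threshold, recycled
assert g1 == g0 + 1
```

    The race being asked about would fall in the window between the age check and the actual use of the connection.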

  6. Mike Bayer repo owner

    OK, what I need to understand here is: are you getting a legitimate "MySQL connection dropped" issue first, and then the handling of it goes wrong? Or are you suddenly seeing lots of connections dropped with 0.9.4? Very little has changed between 0.9.3 and 0.9.4; literally the only change is what we do after a connection drop is detected.

  7. Mike Bayer repo owner

    If SQLA is about to recycle a connection and the connection actually goes away, the close() will fail, the exception will be logged, and it will move on. It won't make the reference to the connection itself disappear as you illustrate in your stack trace.

  8. davidkhess

    According to my logs, I believe it is the former – a legit MySQL connection dropped issue first, and then the handling of it goes wrong until the process is restarted.

  9. davidkhess

    If I respawn a greenlet and it ends up with the same thread id as a previous one that went through this, will it get a fresh scoped session on a new connection? Or will it inherit this previous session left in this state?
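
    The concern in this question can be illustrated with a stdlib-only sketch (hypothetical ident values, not SQLAlchemy's actual code): a registry keyed by thread/greenlet ident hands a respawned greenlet with a recycled ident whatever the previous occupant left behind, unless remove() was called first.

```python
# registry maps ident -> session-like object (a plain dict here)
registry = {}

def get_session(ident):
    return registry.setdefault(ident, {"state": "fresh"})

s = get_session(12345)
s["state"] = "broken"            # first greenlet dies mid-transaction

s2 = get_session(12345)          # respawned greenlet, same ident reused
assert s2["state"] == "broken"   # it inherits the poisoned session

registry.pop(12345, None)        # the Session.remove() step
assert get_session(12345)["state"] == "fresh"
```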

  10. davidkhess

    Yes, the log indicates this sequence of events:

    1) The tracebacks I posted above appeared implying the MySQL server dropped a connection.

    2) Then tracebacks occurred implying the MySQL server was unavailable for reconnect ("Can't connect to MySQL server on...")

    3) Then the MySQL server comes back and connections appear to be made but exceptions like the one that opened this issue start to appear. But I also see these:

    Traceback (most recent call last):
      File "/home/therenow/flask/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
        result = self._run(*self.args, **self.kwargs)
      File "backend/monitor.py", line 144, in participation_monitor
        for participation in Participant.query:
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2404, in __iter__
        return self._execute_and_instances(context)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2419, in _execute_and_instances
        result = conn.execute(querycontext.statement, self._params)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 720, in execute
        return meth(self, multiparams, params)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
        context)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1111, in _handle_dbapi_exception
        util.reraise(*exc_info)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 940, in _execute_context
        context)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
        cursor.execute(statement, parameters)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/cursors.py", line 132, in execute
        result = self._query(query)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/cursors.py", line 271, in _query
        conn.query(q)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 726, in query
        self._affected_rows = self._read_query_result(unbuffered=unbuffered)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 861, in _read_query_result
        result.read()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 1064, in read
        first_packet = self.connection._read_packet()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 825, in _read_packet
        packet = packet_type(self)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 242, in __init__
        self._recv_packet(connection)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 248, in _recv_packet
        packet_header = connection._read_bytes(4)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 832, in _read_bytes
        data = self._rfile.read(num_bytes)
    RuntimeError: reentrant call inside <_io.BufferedReader name=26>
    <Greenlet at 0x2eaef50: participation_monitor> failed with RuntimeError
    

    and

    Traceback (most recent call last):
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 244, in _close_connection
        self._dialect.do_close(connection)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 411, in do_close
        dbapi_connection.close()
    AttributeError: 'NoneType' object has no attribute 'close'
    Traceback (most recent call last):
      File "/home/therenow/flask/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
        result = self._run(*self.args, **self.kwargs)
      File "backend/monitor.py", line 148, in participation_monitor
        session.commit()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 149, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 765, in commit
        self.transaction.commit()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 374, in commit
        t[1].commit()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1267, in commit
        self._do_commit()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1298, in _do_commit
        self.connection._commit_impl()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 527, in _commit_impl
        if self.connection._reset_agent is self.__transaction:
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 235, in connection
        return self._revalidate_connection()
      File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 241, in _revalidate_connection
        "Can't reconnect until invalid "
    InvalidRequestError: Can't reconnect until invalid transaction is rolled back
    <Greenlet at 0x2d9ec30: participation_monitor> failed with InvalidRequestError
    

    Those imply to me that multiple greenlets are somehow accessing the same session, but I believe I'm using scoped sessions properly. And it works fine up until this point.

  11. Mike Bayer repo owner

    I'm going to have to play with greenlets and get a better feel for what they do, but the idea is: if a connection raises an exception that we detect as a "dropped" connection, that connection is closed immediately, and its "container" is set to reconnect on next use. Other connections in the pool are marked as "invalid", so they will also reconnect on next use; the difference is that those bad connections hang around until that point.

    The first stack trace you illustrate is an exception that's caught. From this angle it looks like concurrent access to the same Connection object.
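
    The invalidate-on-next-use scheme described above can be sketched in a few lines (stdlib only, illustrative names, not SQLAlchemy's actual code):

```python
class ConnSketch:
    def __init__(self):
        self.generation = 0
        self.invalid = False

    def invalidate(self):
        # mark stale; the actual reconnect is deferred to next checkout
        self.invalid = True

    def checkout(self):
        if self.invalid:
            self.generation += 1   # lazy "reconnect" on next use
            self.invalid = False
        return self.generation

pool = [ConnSketch() for _ in range(3)]
for c in pool:
    c.invalidate()                 # disconnect detected: all marked invalid
assert pool[0].checkout() == 1     # reconnects only when actually used
assert pool[1].invalid             # the rest stay stale until their next use
```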

  12. davidkhess

    This is occurring in a backend process separate from the flask workers so there isn't really a request context. The greenlets that were involved follow this pattern:

    def monitor():
        with app.app_context():
            while True:
                session = current_app.db.session
    
                # do some work
    
                session.remove()
    
                gevent.sleep(10 * 60) 
    
  13. davidkhess

    Oh, and I'm in the same boat as the OP. I can't seem to recreate this. I just happened to have pretty good logging going on.

  14. Naoki INADA

    The AttributeError is very simple: invalidate is just being called twice. I don't think this is related to gevent.

    Here is the log for it (the code is slightly modified to get a full traceback).

    INFO:sqlalchemy.pool.QueuePool:Invalidate connection <pymysql.connections.Connection object at 0x106e17710> (reason: OperationalError:(2013, 'Lost connection to MySQL server during query'))
    DEBUG:sqlalchemy.pool.QueuePool:Closing connection <pymysql.connections.Connection object at 0x106e17710>
    INFO:sqlalchemy.pool.QueuePool:Invalidate connection None (reason: InvalidRequestError:Can't reconnect until invalid transaction is rolled back)
    DEBUG:sqlalchemy.pool.QueuePool:Closing connection None
    Traceback (most recent call last):
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
        result = self._run(*self.args, **self.kwargs)
      File "sample.py", line 24, in worker
        Session.remove()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 86, in remove
        self.registry().close()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 999, in close
        transaction.close()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 437, in close
        connection.close()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 637, in close
        conn.close()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 785, in close
        self._checkin()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 668, in _checkin
        self._pool, None, self._echo, fairy=self)
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 562, in _finalize_fairy
        connection_record.invalidate(e=e)
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 491, in invalidate
        self.__close()
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 523, in __close
        self.__pool._close_connection(self.connection)
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 244, in _close_connection
        self._dialect.do_close(connection)
      File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 411, in do_close
        dbapi_connection.close()
    AttributeError: 'NoneType' object has no attribute 'close'
    <Greenlet at 0x104491f50: worker> failed with AttributeError
    
  15. davidkhess

    Thanks for your help Naoki!

    That does look like it could just be a second invalidate that fired on the same greenlet, rather than a different greenlet coming in and calling the invalidate.

    However, the RuntimeError seems like hard proof of two greenlets interacting with the same connection context – which seems bizarre, since the scoped session should be preventing that.

  16. Mike Bayer repo owner

    I'm not able to reproduce the "just invalidate twice" case; can you send code please:

    # output: nothing
    conn = engine.connect()
    conn.invalidate()
    conn.invalidate()
    
    # output: SAWarning: Can't invalidate an already-closed connection.
    conn = engine.connect()
    conn = conn.connection
    conn.invalidate()
    conn.invalidate()
    
  17. Mike Bayer repo owner

    the ultimate trigger for this issue is not #2985 but instead #2907, which is in 0.9.2. A more intensive version of the gevent test case can trigger similar issues in 0.9.3 as well.

  18. Mike Bayer repo owner
    • Fixed some "double invalidate" situations that were detected, where a connection invalidation could occur within an already critical section such as a connection.close(). Ultimately, these conditions are caused by the change in 🎫2907, in that the "reset on return" feature calls out to the Connection/Transaction in order to handle it, which is where "disconnect detection" might be caught. However, it's possible that the more recent change in 🎫2985 made this more likely to be seen, as the "connection invalidate" operation is much quicker; the issue is more reproducible on 0.9.4 than 0.9.3.

    Checks are now added within any section where an invalidate might occur, to halt further disallowed operations on the invalidated connection. This includes two fixes, both at the engine level and at the pool level. While the issue was observed with highly concurrent gevent cases, it could in theory occur in any kind of scenario where a disconnect occurs within the connection close operation. fixes #3043 ref #2985 ref #2907

    • add some defensive checks during an invalidate situation: 1. _ConnectionRecord.invalidate might be called twice within finalize_fairy if the _reset() raises an invalidate condition, invalidates, raises, and then goes on to invalidate the CR, so check for this. 2. similarly, within Connection, any time we do handle_dbapi_error() we might become invalidated, so a following finally must check self.__invalid before dealing with the connection any further.

    → <<cset 85d1899b76a3>>
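
    The first defensive check can be sketched as follows (illustrative names, not the actual SQLAlchemy code): a second invalidate on a record whose connection is already gone becomes a no-op, rather than reaching close() on None as in the AttributeError tracebacks above.

```python
class ConnectionRecordSketch:
    def __init__(self, dbapi_connection):
        self.connection = dbapi_connection
        self.closes = 0

    def invalidate(self):
        if self.connection is None:
            return                     # already invalidated: bail out
        self.closes += 1               # stands in for dialect.do_close()
        self.connection = None

rec = ConnectionRecordSketch(object())
rec.invalidate()
rec.invalidate()                       # second call is now a harmless no-op
assert rec.closes == 1
assert rec.connection is None
```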

  19. davidkhess

    Thank you Mike! I will be signing up for your tip jar.

    One question: were you able to reproduce the exceptions implying two greenlets accessing the same connection context? I.e., the OperationalError exception that opened the issue, and the RuntimeError exception I posted.

    Or is that evidence of perhaps another issue not directly related to your fix?

  20. Mike Bayer repo owner

    I didn't see the RuntimeError you refer to, but I saw other errors within pymysql that were resolved when I upgraded to the latest version: issues where the database disconnect caught it by surprise and it was unable to recover. There are very possibly more of these.
