Error "File descriptor was closed in another greenlet" using v0.9.4 and gevent.monkey.patch_all
I'm using gevent and SQLAlchemy in my Thrift (RPC) project. I upgraded SQLAlchemy from v0.9.3 to v0.9.4 a few days ago and then started seeing this error. This is how I use SQLAlchemy and gevent:
- scoped_session(thread-local)
- use gevent.spawn to handle every request and session.close after finishing handling
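A minimal sketch of the pattern described in the bullets above, using SQLite and plain threads in place of MySQL and gevent greenlets so it runs standalone (under monkey.patch_all, thread idents are greenlet idents, so the scoping behaves the same way; all names here are illustrative):

```python
import threading
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")  # stand-in for the real mysql+pymysql URL
Session = scoped_session(sessionmaker(bind=engine))  # thread-local by default

results = []

def handle_request():
    # under gevent this would be spawned per request: gevent.spawn(handle_request)
    session = Session()
    try:
        results.append(session.execute(text("SELECT 1")).scalar())
    finally:
        session.close()   # "session.close after finishing handling"
        Session.remove()  # drop this scope's session from the registry

threads = [threading.Thread(target=handle_request) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results is now [1, 1, 1]
```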
The error is gone when SQLAlchemy is downgraded to v0.9.3.
I noticed that version 0.9.4 includes "an enhancement to the mechanics of connection pool recycling to be more efficient". Does that have anything to do with the error?
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 233, in __get__
return self.impl.get(instance_state(instance), dict_)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 577, in get
value = callable_(state, passive)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 360, in __call__
self.manager.deferred_scalar_loader(self, toload)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 606, in load_scalar_attributes
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 230, in load_on_ident
return q.one()
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2361, in one
raise orm_exc.NoResultFound("No row was found for one()")
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2404, in __iter__
conn = conn.execution_options(**self._execution_options)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2419, in _execute_and_instances
returned by this :class:`.Query`.
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 720, in execute
"""Execute a sql.FunctionElement object."""
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
self.dispatch.after_execute(self,
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
context._fetch_implicit_returning(result)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1108, in _handle_dbapi_exception
The operations inside the function are all invoked within the
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 940, in _execute_context
context.post_insert()
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
# after the initial set of 'isolation_level', if any, so is
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/cursors.py", line 132, in execute
result = self._query(query)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/cursors.py", line 271, in _query
conn.query(q)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 726, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 861, in _read_query_result
result.read()
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 1064, in read
first_packet = self.connection._read_packet()
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 825, in _read_packet
packet = packet_type(self)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 242, in __init__
self._recv_packet(connection)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 248, in _recv_packet
packet_header = connection._read_bytes(4)
File "/srv/virtualenvs/zeus2env/local/lib/python2.7/site-packages/pymysql/connections.py", line 838, in _read_bytes
2013, "Lost connection to MySQL server during query (%r)" % (e,))
OperationalError: (OperationalError) (2013, "Lost connection to MySQL server during query (error(9, 'File descriptor was closed in another greenlet'))")
-
reporter -
repo owner ok, let's go through the bullets:
-
the pool recycling fix only takes effect when the connection pool detects that a "disconnect" situation has occurred, e.g. when you restarted your database. it does look like this file descriptor error is related to a "disconnect" condition, though.
-
the fix is also a simplification of how that used to work. there used to be a dispose() step where we'd actively close out every connection in the pool on that event; now we just rely on the "timed invalidation" logic, which otherwise hasn't changed.
-
the pool can probably be ruled out if using NullPool doesn't help with the issue.
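For reference, ruling out the pool would look roughly like this (a sketch; the SQLite URL stands in for the real MySQL one):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# NullPool opens a fresh DBAPI connection on every checkout and closes it
# on checkin, so nothing is held in a pool to be recycled or invalidated
engine = create_engine("sqlite://", poolclass=NullPool)
assert isinstance(engine.pool, NullPool)
```

If the file descriptor error disappears under NullPool, the pool is implicated; if it persists, the problem is elsewhere.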
-
thread-local scoped session works with a gevent greenlet? are we sure about that? did you call this correctly?
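(For later readers: scoped_session scopes by thread ident by default, which maps onto greenlets only because monkey.patch_all patches the thread module; the explicit approach is to pass gevent's getcurrent as the scopefunc. A sketch, with threading.get_ident standing in so the snippet runs without gevent installed:)

```python
import threading
from sqlalchemy.orm import scoped_session, sessionmaker

# with gevent you would use:
#   from gevent import getcurrent
#   Session = scoped_session(sessionmaker(), scopefunc=getcurrent)
Session = scoped_session(sessionmaker(), scopefunc=threading.get_ident)

a = Session()
assert Session() is a       # same scope -> same Session object

Session.remove()
assert Session() is not a   # after remove(), the scope gets a fresh Session
```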
-
the obvious - a test script? without a test script (simple, single file, runs and shows the error) it's unlikely any steps can be taken here.
-
reporter I couldn't reproduce the error in my development env. I added more logging to my project and deployed it to the production server, on which I re-upgraded SQLAlchemy to v0.9.4. But the error has not been raised again so far. It's really tricky.
Anyway, many thanks for your reply, and sorry to trouble you.
-
repo owner OK, it does look like something went wrong, but I have very little experience running SQLA with gevent, and there are most likely issues like these, especially with the MySQL drivers. So if you get closer to a reproducing condition, please let me know.
-
repo owner - changed status to on hold
waiting on more info, if possible.
-
I'm also running into errors along these lines using 0.9.4 with gevent. It started with this exception, which does appear to be related to the MySQL server going away:
[08/05/2014 15:24:12.848|24104(47836048)| ERROR|sqlalchemy.pool.QueuePool |pool.py:249 _close_connection ]: Exception closing connection None
Traceback (most recent call last):
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 244, in _close_connection
    self._dialect.do_close(connection)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 411, in do_close
    dbapi_connection.close()
AttributeError: 'NoneType' object has no attribute 'close'
Quickly followed by:
Traceback (most recent call last):
  File "/home/therenow/flask/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "backend/monitor.py", line 70, in event_monitor
    gevent.sleep(60)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/flask/ctx.py", line 186, in __exit__
    self.pop(exc_value)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/flask/ctx.py", line 175, in pop
    self.app.do_teardown_appcontext(exc)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/flask/app.py", line 1727, in do_teardown_appcontext
    func(exc)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/flask_sqlalchemy/__init__.py", line 716, in shutdown_session
    self.session.remove()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 86, in remove
    self.registry().close()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 999, in close
    transaction.close()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 437, in close
    connection.close()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 640, in close
    del self.__connection
AttributeError: _Connection__connection
<Greenlet at 0x2d9eb90: event_monitor> failed with AttributeError
What's most interesting, and a little concerning, is that the exact same greenlet id gets reused as respawning occurs, and more of these types of exceptions occur.
However, this all cleared up after restarting the process. So it's like the server dropping the connection started it, and it couldn't get out of that state until the process was restarted.
-
Just FYI, I am using (via flask)
SQLALCHEMY_POOL_RECYCLE = 5 * 60
Is it possible there's a race condition between SQLAlchemy choosing to recycle a connection at the same time that the server drops it?
-
repo owner OK, what I need to understand here is: are you getting a legit MySQL connection-dropped issue first, and then the handling of it goes wrong? or are you suddenly seeing lots of connections dropped with 0.9.4? very little has changed between 0.9.3 and 0.9.4; literally only what we do after a connection drop is detected.
-
repo owner if SQLA is about to recycle a connection, and the connection actually goes away first, the close() will fail, the exception will be logged, and it will move on. it won't make the reference to the connection itself disappear the way your stack trace illustrates.
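The behavior described here can be sketched as follows (safe_close is a hypothetical name, not SQLAlchemy's API): a failed close during recycle is logged and swallowed, and the pool record survives to reconnect later.

```python
import logging

log = logging.getLogger("pool")

def safe_close(dbapi_connection):
    # if the server already dropped the socket (or the connection slot is
    # None, as in the "Exception closing connection None" log above),
    # close() raises; the exception is logged and we simply move on
    try:
        dbapi_connection.close()
    except Exception:
        log.exception("Exception closing connection %r", dbapi_connection)
```

Both a dead socket and a None slot are absorbed here rather than propagated.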
-
According to my logs, I believe it is the former: a legit MySQL connection-dropped issue first, and then the handling of it goes wrong until the process is restarted.
-
If I respawn a greenlet and it ends up with the same thread id as a previous one that went through this, will it get a fresh scoped session on a new connection? Or will it inherit the previous session left in that state?
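For what it's worth, the scoped registry is essentially a dict keyed by the scope function's return value, so a respawned greenlet that happens to reuse the same ident inherits whatever session was left in that slot unless remove() ran. A stdlib-only sketch of the mechanism (illustrative names, not SQLAlchemy's actual classes):

```python
class ScopedRegistry:
    # one object per scope key; scopefunc would be the thread ident /
    # gevent.getcurrent in the real thing
    def __init__(self, createfunc, scopefunc):
        self.createfunc = createfunc
        self.scopefunc = scopefunc
        self.registry = {}

    def __call__(self):
        key = self.scopefunc()
        if key not in self.registry:
            self.registry[key] = self.createfunc()  # fresh object for a new key
        return self.registry[key]

    def remove(self):
        # forget this scope's object; the next __call__ creates a fresh one
        self.registry.pop(self.scopefunc(), None)
```

So the answer hinges on whether remove() (or an equivalent cleanup) ran before the ident got reused; if not, the new greenlet inherits the old, possibly broken session.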
-
Yes, the log indicates this sequence of events:
1) The tracebacks I posted above appeared implying the MySQL server dropped a connection.
2) Then tracebacks occurred implying the MySQL server was unavailable for reconnect ("Can't connect to MySQL server on...")
3) Then the MySQL server comes back and connections appear to be made, but exceptions like the one that opened this issue start to appear. I also see these:
Traceback (most recent call last):
  File "/home/therenow/flask/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "backend/monitor.py", line 144, in participation_monitor
    for participation in Participant.query:
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2404, in __iter__
    return self._execute_and_instances(context)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2419, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 720, in execute
    return meth(self, multiparams, params)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
    context)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1111, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 940, in _execute_context
    context)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
    cursor.execute(statement, parameters)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/cursors.py", line 132, in execute
    result = self._query(query)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/cursors.py", line 271, in _query
    conn.query(q)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 726, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 861, in _read_query_result
    result.read()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 1064, in read
    first_packet = self.connection._read_packet()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 825, in _read_packet
    packet = packet_type(self)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 242, in __init__
    self._recv_packet(connection)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 248, in _recv_packet
    packet_header = connection._read_bytes(4)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/pymysql/connections.py", line 832, in _read_bytes
    data = self._rfile.read(num_bytes)
RuntimeError: reentrant call inside <_io.BufferedReader name=26>
<Greenlet at 0x2eaef50: participation_monitor> failed with RuntimeError
and
Traceback (most recent call last):
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 244, in _close_connection
    self._dialect.do_close(connection)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 411, in do_close
    dbapi_connection.close()
AttributeError: 'NoneType' object has no attribute 'close'
Traceback (most recent call last):
  File "/home/therenow/flask/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "backend/monitor.py", line 148, in participation_monitor
    session.commit()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 149, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 765, in commit
    self.transaction.commit()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 374, in commit
    t[1].commit()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1267, in commit
    self._do_commit()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1298, in _do_commit
    self.connection._commit_impl()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 527, in _commit_impl
    if self.connection._reset_agent is self.__transaction:
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 235, in connection
    return self._revalidate_connection()
  File "/home/therenow/flask/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 241, in _revalidate_connection
    "Can't reconnect until invalid "
InvalidRequestError: Can't reconnect until invalid transaction is rolled back
<Greenlet at 0x2d9ec30: participation_monitor> failed with InvalidRequestError
Those imply to me that multiple greenlets are accessing the same session somehow, but I believe I'm using scoped sessions properly. And it works fine up until this point.
-
repo owner I'm going to have to play with greenlets to get a better feel for what they do, but the idea is: if a connection raises an exception that we detect as a "dropped" connection, that connection is closed immediately, and its "container" is set to reconnect on next use. other connections in the pool are marked as "invalid", so they will also reconnect on next use; the difference is that those bad connections hang around until that point.
the first stack trace you illustrate is an exception that's caught. from this angle it looks like concurrent access to the same Connection object.
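The "reconnect on next use" idea can be reduced to a stdlib-only sketch (hypothetical names, not the actual pool classes):

```python
import time

class PoolSlot:
    # holds one DBAPI connection; once marked invalid, the connection is
    # lazily replaced at the next checkout rather than eagerly up front
    def __init__(self, connect):
        self._connect = connect
        self.connection = connect()
        self.invalidated_at = None

    def invalidate(self):
        # called when a disconnect is detected anywhere in the pool
        self.invalidated_at = time.time()

    def checkout(self):
        if self.invalidated_at is not None:
            self.connection = self._connect()  # reconnect on next use
            self.invalidated_at = None
        return self.connection
```

The lazy replacement is why "bad" connections hang around in the pool until someone checks them out again.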
-
repo owner is this with thread-local session? how about making that request-local instead?
-
This is occurring in a backend process separate from the flask workers so there isn't really a request context. The greenlets that were involved follow this pattern:
def monitor():
    with app.app_context():
        while True:
            session = current_app.db.session
            # do some work
            session.remove()
            gevent.sleep(10 * 60)
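One defensive tweak to that loop, sketched below with a hypothetical run_monitor helper (the no-op sleep default just keeps the sketch self-contained; the real loop would pass gevent.sleep and loop forever): moving remove() into a finally ensures that a cycle dying mid-work cannot leave a broken session bound to this greenlet's scope.

```python
def run_monitor(session_factory, do_work, iterations, sleep=lambda seconds: None):
    # session_factory is a scoped session such as current_app.db.session
    for _ in range(iterations):
        session = session_factory()
        try:
            do_work(session)
        finally:
            # runs even when do_work raises, so the next cycle (or a
            # respawned greenlet reusing this ident) starts clean
            session_factory.remove()
        sleep(10 * 60)
```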
-
Oh, and I'm in the same boat as the OP. I can't seem to recreate this. I just happened to have pretty good logging going on.
-
I can reproduce AttributeError https://gist.github.com/methane/7bb1dcada695d65ebbf1
But I can't reproduce RuntimeError.
-
repo owner woo thanks! something I can run, I'll try to get some time to look at this.
-
The AttributeError is very simple: it's just invalidate being called twice. I think this is not related to gevent.
Log for it (the code is slightly modified to get the full traceback):
INFO:sqlalchemy.pool.QueuePool:Invalidate connection <pymysql.connections.Connection object at 0x106e17710> (reason: OperationalError:(2013, 'Lost connection to MySQL server during query'))
DEBUG:sqlalchemy.pool.QueuePool:Closing connection <pymysql.connections.Connection object at 0x106e17710>
INFO:sqlalchemy.pool.QueuePool:Invalidate connection None (reason: InvalidRequestError:Can't reconnect until invalid transaction is rolled back)
DEBUG:sqlalchemy.pool.QueuePool:Closing connection None
Traceback (most recent call last):
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "sample.py", line 24, in worker
    Session.remove()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 86, in remove
    self.registry().close()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 999, in close
    transaction.close()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 437, in close
    connection.close()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 637, in close
    conn.close()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 785, in close
    self._checkin()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 668, in _checkin
    self._pool, None, self._echo, fairy=self)
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 562, in _finalize_fairy
    connection_record.invalidate(e=e)
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 491, in invalidate
    self.__close()
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 523, in __close
    self.__pool._close_connection(self.connection)
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/pool.py", line 244, in _close_connection
    self._dialect.do_close(connection)
  File "/Users/inada-n/venvs/flask-gevent-pymysql/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 411, in do_close
    dbapi_connection.close()
AttributeError: 'NoneType' object has no attribute 'close'
<Greenlet at 0x104491f50: worker> failed with AttributeError
-
Thanks for your help Naoki!
That does look like it could just be a second invalidate that fired on the same greenlet, rather than a different greenlet coming in and calling the invalidate.
However, the RuntimeError seems like hard proof of two greenlets interacting with the same connection context, which seems bizarre since the scoped session should be preventing that.
-
repo owner - changed status to open
this case reproduces; just trying to track down all the elements of it
-
repo owner -
repo owner not able to reproduce the "just invalidate twice" case, can you send code please:
# output: nothing
conn = engine.connect()
conn.invalidate()
conn.invalidate()

# output: SAWarning: Can't invalidate an already-closed connection.
conn = engine.connect()
conn = conn.connection
conn.invalidate()
conn.invalidate()
-
repo owner -
repo owner - changed status to resolved
- Fixed some "double invalidate" situations that were detected, where
a connection invalidation could occur within an already critical section
such as a connection.close(); ultimately, these conditions are caused
by the change in #2907, in that the "reset on return" feature calls out to
the Connection/Transaction in order to handle it, where "disconnect
detection" might be caught. However, it's possible that the more recent
change in #2985 made it more likely for this to be seen, as the
"connection invalidate" operation is much quicker; the issue is more
reproducible on 0.9.4 than 0.9.3.
Checks are now added within any section where an invalidate might occur,
to halt further disallowed operations on the invalidated connection. This
includes two fixes, one at the engine level and one at the pool level.
While the issue was observed with highly concurrent gevent cases, it
could in theory occur in any kind of scenario where a disconnect occurs
within the connection close operation. Fixes #3043, ref #2985, ref #2907.
Also adds some defensive checks during an invalidate situation:
1. _ConnectionRecord.invalidate might be called twice within finalize_fairy
if the _reset() raises an invalidate condition, invalidates, raises, and
then goes on to invalidate the record again, so check for this.
2. Similarly, within Connection, any time we call _handle_dbapi_error() we
might become invalidated, so a following finally block must check
self.__invalid before dealing with the connection any further.
→ <<cset 85d1899b76a3>>
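Reduced to a stdlib-only sketch, the defensive check the changelog describes looks like this (illustrative names, not the actual SQLAlchemy code):

```python
class GuardedRecord:
    # once invalidate() has run and the connection slot is None, a second
    # invalidate arriving from the same error-handling path must bail out
    # instead of calling close() on None
    def __init__(self, dbapi_connection):
        self.connection = dbapi_connection

    def invalidate(self):
        if self.connection is None:
            return                      # already invalidated: no-op
        try:
            self.connection.close()
        except Exception:
            pass                        # the server may already be gone
        self.connection = None
```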
-
repo owner OK, I've stress-tested it for quite a while now and this is holding up; see https://bitbucket.org/zzzeek/sqlalchemy/src/168ca1d37d0d6ddc3ef6dc993fa561c4f46d0f52/test/perf/invalidate_stresstest.py?at=master. thanks all for the help tracking this down!
-
Thank you, Mike! I will be signing up for your tip jar.
One question: were you able to reproduce the exceptions implying two greenlets accessing the same connection context, i.e. the OperationalError that opened the issue and the RuntimeError I posted?
Or is that evidence of perhaps another issue, not directly related to your fix?
-
repo owner I didn't see the RuntimeError you refer to, but I saw other errors within pymysql that were resolved when I upgraded to the latest version: issues where the database disconnect caught it by surprise and it was unable to recover. There are very possibly more of these.
-
repo owner see also
#3803