restarting mysql can cause connections to not be returned to the pool

Issue #2695 resolved
Anonymous created an issue

If you restart the MySQL server while a query is being executed against it, this can result in the connection not being properly returned to the pool.

Attached is an example script which just loops, executing a query against MySQL and then asserting that no connections remain checkout out of the pool. AFAICT this is an appropriate invariant of the engine/pool - when no queries are being run, there should be nothing checked out.

If you run this script, then restart MySQL while it's running, a connection will often remain checked out after e.execute() returns, triggering the assertion.

As near as I can tell, what happens is the following:

  • MySQL gets the signal to restart while a query is being executed
  • It terminates the running query, producing error "1317 Query Execution was interrupted"
  • The Connection._handle_dbapi_exception() method is invoked to deal with the error
  • This error does not signal a disconnect, so we attempt to execute self._autorollback() before returning the connection to the pool
  • MySQL has stopped, so this raises another error, now "2006 MySQL server has gone away"
  • Connection._handle_dbapi_exception() is invoked again, but immediately re-raises the error since self._reentrant_error is now set to True
  • The error propagates up to the calling code, and the connection is not returned to the pool

The connection is dead at this point, so it should be invalidated. Instead it stays around as a "zombie" connection, not referenced from anywhere but still marked as checked out by the pool.

Comments (7)

  1. Michael Bayer repo owner

    the attached patch is against 0.8 tip. It basically moves up the disconnect check to the top and sets a connection-wide one-way trigger. the dispose is then done in the finally. This might be a way to do this with minimal changes to the existing flow, care to try it out? (also "SHOW DATABASES" runs in like 50 ms here, I'm assuming that query is supposed to be long running?)

  2. Anonymous

    Great, the patch seems to have done the trick! Thanks for a speedy response.

    I wonder if the should_close_with_result logic should also be moved into the finally block, in case of re-entrant errors that are not connection related?

    (also "SHOW DATABASES" runs in like 50 ms here, I'm assuming that query is supposed to be long running?)

    Yeah, a long-running query would have been better, but even with just "SHOW DATABASES" the bug could be quite reliably reproduced on my machine.

  3. Michael Bayer repo owner

    right now this code probably assumes a reentrant error is connection related, since it's doing an unconditional DBAPIError wrap in that case. At the moment I think if we're experiencing some case where each of cursor.close(), rollback(), connection.close() all may throw some new kind of exception each time, we're in a pretty catastrophic situation anyway. the next attachment here puts the autorollback in another try/finally.

    If this patch is OK are you OK with 0.8 or are you stuck on 0.7?

  4. Log in to comment