reconnect support, again

Issue #516 resolved
Mike Bayer repo owner created an issue

we are talking about databases that get restarted, and the connections in the pool which are now disconnected getting invalidated automatically.

at this point i think connection.invalidate() works pretty well. we have code in mysql.py which catches certain errors upon execute() and invalidates the connection. no other DBs do, though (and people regularly want this on postgres).

I think what I want to do is:

  • move all the "catch an exception upon execute, check if its a connection timeout/disconnect, invalidate the connection" logic straight into connection pool.
  • execute() and executemany() instrumentation for this purpose moves into CursorFairy, which will now explicitly decorate those methods (probably faster than __getattr__() too)
  • Pool gets a kw argument disconnect_checker or similar - this is a callable which is given a DBAPI SQLError, and returns True if the error indicates the connection is no longer connected.
  • engine/strategies/DefaultStrategy will procure the "disconnect_checker" callable from each dialect and pass it to the Pool it's constructing.
  • within CursorFairy, execute/executemany will catch all SQLErrors and run them through the checker function; if it returns True, an "invalidate" flag will be set on the fairy (or the parent connection). when the connection is returned to the pool, "invalidate" will be checked and result in invalidate() being called on the containing ConnectionRecord; this way we avoid references from the ConnectionFairy/CursorFairy to the ConnectionRecord. the SQLError is then re-raised in all cases.
  • remove corresponding execute instrumentation from mysql.py
  • dialects get "get_disconnect_checker()". by default it returns a callable that returns False. mysql gets one, postgres gets one; the postgres errors to catch are "psycopg2.OperationalError: connection not open" and "psycopg2.InterfaceError: connection already closed".
  • can we unit test? using a mock DBAPI, maybe?
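The "disconnect_checker" contract described above — a callable that receives a DBAPI error and returns True if the connection is dead — could be sketched like this. The function names are made up for illustration; the error strings are the two psycopg2 messages listed above:

```python
def postgres_disconnect_checker(error):
    """Return True if the DBAPI error indicates the connection is gone."""
    # the two psycopg2 messages noted in the plan above
    disconnect_messages = (
        "connection not open",        # psycopg2.OperationalError
        "connection already closed",  # psycopg2.InterfaceError
    )
    return any(msg in str(error) for msg in disconnect_messages)


def default_disconnect_checker(error):
    """Default for dialects with no disconnect detection: never invalidate."""
    return False
```

Under this scheme the Pool just calls whatever checker it was handed on each caught error, and a dialect's get_disconnect_checker() returns the appropriate callable.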

Comments (16)

  1. paj

    I'll have a go at this. The instructions are very clear, thanks for that Mike. The only issue I've noticed is that in the CursorFairy, the exceptions aren't SQLErrors, but whatever the DBAPI raised. I'll keep thinking :-)

  2. paj

    Ok, the exception issue is no biggie - just catch Exception instead. Given that it's always raised again eventually, this isn't a problem.

    There is an issue with this interfering with autorollback() which I'll look into some more.
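    A minimal sketch of that catch-everything approach in a CursorFairy-style wrapper (class and attribute names here are hypothetical, not the actual patch):

```python
class CursorFairy:
    """Illustrative stand-in for the pool's cursor proxy."""

    def __init__(self, cursor, disconnect_checker):
        self.cursor = cursor
        self.disconnect_checker = disconnect_checker
        self.invalidate = False  # inspected when the connection returns to the pool

    def execute(self, statement, parameters=None):
        try:
            return self.cursor.execute(statement, parameters)
        except Exception as e:
            # the DBAPI raises its own exception types, not SQLError,
            # so catch everything, classify it, and always re-raise
            if self.disconnect_checker(e):
                self.invalidate = True
            raise
```

    Since the exception is always re-raised, catching Exception instead of SQLError changes nothing for the caller; the wrapper only observes the error to set the flag.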

  3. paj

    When testing this, SingletonThreadPool has been helpful. Also, there is existing code to invalidate the connection if creating a cursor fails, which can be confusing for testing.

    Postgres seems troublesome as the connection error exception is raised at the wrong point. Here's the traceback:

    db.execute(select([1])).fetchall()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 735, in execute
        return connection.execute(statement, *multiparams, **params)
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 459, in execute
        return Connection.executors[c](self, object, *multiparams, **params)
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 499, in execute_clauseelement
        return self.execute_compiled(elem.compile(engine=self.__engine, parameters=param), *multiparams, **params)
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 518, in execute_compiled
        proxy(unicode(compiled), parameters)
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 514, in proxy
        self._execute_raw(statement, parameters, cursor=cursor, context=context)
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 565, in _execute_raw
        self._autocommit(statement)
      File "c:\python24\lib\site-packages\sqlalchemy\engine\base.py", line 437, in _autocommit
        if not self.in_transaction() and re.match(r'UPDATE|INSERT|CREATE|DELETE|DROP|ALTER', statement.lstrip().upper()):
      File "C:\Python24\lib\sre.py", line 129, in match
        return _compile(pattern, flags).match(string)
    psycopg2.OperationalError: connection not open

  4. paj

    Ok, here's a patch for what I've done so far. Postgres does not work for some very weird reason! I haven't had a go at doing a unit test yet.

  5. Mike Bayer reporter

    patch looks perfect... it's exactly what I had in mind.

    the only part I didn't understand was this:

    self._autorollback()

    # becomes

    if self.connection.connection:
        self._autorollback()


    is there some point at which the "ConnectionFairy" owned by the "Connection" gets set to None?

    for the PG stack trace problem, I almost wonder if that trace is corrupted by some side effect of the problem occurring within native code... but I don't really know Python's native bindings well enough to say.

    my attention is focused primarily on #496 and #514, which I am developing in a branch source:/branches/execcontext/lib/sqlalchemy/engine/base.py. these tickets are going to impact the API of the dialects' execute() methods (and will probably also impact MS-SQL, since you guys have a lot going on in there), so after I merge that I'll merge this one in. right now the branch works for sqlite only. if you want to take a look at MS-SQL's ability to run in there, that would be helpful.

  6. paj

    Ok, glad you like it.

    The ConnectionFairy gets set to None when the connection is invalidated (i.e. the new code is triggered). Doing this check before trying _autorollback means that the original error is raised, as opposed to _autorollback failing; it makes the traceback a bit clearer.

    The traceback does seem corrupted for Postgres, not really sure how to progress that one.

  7. Mike Bayer reporter

    hey, I tried it with PG; had to change the psycopg2 module name to psycopg, and also add a check for "ProgrammingError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request".

    but then it basically worked... running it with connection pool logging, that error was thrown on execute and the connection was invalidated, so I think it'll be OK.

    going to merge the other branch first before putting this in, since I can't stand to look at this old engine code.

  8. Mike Bayer reporter

    the current patch + some enhancements are checked in as changeset:2488. I have a small test script that is working nicely for postgres; need to try some variations of it.

  9. Mike Bayer reporter

    OK, it seems like the "OperationalError: connection not open" part of this is where we get the weird "asynchronous" behavior, i.e. for certain exceptions it seems like psycopg2 is "cheating" in how it raises them. trying to isolate...

  10. Mike Bayer reporter

    basically I consider this issue to be closed, except for the desire to have a unit test. I think the unit test can be as simple as: get a connection, execute something on it, dig deep into its DBAPI connection and close it (this means getting around the ConnectionFairy), attempt execution again, and assert that the connection is invalidated.

  11. paj

    I've been thinking about the unit test; I'm not sure we can do it the way you suggest. The is_disconnect methods check for specific values in the exceptions, which won't appear if the connection has been closed from within Python.

    We could have a fake DBAPI that generates a suitable exception on demand. It's probably only worth doing this on a single dialect, as using a fake exception doesn't really touch dialect-specific behaviour.

    I'll have a look at doing this.
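    One possible shape for such a fake DBAPI: a stub module whose cursors raise a disconnect-style error on demand. Everything here is a hypothetical sketch of the idea, not the test fixture that was actually committed:

```python
class MockError(Exception):
    """Stands in for the DBAPI's exception type."""


class MockCursor:
    def __init__(self, connection):
        self.connection = connection

    def execute(self, statement, parameters=None):
        if self.connection.explode:
            # mimic e.g. "psycopg2.OperationalError: connection not open"
            raise MockError("connection not open")


class MockConnection:
    def __init__(self):
        self.explode = False  # flip to simulate a database restart

    def cursor(self):
        return MockCursor(self)


def connect():
    """Module-level DBAPI connect(), as PEP 249 specifies."""
    return MockConnection()
```

    A test would point the engine at this module, execute once successfully, set explode = True, execute again, and assert that the pool invalidated the connection.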
