SSL EOF not detected as disconnect in psycopg2

Issue #3021 resolved
Andrew Suffield
created an issue

PGDialect_psycopg2.is_disconnect does not recognise the string "SSL SYSCALL error: EOF detected" as a disconnect error, with all the chaos that usually causes. This is the error I get with PostgreSQL 9.3 when the underlying connection goes away beneath an SSL session.

Comments (26)

  1. Michael Bayer repo owner

    OK, been getting some new error codes recently as folks are upgrading OpenSSL, have been backporting these to 0.8.

    Can you confirm this would work: "EOF detected" in str(e)

  2. Michael Bayer repo owner
    • The psycopg2 .closed accessor is now consulted when determining
      if an exception is a "disconnect" error; ideally, this should remove
      the need for any other inspection of the exception message to detect
      disconnect, however we will leave those existing messages in place
      as a fallback. This should be able to handle newer cases like
      "SSL EOF" conditions. Pull request courtesy Dirk Mueller.
      fixes #3021

    → <<cset 860b413ade32>>

  3. Michael Bayer repo owner
    • The psycopg2 .closed accessor is now consulted when determining
      if an exception is a "disconnect" error; ideally, this should remove
      the need for any other inspection of the exception message to detect
      disconnect, however we will leave those existing messages in place
      as a fallback. This should be able to handle newer cases like
      "SSL EOF" conditions. Pull request courtesy Dirk Mueller.
      fixes #3021

    → <<cset ef7b1c4c0a9e>>

  4. Martín Ferrari

    Hi,

    This does not seem to be fixed. I am currently experiencing this same problem with SQLAlchemy 0.9.7 from Ubuntu. I can see that this patch is present in the code, but the exception is still propagated as an OperationalError and the application does not recover from it.

    It is a pretty bad problem; I am surprised there are not more people experiencing it. Maybe nobody uses SSL for the DB connection? (scary...)

  5. Michael Bayer repo owner

    Are you experiencing OperationalError repeatedly for brand new connections? Note that this feature does not prevent the exception from being thrown. It only ensures that the connection pool is dumped, so that subsequent connections are established fresh, rather than attempting to use a pooled connection that we now know to be stale.

    Because the DBAPI and SQLAlchemy work with transactions, replaying single failed statements is not an option. Your app would have to catch the errors it anticipates and fully replay the whole operation that failed.
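
The "replay the whole operation" advice can be sketched roughly as follows. This is a minimal illustration, not SQLAlchemy code: `run_with_retry`, `StaleConnectionError` and `unit_of_work` are hypothetical names, and the failure is simulated so the example runs without a database. In a real application the `is_retryable` predicate might instead check that the exception is a `sqlalchemy.exc.DBAPIError` whose `connection_invalidated` attribute is true.

```python
import time


class StaleConnectionError(Exception):
    """Hypothetical stand-in for a driver error raised on a dead connection."""


def run_with_retry(operation, is_retryable, attempts=3, delay=0.0):
    """Call operation() from the top until it succeeds or attempts run out.

    operation must begin its own transaction on every call, so that a retry
    replays the whole unit of work rather than a single statement.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Re-raise errors we did not anticipate, and the final failure.
            if not is_retryable(exc) or attempt == attempts:
                raise
            time.sleep(delay)


# Simulated unit of work: fails once on a stale connection, then succeeds.
calls = []


def unit_of_work():
    calls.append(1)
    if len(calls) == 1:
        raise StaleConnectionError("SSL SYSCALL error: EOF detected")
    return "committed"


result = run_with_retry(
    unit_of_work,
    lambda exc: isinstance(exc, StaleConnectionError),
)
```

The retry only helps if the pool has discarded the stale connection in the meantime, which is exactly what the disconnect detection discussed in this ticket is for.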

  6. Martín Ferrari

    Hi Mike,

    I understand that, but in this case the connection is not detected as dead, so it keeps on trying. Once this condition is set, I get the same error for every database operation I attempt, and the only solution for now is to restart the application.

    For more information, it is a Flask application, and I am calling session.close() and session.remove() after each request (it is a scoped session).

  7. Michael Bayer repo owner

    Ok well we're using psycopg2's .closed attribute now. Is that attribute set? If not, it's a psycopg2 bug. I don't have the code in front of me to verify that there's no other possibility on our end.

  8. Martín Ferrari

    I don't really know. I still don't know how to reproduce the bug, as just restarting the database produces a different error which is correctly handled by is_disconnect. Do you have any insight on this?

    If I can reproduce the problem reliably, I can try to debug what's going on and check the attribute.

  9. Martín Ferrari

    I managed to create a similar error, by manually closing the file descriptor of the DB connection: (OperationalError) SSL SYSCALL error: Bad file descriptor

    It triggers the same behaviour, and I can see that psycopg2 is not marking the connection as closed:

    In [38]: s.connection().connection.closed
    Out[38]: 0

    So, I guess I will open a bug with them. Nevertheless, it is still the case that the original problem pointed out by this ticket stays valid.

  10. Martín Ferrari

    Thanks for the pointers, I had arrived at the same just a few minutes ago :-)

    Sadly, those fixes do not seem to cover this problem. I have downloaded 2.5.3 and I can still reproduce the problem. I have opened a new bug report at https://github.com/psycopg/psycopg2/issues/263

    Meanwhile, I will probably monkey-patch sqlalchemy to detect the error strings I am getting, as this is impacting our application in production.

  11. Michael Bayer repo owner

    You don't need a monkeypatch if you're on 0.9. Use the handle_error event:

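    import psycopg2
    from sqlalchemy import event
    from sqlalchemy.engine import Engine

    @event.listens_for(Engine, "handle_error")
    def handle_exception(context):
        # "SSL disconnect" is a placeholder; match the message text you actually see.
        exc = context.original_exception
        if isinstance(exc, psycopg2.OperationalError) and "SSL disconnect" in str(exc):
            context.is_disconnect = True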
    
  12. Michael Bayer repo owner
    • A revisit to this issue first patched in 0.9.5, apparently
      psycopg2's .closed accessor is not as reliable as we assumed,
      so we have added an explicit check for the exception messages
      "SSL SYSCALL error: Bad file descriptor" and
      "SSL SYSCALL error: EOF detected" when detecting an
      is-disconnect scenario. We will continue to consult psycopg2's
      connection.closed as a first check.
      fixes #3021

    → <<cset b6496ba3d28d>>
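
The two-step check described in the changelog entry above (consult the connection's `.closed` flag first, then fall back to known message text) could look roughly like this. It is a sketch under stated assumptions, not the actual `is_disconnect` implementation: `looks_like_disconnect`, `DISCONNECT_MESSAGES` and `FakeConnection` are hypothetical names, and only the two message strings come from the entry itself.

```python
# The two messages named in the changelog entry.
DISCONNECT_MESSAGES = (
    "SSL SYSCALL error: Bad file descriptor",
    "SSL SYSCALL error: EOF detected",
)


def looks_like_disconnect(exc, connection=None):
    """Hypothetical helper; not SQLAlchemy's real is_disconnect code."""
    # First check: trust the driver when it reports the connection as closed.
    # psycopg2 reports 0 for an open connection and non-zero otherwise.
    if connection is not None and getattr(connection, "closed", 0):
        return True
    # Fallback: the driver may leave .closed at 0, so match message text.
    message = str(exc)
    return any(fragment in message for fragment in DISCONNECT_MESSAGES)


class FakeConnection:
    """Hypothetical stand-in exposing only a psycopg2-style .closed flag."""

    def __init__(self, closed):
        self.closed = closed


stale = Exception("SSL SYSCALL error: EOF detected")
other = Exception("syntax error at or near SELECT")
```

The fallback matters for the case reported earlier in this thread, where the error was raised while `.closed` still read 0.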

  13. Michael Bayer repo owner
    • A revisit to this issue first patched in 0.9.5, apparently
      psycopg2's .closed accessor is not as reliable as we assumed,
      so we have added an explicit check for the exception messages
      "SSL SYSCALL error: Bad file descriptor" and
      "SSL SYSCALL error: EOF detected" when detecting an
      is-disconnect scenario. We will continue to consult psycopg2's
      connection.closed as a first check.
      fixes #3021

    → <<cset 2f4db5307ce0>>

  14. Michael Bayer repo owner

    can you folks please please check this, e.g. @Martín Ferrari etc., it is critical that the people reporting these issues make sure my patches work, as they rely on installation specifics (I don't have a PG SSL setup here to test with). @Andrew Suffield please try the patch this time as well before it gets released thanks!

  15. Andrew Suffield reporter

    I've long since moved on from the job where this happened, sorry. From memory, you can generate the desired behaviour quite easily by setting up SSL and then restarting postgresql in the middle of a connection.

  16. Martín Ferrari

    It took me some effort to produce a test case just for this, but I can confirm that applying this patch solves this specific error. Thanks!!

    Now, I have to say that I think it would be a lot better if this could be solved some other way than checking the text of exceptions.. :/

  17. Michael Bayer repo owner

    OK, like before: are you getting the error just once, with the next request recovering, or do you get it over and over because the connection pool does not get invalidated? As noted above, this is not about preventing the error; it's about invalidating the remaining connections when it occurs. We now match exactly the string "SSL SYSCALL error: EOF detected" in the error.
