integrate "pre-ping" into connection pool
This issue has been observed in Mailman 3 with
- postgresql 9.3
- psycopg2 2.6.2
- SQLAlchemy 1.1.5.
There are several persistent runner processes. If the postgresql server is restarted without restarting the Mailman runners, each runner encounters an error similar to the following traceback upon its next database access which may be at a time significantly after the postgresql restart.
Feb 22 22:51:13 2017 (30637) LMTP message parsing
Traceback (most recent call last):
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
context)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: terminating connection due to administrator command
SSL connection has been closed unexpectedly
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/mailman-3.1.0b4-py3.4.egg/mailman/runners/lmtp.py", line 133, in process_message
listnames = set(getUtility(IListManager).names)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/mailman-3.1.0b4-py3.4.egg/mailman/model/listmanager.py", line 107, in names
MailingList.list_name):
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 1202, in values
return iter(q)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2790, in __iter__
return self._execute_and_instances(context)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2813, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 945, in execute
return meth(self, multiparams, params)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
compiled_sql, distilled_params
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
context)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception
exc_info
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
raise value.with_traceback(tb)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
context)
File "/opt/mailman/mailman-bundler/venv-3.4/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) terminating connection due to administrator command
SSL connection has been closed unexpectedly
[SQL: 'SELECT mailinglist.mail_host AS mailinglist_mail_host, mailinglist.list_name AS mailinglist_list_name \nFROM mailinglist']
This appears similar to https://bitbucket.org/zzzeek/sqlalchemy/issues/2570 but that is old and resolved.
Comments (17)
-
repo owner -
reporter Thanks for the reply. I'll look at the referenced docs.
-
repo owner I should admit that there is a theoretical way that "ping" approach could be integrated automatically, treating the first statement invoked in the transaction as the "ping" and replaying it. Replaying a statement that's already been invoked is a "big deal" though, as there's all kinds of events and defaults associated with an arbitrary execution and it's not clear if people would want those events/defaults replayed as well. It's one of those cases where making this automatic is 90% possible but would potentially be a total mess in 10% of the cases.
-
repo owner in fact in some cases the defaults for an INSERT do emit their own SQL so that would move the "replay" up to include all the events and defaults. having a "ping" that's distinct from the actual "first statement" is a lot easier, but perhaps this could be integrated into dialects (e.g. "ping").
-
For reference to the issue in Mailman: https://gitlab.com/mailman/mailman/issues/313
-
Thanks for the documentation references. I'm not sure the optimistic approach will work for us. I'm worried about the performance impact of the pessimistic approach, or disabling pooling. We should at least document this as a known issue in Mailman 3.1, recommending that if the database server is restarted, Mailman should be restarted.
-
repo owner @warsaw I have a feeling the "ping" approach is not that big a deal, keep in mind this is a per-transaction ping, not per statement. Assuming mailman does a bunch of SQL statements within one connection pool checkout, that is (which is the idiomatic style these days). All of openstack is using the aforementioned "ping" recipe. I'm starting to consider integrating a low-Python-overhead "ping" option into create_engine() at this point because it's true, "optimistic" means the application has to weather an exception which means having to retry an operation.
-
repo owner - changed title to integrate "pre-ping" into connection pool
- changed milestone to 1.2
- changed component to pool
- marked as enhancement
https://gerrit.sqlalchemy.org/320
one reason I think this might be great is that we don't necessarily have to emit "SELECT 1" for every DBAPI. There are probably even lower latency methods in specific DBAPIs that test if the socket is alive.
-
Very cool. @msapiro Do you want to experiment with the ping approach in Mailman 3?
-
reporter @warsaw I'll experiment and see what I can come up with. I'll also monitor https://gerrit.sqlalchemy.org/320 and try integrating that when it stabilizes.
-
repo owner - changed status to resolved
Integrate "pre-ping" into connection pool.
Added native "pessimistic disconnection" handling to the :class:
.Pool
object. The new parameter :paramref:.Pool.pre_ping
, available from the engine as :paramref:.create_engine.pool_pre_ping
, applies an efficient form of the "pre-ping" recipe featured in the pooling documentation, which upon each connection check out, emits a simple statement, typically "SELECT 1", to test the connection for liveness. If the existing connection is no longer able to respond to commands, the connection is transparently recycled, and all other connections made prior to the current timestamp are invalidated.Change-Id: I89700d0075e60abd2250e54b9bd14daf03c71c00 Fixes:
#3919→ <<cset f881dae8179b>>
-
reporter I have tested
→ <<cset f881dae8179b>>
In my environment in Mailman 3 with
pool_pre_ping=True
added to thecreate_engine()
call and once I reread http://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-pessimistic including the noteIt is critical to note that the pre-ping approach does not accommodate for connections dropped in the middle of transactions or other SQL operations.
and adjusted my expectations and testing accordingly, I can say it's working as advertised. I was only concerned with connections that were lost during potentially long periods between transactions in the first place, so this fix is good.Many thanks to @zzzeek for implementing this.
-
repo owner great thanks for testing. early versions of SQLAlchemy were oriented around a "checkout for every statement" model, which is very obsolete so it's taken me some years to realize a simple ping at the top is not a big deal.
-
Maybe this new feature needs to be documented here :
-
repo owner good idea
-
repo owner that document is all wrong anyway, the server kills the connection. the non-threadsafe use doesn't really cause that message anymore.
-
repo owner I pushed up some rewrites to both thsoe secvtions for master /1.1 /1.0
- Log in to comment
this is by design - SQLAlchemy core pools connections by default and does not "pre-ping" the connection to see if it's alive.
See the documentation at http://docs.sqlalchemy.org/en/latest/core/pooling.html#dealing-with-disconnects which discusses both "optimistic" and "pessimistic" approaches to dealing with disconnected connections inside the connection pool. "pessimistic" is simpler to add to an existing application but adds a small performance hit. Another option is to disable pooling entirely (see http://docs.sqlalchemy.org/en/latest/core/pooling.html#switching-pool-implementations).