offer option to reattempt on initial connection, not just pre-ping

Issue #3927 new
Xiaojun Lin created an issue

I'm using postgreql with sqlalchemy 1.0.11. Some connection errors occur randomly with low possibility(once 1 or 2 days) while the DB is running normally. The traceback looks like:

2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 428, in get_l3_agents_hosting_routers
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     l3_agents = [binding.l3_agent for binding in query]
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2736, in __iter__
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     return self._execute_and_instances(context)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2749, in _execute_and_instances
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     close_with_result=True)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2740, in _connection_from_session
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     **kw)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 905, in connection
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     execution_options=execution_options)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 912, in _connection_for_bind
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     conn = engine.contextual_connect(**kw)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     self._wrap_pool_connect(self.pool.connect, None),
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     e, dialect, self)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1401, in _handle_dbapi_exception_noconnection
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     util.raise_from_cause(newraise, exc_info)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     reraise(type(exception), exception, tb=exc_tb)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     return fn()
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     return _ConnectionFairy._checkout(self)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 713, in _checkout
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     fairy = _ConnectionRecord.checkout(pool)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 485, in checkout
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     rec.checkin()
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     compat.reraise(exc_type, exc_value, exc_tb)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 482, in checkout
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     dbapi_connection = rec.get_connection()
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 594, in get_connection
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     self.connection = self.__connect()
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource     connection = self.__pool._invoke_creator(self)
2017-02-07 15:49:41.843 147684 ERROR neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", li
2017-02-07T15:49:41.889+08:00 localhost neutron-server ERROR [pid:147684] [MainThread][tid:92991152] [req-d70f28f9-7fa7-46d3-bc9a-bace54158571] [resource.py:131 resource] (client error): (psycopg2.OperationalError) SSL connection has been closed unexpectedly

I've been looking into this problem for a long time but was unable to find a possible cause from the DB side, so I tried to add some retries on sqlalchemy side when connection errors occur.

The code may look like this: before:

def __connect(self):
        try:
            self.starttime = time.time()
            connection = self.__pool._invoke_creator(self)
            self.__pool.logger.debug("Created new connection %r", connection)
            return connection
        except Exception as e:
            self.__pool.logger.debug("Error on connect(): %s", e)
            raise

after:

def __connect(self, try_times=3):
        tmp_exc = Exception("try_times is zero")
        for i in range(try_times):
            try:
                self.starttime = time.time()
                connection = self.__pool._invoke_creator(self)
                self.__pool.logger.debug("Created new connection %r", connection)
                return connection
            except Exception as e:
                self.__pool.logger.debug("Error on connect(): %s", e)
                tmp_exc = e
        raise tmp_exc

Here my problem is:

  1. Does anyone have any idea about what the "SSL connection has been closed unexpectedly" error might be caused by?
  2. Is the "retry" way a proper solution to this? If not, what's the better suggestion?

Comments (4)

  1. Mike Bayer repo owner

    Does anyone have any idea about what the "SSL connection has been closed unexpectedly" error might be caused by?

    you'd need to email on the psycopg2 list.

    Is the "retry" way a proper solution to this? If not, what's the better suggestion?

    There is the "creator" hook: http://docs.sqlalchemy.org/en/latest/core/pooling.html#using-a-custom-connection-function as well as the dialect hook which will give you the DBAPI arguments too: http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=do_connect#sqlalchemy.events.DialectEvents.do_connect . use one of these hooks to retry connecting.

  2. Log in to comment