Autoloading table from database causes ~1 minute delay when shutting down PostgreSQL

Issue #3461 resolved
Jurie Horneman created an issue

When I connect to a PostgreSQL database and autoload a table, and do nothing else, shutting down PostgreSQL hangs for about a minute, unless I call engine.dispose() before doing so.

I'm not sure if this is a bug or something that I missed in the documentation. I only found out yesterday, by trial and error, how to solve this, and it wasn't clear to me from the docs that I needed to do it.

My environment:

  • Windows 7 Professional SP1
  • Python 2.7.8 64 bit
  • PostgreSQL 9.3.3
  • SQLAlchemy==1.0.5
  • psycopg2==2.6.1

(Also happens with:

  • SQLAlchemy==0.9.3
  • psycopg2==2.5.3)

Repro code:

import logging
from sqlalchemy import *


logger = logging.getLogger(__name__)


def repro(_host, _port, _username, _password, _dbname, _table_name):
    """Reproduce slow PostgreSQL shutdown issue.

    Keyword arguments:
    _host -- The host address of the PostgreSQL database server.
    _port -- The port used by the PostgreSQL database server.
    _username -- The username to be used with the PostgreSQL database server.
    _password -- The password for the PostgreSQL database server.
    _dbname -- The name of the database in the PostgreSQL database server.
    _table_name -- The name of any table in the database.
    """

    # Maximum logging.

    # See http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#configuring-logging.
    # "set to logging.INFO for SQL query output"
    logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
    # "set to logging.INFO or lower to log connection pool checkouts/checkins."
    logging.getLogger("sqlalchemy.pool").setLevel(logging.INFO)
    # "set to logging.INFO for information on mapper configurations."
    logging.getLogger("sqlalchemy.orm").setLevel(logging.INFO)
    # See http://docs.sqlalchemy.org/en/rel_0_9/dialects/postgresql.html.
    # "The psycopg2 dialect will log Postgresql NOTICE messages via the
    #  sqlalchemy.dialects.postgresql logger"
    logging.getLogger("sqlalchemy.dialects.postgresql").setLevel(logging.INFO)

    # Create engine for the intended database.
    engine = _create_engine(_host, _port, _username, _password, _dbname, True)

    # Load tables from the database.
    metadata = MetaData(bind=engine)
    a_table = Table(_table_name, metadata, autoload=True)

    # With the following line disabled, shutting down PostgreSQL will take about 1 minute longer.
    # engine.dispose()


def _build_database_URI(_host, _port, _username, _password, _dbname):
    return "postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}".\
        format(user=_username, password=_password, host=_host, port=_port, dbname=_dbname)

def _create_engine(_host, _port, _username, _password, _dbname, _debug):
    return create_engine(_build_database_URI(_host, _port, _username, _password, _dbname), echo=_debug, echo_pool=_debug)


if __name__ == "__main__":
    # Set up logging.
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(logging.Formatter('%(message)s'))

    root_logger = logging.getLogger("")
    root_logger.setLevel(logging.DEBUG)
    root_logger.addHandler(console_handler)

    logger.info("Connecting to the database...")
    repro(
        _host="localhost",
        _port=5439,
        _username="<user>",
        _password="<password>",
        _dbname="<db name>",
        _table_name="<any table name>"
    )

    logger.info("Connected to the database. Try stopping the PostgreSQL server.")

    raw_input("Press enter to stop...")

Repro steps:

  • Start PostgreSQL server.
  • Start Python code above.
  • Shut down PostgreSQL server.

Stopping the PostgreSQL server takes about a minute (and may fail).

Enable the engine.dispose() line and repeat: shutting down PostgreSQL now takes under a second.

My use case is unusual: I have a Windows executable that starts and stops its own local PostgreSQL server.

I can sorta see in retrospect that the autoloading does something that stops the shutdown from proceeding smoothly, but overall this behavior was surprising to me.

The documentation of engine.dispose() does not indicate to me that this is something I need to call to enable a smooth PostgreSQL shutdown. I also couldn't find other obvious "here's how to shut down bits of SQLAlchemy" advice in the documentation.

Once I had more or less accidentally reduced the problem to these few lines, I found this solution by going through the APIs of Engine, Metadata, and Table, and trying to call everything that sounded like it should be called during shutdown.
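Distilled even further, the behavior can be sketched like this (SQLite stands in for the real postgresql+psycopg2 URL so the sketch runs without a server; the table and variable names are made up, and the calls are illustrative rather than a verbatim repro):

```python
from sqlalchemy import create_engine, MetaData, Table, text

# In-memory SQLite stands in for the real postgresql+psycopg2 URL.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("create table some_table (id integer primary key)"))

# Reflection checks a connection out of the pool and returns it afterwards;
# the underlying DBAPI connection stays open, held idle by the pool.
metadata = MetaData()
some_table = Table("some_table", metadata, autoload_with=engine)

# Closing the pool's idle connections is what lets the server shut down cleanly.
engine.dispose()
```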

Comments (13)

  1. Mike Bayer repo owner

    OK, now I'm going to blow your mind. Here's the same thing without SQLAlchemy:

    import psycopg2
    import logging
    
    logger = logging.getLogger(__name__)
    
    
    def repro(_host, _port, _username, _password, _dbname, _table_name):
        """Reproduce slow PostgreSQL shutdown issue.
    
        """
    
        connections = {}
    
        conn = psycopg2.connect(
            database=_dbname, user=_username, password=_password,
            host=_host, port=_port)
    
        connections['foo'] = (conn, connections)
    
        cursor = conn.cursor()
        cursor.execute("""
            select relname from pg_class c join pg_namespace n on
            n.oid=c.relnamespace where
            pg_catalog.pg_table_is_visible(c.oid)
            and relname=%(name)s
        """, {"name": _table_name})
        cursor.close()
    
        conn.rollback()
    
    if __name__ == "__main__":
        # Set up logging.
        console_handler = logging.StreamHandler()
        console_handler.setLevel(logging.INFO)
        console_handler.setFormatter(logging.Formatter('%(message)s'))
    
        root_logger = logging.getLogger("")
        root_logger.setLevel(logging.DEBUG)
        root_logger.addHandler(console_handler)
    
        logger.info("Connecting to the database...")
        repro(
            _host="localhost",
            _port=None,
            _username="scott",
            _password="tiger",
            _dbname="test",
            _table_name="foo"
        )
    
        logger.info("Connected to the database. Try stopping the PostgreSQL server.")
    
        raw_input("Press enter to stop...")
    

    Same result. Is it obvious how this code would be fixed? How come?

  2. Jurie Horneman reporter

    Yes, fine :) I'd fix that by adding a conn.close().

    And I just stumbled over this bit in the docs:

    For the connection pool to properly manage connections, connections should be returned to the connection pool (i.e. connection.close()) whenever the connection is not in use.

    That seems relevant.

    But where in my code am I creating a connection? I don't call connect(), I have no Connection object. What would the proper way to shut things down be, in my code?

    (FWIW I have a lot of code using engine.raw_connection() but I try to be super-careful to close all cursors and connections there. And I use scoped sessions in my other code.)
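    For what it's worth, the checkout/return pattern from that doc quote can be sketched like this (SQLite standing in for PostgreSQL; the URL and names are placeholders). The point is that close() on a pooled connection returns it to the pool rather than closing the DBAPI connection:

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # stand-in for the real PostgreSQL URL

conn = engine.raw_connection()
try:
    cursor = conn.cursor()
    cursor.execute("select 1")
    result = cursor.fetchone()[0]
    cursor.close()
finally:
    # Returns the connection to the pool; the DBAPI connection stays open
    # until engine.dispose() actually closes it.
    conn.close()

engine.dispose()
```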

  3. Mike Bayer repo owner

    read the new section i just checked in. engine.dispose() is very understated in the docs because i dont want people creating/disposing engines like they are connections. that antipattern is very common.

  4. Jurie Horneman reporter

    OK, so you're saying I should delete the engine (as in, set the variable to None) instead of calling engine.dispose(), right?

    But the problem I had, based on my current understanding, is that a connection was implicitly created and never closed, and I need a way to close that connection in order to cleanly shut down my application. Is deleting the engine the best way to do that? (Especially since the docs now specifically say that garbage-collecting the engine does not guarantee closing the connections.)

    I realize I'm probably a special case because the same Python instance shuts down SQLAlchemy and the PostgreSQL server which means various objects might be alive when the server is shut down.

  5. Mike Bayer repo owner

    OK, so you're saying I should delete the engine (as in, set the variable to None) instead of calling engine.dispose(), right?

    that would indicate I wrote the docs all wrong. you should call dispose(). relying on garbage collection to close database connections is never a good idea. can you suggest how you came to that conclusion because I should reword the docs.

    Is deleting the engine the best way to do that?

    no, using dispose(), or maybe in your case just using a NullPool which doesnt actually pool anything beyond the scope of using a connection might be best.

    I realize I'm probably a special case because the same Python instance shuts down SQLAlchemy and the PostgreSQL server

    if you don't need blazing speed you might want to just use NullPool
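    A sketch of the NullPool suggestion (SQLite standing in for the real URL; everything else is the stock API):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# With NullPool, no connection outlives its checkout: each connect()
# opens a fresh DBAPI connection and close() really closes it, so the
# server never sees an idle client holding a connection open.
engine = create_engine("sqlite://", poolclass=NullPool)

with engine.connect() as conn:
    value = conn.execute(text("select 1")).scalar()
```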

  6. Jurie Horneman reporter

    My reasoning, based on my limited knowledge of SQLAlchemy, SQL, etc., was:

      • I used engine.dispose() because it worked, but I didn't understand why.
      • You explained (or I inferred) that the problem is that there is a connection somewhere.
      • You said using engine.dispose() is an anti-pattern and expanded the documentation to warn people against that.
      • Therefore I assumed that engine.dispose() must be something esoteric I should never call, that there must be some other solution for closing that connection, and that deleting the engine was the only other way.

    I was more focused on why this connection exists in the first place (creating a table with autoload opens a connection that is never closed?), and where is it / how can I close it.

    I do like pooling and I do need blazing speed :)

  7. Mike Bayer repo owner

    call dispose(). I mean, if you are opening and deleting engines on every request, then yeah that's an antipattern. But if you require that the DB shut down while that's running, you need to dispose it before the DB shuts down at least.
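    One possible shape for an app that owns the server's lifetime (SQLite stands in for the PostgreSQL URL; registering dispose() with atexit is an assumption about where the call could live, not something from the docs):

```python
import atexit

from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # stand-in for the PostgreSQL URL

# Close pooled connections before the process (and, in this use case,
# the bundled PostgreSQL server) goes down.
atexit.register(engine.dispose)

with engine.connect() as conn:
    ok = conn.execute(text("select 1")).scalar()
```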
