Autoloading table from database causes ~1 minute delay when shutting down PostgreSQL
When I connect to a PostgreSQL database and autoload a table, and do nothing else, shutting down PostgreSQL hangs for about a minute, unless I call engine.dispose() before doing so.
I'm not sure if this is a bug or something I missed in the documentation. I only found out yesterday, by trial and error, how to solve this, and it wasn't clear to me from the docs that I needed to do it.
My environment:
- Windows 7 Professional SP1
- Python 2.7.8 64 bit
- PostgreSQL 9.3.3
- SQLAlchemy==1.0.5
- psycopg2==2.6.1
(Also happens with:
- SQLAlchemy==0.9.3
- psycopg2==2.5.3)
Repro code:

import logging

from sqlalchemy import *

logger = logging.getLogger(__name__)


def repro(_host, _port, _username, _password, _dbname, _table_name):
    """Reproduce slow PostgreSQL shutdown issue.

    Keyword arguments:
    _host -- The host address of the PostgreSQL database server.
    _port -- The port used by the PostgreSQL database server.
    _username -- The username to be used with the PostgreSQL database server.
    _password -- The password for the PostgreSQL database server.
    _dbname -- The name of the database in the PostgreSQL database server.
    _table_name -- The name of any table in the database.
    """
    # Maximum logging.
    # See http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#configuring-logging.
    # "set to logging.INFO for SQL query output"
    logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
    # "set to logging.INFO or lower to log connection pool checkouts/checkins."
    logging.getLogger("sqlalchemy.pool").setLevel(logging.INFO)
    # "set to logging.INFO for information on mapper configurations."
    logging.getLogger("sqlalchemy.orm").setLevel(logging.INFO)
    # See http://docs.sqlalchemy.org/en/rel_0_9/dialects/postgresql.html.
    # "The psycopg2 dialect will log Postgresql NOTICE messages via the
    # sqlalchemy.dialects.postgresql logger"
    logging.getLogger("sqlalchemy.dialects.postgresql").setLevel(logging.INFO)

    # Create engine for the intended database.
    engine = _create_engine(_host, _port, _username, _password, _dbname, True)

    # Load tables from the database.
    metadata = MetaData(bind=engine)
    a_table = Table(_table_name, metadata, autoload=True)

    # With the following line disabled, shutting down PostgreSQL will take
    # about 1 minute longer.
    # engine.dispose()


def _build_database_URI(_host, _port, _username, _password, _dbname):
    return "postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}".\
        format(user=_username, password=_password, host=_host, port=_port, dbname=_dbname)


def _create_engine(_host, _port, _username, _password, _dbname, _debug):
    return create_engine(_build_database_URI(_host, _port, _username, _password, _dbname),
                         echo=_debug, echo_pool=_debug)


if __name__ == "__main__":
    # Set up logging.
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(logging.Formatter('%(message)s'))
    root_logger = logging.getLogger("")
    root_logger.setLevel(logging.DEBUG)
    root_logger.addHandler(console_handler)

    logger.info("Connecting to the database...")
    repro(
        _host="localhost",
        _port=5439,
        _username="<user>",
        _password="<password>",
        _dbname="<db name>",
        _table_name="<any table name>"
    )
    logger.info("Connected to the database. Try stopping the PostgreSQL server.")
    raw_input("Press enter to stop...")
Repro steps:
- Start the PostgreSQL server.
- Run the Python code above.
- Shut down the PostgreSQL server.
Stopping the PostgreSQL server takes about a minute (and may fail). Enable the engine.dispose() line and repeat: shutting down PostgreSQL now takes under a second.
My use case is unusual: I have a Windows executable that starts and stops its own local PostgreSQL server.
I can sorta see in retrospect that the autoloading does something that stops the shutdown from proceeding smoothly, but overall this behavior was surprising to me.
The documentation of engine.dispose() does not indicate to me that this is something I need to call to enable a smooth PostgreSQL shutdown. I also couldn't find other obvious "here's how to shut down bits of SQLAlchemy" advice in the documentation.
Once I had more or less accidentally reduced the problem to these few lines, I found this solution by going through the APIs of Engine, Metadata, and Table, and trying to call everything that sounded like it should be called during shutdown.
Comments (13)
-
reporter Yes, fine :) I'd fix that by adding a conn.close().
And I just stumbled over this bit in the docs:
"For the connection pool to properly manage connections, connections should be returned to the connection pool (i.e. connection.close()) whenever the connection is not in use."
That seems relevant.
But where in my code am I creating a connection? I don't call connect(), I have no Connection object. What would the proper way to shut things down be, in my code?
(FWIW I have a lot of code using engine.raw_connection() but I try to be super-careful to close all cursors and connections there. And I use scoped sessions in my other code.)
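To make the implicit checkout visible, here is a sketch (not from the thread) of doing the same reflection through an explicitly managed connection, using the `autoload_with` parameter; an in-memory SQLite database stands in for PostgreSQL purely so the example is self-contained:

```python
# Sketch: reflect a table via an explicit connection so the pool
# checkout and checkin are visible in code. SQLite in-memory stands in
# for PostgreSQL only to keep the example runnable on its own.
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE a_table (id INTEGER PRIMARY KEY)"))

metadata = MetaData()
with engine.connect() as conn:
    a_table = Table("a_table", metadata, autoload_with=conn)
# The connection was returned to the pool at the end of the block;
# dispose() then actually closes the pooled DBAPI connection.
engine.dispose()

print([c.name for c in a_table.columns])  # ['id']
```

With `autoload=True` alone, the reflection borrows a connection from the engine's pool behind the scenes; that connection goes back to the pool afterwards but stays open, which is what keeps the server waiting at shutdown.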
-
repo owner - changed status to resolved
- add explciit section on engine disposal, fixes
#3461
→ <<cset 3d78705cf498>>
-
repo owner - add explciit section on engine disposal, fixes
#3461
(cherry picked from commit 3d78705cf4981e460d6d1b5bb08870286fc3fe93)
→ <<cset c4c8a0d24ca8>>
-
repo owner Read the new section I just checked in. engine.dispose() is very understated in the docs because I don't want people creating/disposing engines like they are connections. That antipattern is very common.
-
reporter OK, so you're saying I should delete the engine (as in, set the variable to None) instead of calling engine.dispose(), right?
But the problem I had, based on my current understanding, is that a connection was implicitly created and never closed, and that I need a way to close that connection in order to cleanly shut down my application. Is deleting the engine the best way to do that? (Especially since the docs now specifically say that garbage-collecting the engine does not guarantee closing the connections.)
I realize I'm probably a special case because the same Python instance shuts down SQLAlchemy and the PostgreSQL server which means various objects might be alive when the server is shut down.
-
repo owner "OK, so you're saying I should delete the engine (as in, set the variable to None) instead of calling engine.dispose(), right?"
That would indicate I wrote the docs all wrong. You should call dispose(). Relying on garbage collection to close database connections is never a good idea. Can you explain how you came to that conclusion? Because I should reword the docs.
"Is deleting the engine the best way to do that?"
No: use dispose(). Or, in your case, maybe just use NullPool, which doesn't actually pool anything beyond the scope of using a connection; that might be best.
"I realize I'm probably a special case because the same Python instance shuts down SQLAlchemy and the PostgreSQL server"
If you don't need blazing speed you might want to just use NullPool.
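For illustration, a minimal sketch of the NullPool suggestion (SQLite stands in for the `postgresql+psycopg2` URL so the example is self-contained):

```python
# Sketch: an engine that holds no idle connections, so nothing lingers
# in a pool to stall a database server shutdown. Swap in the real
# postgresql+psycopg2 URL in actual use.
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine("sqlite://", poolclass=NullPool)
with engine.connect() as conn:
    pass  # a DBAPI connection is opened here...
# ...and fully closed here: NullPool discards connections on checkin
# instead of keeping them around for reuse.

print(type(engine.pool).__name__)  # NullPool
```

The trade-off is exactly the one discussed below: every checkout opens a fresh connection, so this costs throughput in exchange for never leaving a connection open between uses.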
-
repo owner - fix the bullets
- tone down the "never dispose an engine" language
- refer to NullPool for the "I don't like pooling" use case
references
#3461
→ <<cset a1b6e9f324ee>>
-
repo owner - fix the bullets
- tone down the "never dispose an engine" language
- refer to NullPool for the "I don't like pooling" use case
references
#3461
(cherry picked from commit a1b6e9f324eec04ff69f2ac5347ad3df5b931dd5)
→ <<cset 88077496ad03>>
-
reporter My reasoning, based on my limited knowledge of SQLAlchemy, SQL, etc., was:
- I used engine.dispose() because it worked, but I didn't understand why.
- You explained (or I inferred) that the problem is that there is a connection somewhere.
- You said using engine.dispose() is an anti-pattern and expanded the documentation warning people against that.
- Therefore I assumed that engine.dispose() must be something esoteric I should never call, that there must be some other solution to close that connection, and deleting the engine seemed the only other way.
I was more focused on why this connection exists in the first place (creating a table with autoload opens a connection that is never closed?), and where is it / how can I close it.
I do like pooling and I do need blazing speed :)
-
repo owner Call dispose(). I mean, if you are opening and deleting engines on every request, then yeah, that's an antipattern. But if you require that the DB shut down while your process is still running, you need to dispose the engine before the DB shuts down, at least.
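The ordering described here can be sketched as a tiny shutdown helper; the `stop_server` callable is a hypothetical stand-in for however the application stops its bundled PostgreSQL (e.g. invoking pg_ctl stop):

```python
# Sketch of the shutdown ordering: release pooled connections first,
# then stop the server. stop_server is a hypothetical placeholder for
# the app's own server-stopping code (e.g. running pg_ctl stop).
from sqlalchemy import create_engine

def shutdown(engine, stop_server):
    engine.dispose()  # close every checked-in connection in the pool first...
    stop_server()     # ...so the server can stop without waiting on them

engine = create_engine("sqlite://")
events = []
shutdown(engine, lambda: events.append("server stopped"))
print(events)  # ['server stopped']
```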
-
repo owner - more edits, references
#3461
→ <<cset 7aa2100db3e6>>
-
repo owner - more edits, references
#3461
(cherry picked from commit 7aa2100db3e6f768a280b4dfdb675d6709f94625)
→ <<cset 2cc12508be6f>>
OK, now I'm going to blow your mind. Here's the same thing without SQLAlchemy:
Same result. Is it obvious how this code would be fixed? How come?