The Postgres backend is slow processing the autoload schema queries

Issue #60 resolved
Former user created an issue

Trying out sqlalchemy for the first time, I noticed that autoloading the schema from my PostgreSQL 8.1 database takes forever: loading a single table takes more than a minute!

With a little investigation, I found that the query executed is

SELECT table_constraints.constraint_name AS table_constraints_constraint_name, 
       table_constraints.constraint_type AS table_constraints_constraint_type, 
       table_constraints.table_name AS table_constraints_table_name, 
       key_column_usage.table_schema AS key_column_usage_table_schema, 
       key_column_usage.table_name AS key_column_usage_table_name, 
       key_column_usage.column_name AS key_column_usage_column_name, 
       key_column_usage.constraint_name AS key_column_usage_constraint_name, 
       constraint_column_usage.table_schema AS constraint_column_usage_table_schema, 
       constraint_column_usage.table_name AS constraint_column_usage_table_name, 
       constraint_column_usage.column_name AS constraint_column_usage_column_name, 
       constraint_column_usage.constraint_name AS constraint_column_usage_constraint_name
FROM information_schema.table_constraints, 
     information_schema.key_column_usage, 
     information_schema.constraint_column_usage
WHERE ((key_column_usage.constraint_name = constraint_column_usage.constraint_name AND 
        constraint_column_usage.constraint_name = table_constraints.constraint_name) AND 
       table_constraints.table_name = %(table_constraints_table_name)s) AND 
       table_constraints.table_schema = %(table_constraints_table_schema)s

Effectively, this query takes a lot of time to execute (PgAdmin3 reports 85803 ms, and the server is a dual amd64), and the engine is not even able to "explain" it, giving the error

ERROR:  record type has not been registered

So, I managed to produce an equivalent query using JOINs instead, like the following:

SELECT table_constraints.constraint_name AS table_constraints_constraint_name, ...
FROM information_schema.table_constraints 
  JOIN information_schema.constraint_column_usage 
    ON constraint_column_usage.constraint_name = table_constraints.constraint_name 
  JOIN information_schema.key_column_usage 
    ON key_column_usage.constraint_name = constraint_column_usage.constraint_name
WHERE table_constraints.table_name = %(table_constraints_table_name)s 
  AND table_constraints.table_schema = %(table_constraints_table_schema)s

And this is fast: 356 ms!
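For the record, the two phrasings ask for the same rows; only the plan the optimizer picks can differ. A minimal sketch of that equivalence, using Python's stdlib sqlite3 with made-up toy tables standing in for the information_schema views (all names here are illustrative, not the real catalog):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tc  (constraint_name TEXT, table_name TEXT);
CREATE TABLE kcu (constraint_name TEXT, column_name TEXT);
INSERT INTO tc  VALUES ('pk_users', 'users'), ('fk_orders_user', 'orders');
INSERT INTO kcu VALUES ('pk_users', 'id'), ('fk_orders_user', 'user_id');
""")

# Implicit join: tables listed in FROM, join condition in WHERE.
implicit = conn.execute("""
    SELECT tc.constraint_name, kcu.column_name
    FROM tc, kcu
    WHERE kcu.constraint_name = tc.constraint_name
      AND tc.table_name = 'users'
    ORDER BY 1, 2
""").fetchall()

# Explicit ANSI JOIN ... ON: same result set, potentially a different plan.
explicit = conn.execute("""
    SELECT tc.constraint_name, kcu.column_name
    FROM tc JOIN kcu ON kcu.constraint_name = tc.constraint_name
    WHERE tc.table_name = 'users'
    ORDER BY 1, 2
""").fetchall()

assert implicit == explicit == [('pk_users', 'id')]
```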

Comments (19)

  1. Mike Bayer repo owner

    this is very strange. my postgres databases can both compile a few tables slightly slowly, but still within two or three seconds, not a minute. also, the JOIN query you have, unless I am missing something, seems to be the exact same query, except with explicit JOINs. no mention is made in the PG docs of any extra efficiency of one syntax over the other; in fact they say:

    http://www.postgresql.org/docs/8/interactive/sql-select.html

    CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same result as you get from listing the two items at the top level of FROM, but restricted by the join condition (if any). CROSS JOIN is equivalent to INNER JOIN ON (TRUE), that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you couldn't do with plain FROM and WHERE.

    is your initial 1-minute table load time reproducible? I have noticed postgres behave slowly when it first gets hit after being asleep for a long time.

  2. Former user Account Deleted

    Yes, I do know that the two kinds of query are equivalent, in the sense that both produce the same set of data. It's not a matter of what the query asks, but ''how'' it asks, and how ''the engine'' interprets and optimizes the query, selecting the best path to get at the result.

    So, to the point. This is the dump of my database, just the schema. Download it into a temporary directory and bzip2 -d it. Also download this script that contains the two queries.

    With the following commands

    dropdb test_alchemy
    psql -f test_alchemy.dump
    psql -f queries.sql -e test_alchemy
    

    I get, respectively, ''81697.857 ms'' and ''395.115 ms''.

    Having several years of experience with various relational engines, this is surely not the first time that using the ''ANSI SQL-92'' syntax has given me huge benefits over the old syntax, and I tend to believe it is evidently clearer both to the human eye and to the engine's SQL compiler, which may thus produce a better plan.

  3. Mike Bayer repo owner

    oh absolutely, agreed that query plans change for all sorts of subtle reasons. I will try this out and probably merge it in; also, since MySQL isn't using the information_schema for reflection right now, I won't worry about supporting that.

    I'm just surprised that Postgres' query optimizer is so bad :) it's probably not a matter of JOIN vs. implicit syntax, but the ordering of the tables in the join (which is easier to express with JOIN) that is improving its tablescan/indexing behavior.
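Whether the plan really changes between the two phrasings can be checked directly rather than guessed at. A hedged sketch using Python's stdlib sqlite3 and its EXPLAIN QUERY PLAN (on PostgreSQL the analogous tool is EXPLAIN / EXPLAIN ANALYZE; the tables are toy stand-ins, not the real catalog views):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
""")

queries = {
    "implicit": "SELECT * FROM a, b WHERE b.a_id = a.id",
    "explicit": "SELECT * FROM a JOIN b ON b.a_id = a.id",
}

# EXPLAIN QUERY PLAN prints the join strategy the optimizer chose for
# each phrasing, so any difference (or lack of one) is visible directly.
plans = {name: conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
         for name, sql in queries.items()}

for name, plan in plans.items():
    print(name, plan)
```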

  4. Mike Bayer repo owner

    OK, I've committed it since it doesn't break anything. But what time do you get when you run:

    python test/engines.py --db postgres
    

    this test takes between 9.5 and 10.5 seconds for me. I just ran it dozens of times with the newer patch and the old one. no difference whatsoever, it's a dead heat. I'm glad the optimizer makes sense on my machine at least (postgres 8.0.4... eek, you're on a newer version and it works more poorly?). also, each query on mine is like half a second or a little more, so that probably corresponds to your .357 time.

  5. Mike Bayer repo owner

    I have observed that running VACUUM FULL on the database helps this problem greatly. I'm considering switching off of information_schema totally.

  6. Former user Account Deleted

    What effect does the above patch have on your performance tests? Somewhere along the line, the join in the query lost the restriction on table_name. This can significantly increase the size of the record set returned.

  7. Former user Account Deleted

    I don't understand where you see a lost restriction, even though I may be wrong in assuming that a constraint name is unique within a given schema. At least with Postgres and Firebird, this is enforced.

    Anyway, the attached patch is wrong, as it restricts the query to return only records relative to a single table, and that is not what we want here: it would be fine for ''primary keys'' (where the columns obviously belong to the given table) but not for ''foreign keys'' (which refer to a different table).

    Indeed, trying out the patch, the query returns just a single record (for the PK) instead of two (for the PK and a FK).

    So, what makes you think that we are fetching a redundant record set? Why do you think the condition on {{{colconstraint.table_name==constraint.table_name}}} is needed?

  8. Former user Account Deleted

    PostgreSQL's default foreign key names ($1, $2...$n) are not unique to a schema. And actually if schema names are available in the record set they should be joined on as well, because table names may be duplicated in different schemas.

    I'm a little confused by what you are saying about fk constraints. Foreign key constraints refer to a different table but the constrained columns belong to the table being reflected.

    Could it be that you are applying suggested change to a version that you have already altered rather than a pristine svn checkout?
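The ambiguity described above is easy to reproduce in miniature. A sketch with Python's stdlib sqlite3; the tables and the schema_name column are hypothetical stand-ins (SQLite has no schemas, and the real views are information_schema.table_constraints and constraint_column_usage):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tc  (schema_name TEXT, constraint_name TEXT, table_name TEXT);
CREATE TABLE ccu (schema_name TEXT, constraint_name TEXT, column_name TEXT);
-- PostgreSQL's default FK names ($1, $2, ...) may repeat across schemas:
INSERT INTO tc  VALUES ('public', '$1', 'orders'), ('audit', '$1', 'log');
INSERT INTO ccu VALUES ('public', '$1', 'user_id'), ('audit', '$1', 'actor_id');
""")

# Joining on constraint_name alone also matches the other schema's '$1':
loose = conn.execute("""
    SELECT ccu.column_name FROM tc
    JOIN ccu ON ccu.constraint_name = tc.constraint_name
    WHERE tc.table_name = 'orders'
""").fetchall()

# Adding the schema to the join condition removes the spurious row:
strict = conn.execute("""
    SELECT ccu.column_name FROM tc
    JOIN ccu ON ccu.constraint_name = tc.constraint_name
            AND ccu.schema_name = tc.schema_name
    WHERE tc.table_name = 'orders'
""").fetchall()

assert sorted(loose) == [('actor_id',), ('user_id',)]
assert strict == [('user_id',)]
```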

  9. Former user Account Deleted

    No, I tried against pristine sources; I didn't need any change, since this was integrated upstream.

    The point is, both the table and schema names are already taken into account.

    I stand corrected on FK uniqueness, so I'll double-check my statements above. It would help, though, if you could produce some more context to support your claim, either showing that the current code fails or that it effectively returns or iterates over a significantly larger result set. If you cannot, I'll try to set up a database different from the kind I'm used to (with ''non-unique'' FK names, for example) to test it out... but I've been somewhat focused on Firebird databases lately.

  10. Former user Account Deleted

    I will prepare a small example to illustrate what I found when testing this.

    I've updated the patch in #71 and attached it to #155; I believe it effectively sidesteps the issue, which is why I wasn't very worried about it.

  11. Mike Bayer repo owner

    I've committed the postgres reflection rewrite in changeset:1810, and postgres reflection seems to be 3x faster. going to try closing this ticket for now (but you can keep discussing information_schema if you feel like it).
