add UniqueConstraint to inspector.reflecttable / Table(... autoload=True), including dedupe logic for indexes

Issue #3184 resolved
Johannes Erdfelt created an issue

sqlite treats unique indexes and unique constraints as distinct entities. However, only indexes are reflected and unique constraints aren't.

This is different than mysql (which aliases unique constraints to a unique index) and postgresql (which has distinct unique indexes and unique constraints).

The attached script shows the problem.

Comments (19)

  1. Johannes Erdfelt reporter

    I spent a little bit of trying to fix this bug, but the behavior for reflection confuses me a little bit so I'm not sure what the correct fix is.

    The reflection engine has a method called get_indexes() and a method called get_unique_constraints().

    The behavior of the calls to get_indexes() and get_unique_constraints() is different between MySQL/PostgreSQL and SQLite.

    Both MySQL and PostgreSQL will return unique indexes and unique constraints via both get_indexes() and get_unique_constraints(). (This is a little confusing, but probably a different issue)

    SQLite only returns unique indexes via get_indexes() and unique constraints via get_unique_constraints().

    The reason this problem is seen for SQLite is that only get_indexes() is called via the reflect() method. get_unique_constraints() is not called at all. As a result, only the unique indexes are reflected via the reflect() method.

    It's not clear to me if the problem is ultimately that reflect() doesn't call get_unique_constraints() and thusly MySQL and PostgreSQL drivers should be fixed or if the SQLite driver should be fixed to return unique constraints as part of the get_indexes() method (and unique indexes as part of the get_unique_constraints() method).

    I'm leaning towards the former since it confuses me a little bit that get_indexes() and get_unique_constraints() can overlap.

    I should also mention that I have only tested MySQL, PostgreSQL and SQLite. I don't know what other drivers do.

  2. Mike Bayer repo owner

    unique constraints are not currently part of the reflection process when one calls Table(... autoload=True). that's the behavior for all backends, this logic is in engine/reflection.py.

    That lots of databases have a fuzzy line between whats a unique index and whats a unique constraint (including that some create an implicit index for constraints, so there's really two constructs) is not really dealt with, so you might see Index objects in a reflected table that have some relationship to a unique constraint, but that is database-specific, not on the SQLAlchemy side, and those are still always going to be Index objects, never UniqueConstraint objects.

    On no backend would you see a UniqueConstraint() object associated with a Table that you got via autoload=True. The get_unique_constraints method is not wired into reflecttable at this time.

    So lets fix your test to actually test "unique constraint" reflection:

    insp = inspect(engine)
    print insp.get_unique_constraints("test_reflection")
    

    we can see that with the exception of MySQL's goofy addition, we are getting the same thing for all three backends:

    Connecting to mysql://scott:tiger@localhost/test
    [{'column_names': [u's'], 'name': u'test_uc_1'}, {'column_names': [u'r'], 'name': u'test_ui_1'}]
    
    Connecting to postgresql://scott:tiger@localhost/test
    [{'column_names': [u's'], 'name': u'test_uc_1'}]
    
    Connecting to sqlite:///test.sqlite
    [{'column_names': [u's'], 'name': u'test_uc_1'}]
    

    So the issue as described here "sqlite does not reflect unique constraints" is not accurate. This is really just a request to include UniqueConstraint in the Table-level reflection process which is not backend-specific. This is fine for 1.0, its just a few lines, if you can send me a PR with tests that would be very helpful.

  3. Mike Bayer repo owner

    re: overlap, well yes, the overlap situation is a nightmare - there is different overlap behavior on PG versus MySQL. It is mostly an issue in Alembic, and we've gone through at least four releases to continue to iron out edge cases specific to Alembic's emphasis on not generating nonexistent migrations.

    Perhaps we can at some point generalize Alembic's "deduplication" logic and make it available as an option to Table reflection, though that logic can only go so far; it has to guess.

  4. Mike Bayer repo owner

    I think for the feature here, we shoudl at least have distinct documentation sections in the MySQL and Postgresql dialect docs that give users a heads up as to the duplication behavior of each backend.

  5. Johannes Erdfelt reporter

    The behavior for MySQL is confusing to me because MySQL makes no distinction between unique indexes and unique constraints. Unique constraints are normalized to a unique index. Why does SQLAlchemy synthesize a unique constraint in that case (when calling get_unique_constraints() directly)?

    The behavior for PostgreSQL is confusing to me because it does not create an index for the unique constraint. The dumped schema shows only a unique constraint and no index. Why does SQLAlchemy synthesize a unique index in that case?

    The behavior for SQLite is confusing to me because it's effectively the same internal model as PostgreSQL. Unique indexes and unique constraints and different and neither implicitly creates an index for a unique constraint (as far as I can tell from inspecting the schema dumps). Why does SQLAlchemy behave differently for PostgreSQL and SQLite?

    I'll attach the schema dumps for PostgreSQL and SQLite.

  6. Mike Bayer repo owner

    SQLAlchemy doesn't "synthesize" anything. We report on exactly what the DB tells us is present. the problems here are twofold: on MySQL, there is not a one-to-one relationship between the "CREATE INDEX" and the "UNIQUE" constructs that is present on other databases. On MySQL they are synonymous:

    mysql> create table test (a integer, b integer, unique (b));
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> create unique index idx_test_a on test(a);
    
    mysql> show create table test;
    | test  | CREATE TABLE `test` (
      `a` int(11) DEFAULT NULL,
      `b` int(11) DEFAULT NULL,
      UNIQUE KEY `b` (`b`),
      UNIQUE KEY `idx_test_a` (`a`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
    

    So above, we had to make a choice. Do we say that a MySQL table has only unique indexes, and unique constraints don't exist, even though the user said, "UniqueConstraint()"? Or do we say that a MySQL table doesn't have unique indexes, only unique constraints, even though the user said, "Index(unique=True)"? Or do we just say, hey yes, from MySQL's perspective, this table has a UniqueConstraint and an Index(unique=True), and we have no way at all to know which one it was originally. All three options are a guess, so at least the last guess doesn't try to hide anything.

    On Postgresql, the situation is different. Postgresql actually creates an INDEX for the UNIQUE constraint. So yes, you get two separate constructs for the price of one:

    test=> create table test (a integer, b integer, unique(b));
    CREATE TABLE
    test=> create unique index test_idx_a on test(a);
    CREATE INDEX
    test=> SELECT conname FROM pg_constraint WHERE conrelid = (SELECT oid FROM pg_class
        WHERE relname LIKE 'test');
      conname   
    ------------
     test_b_key
    (1 row)
    
    test=> \di test*
                  List of relations
     Schema |    Name    | Type  | Owner | Table 
    --------+------------+-------+-------+-------
     public | test_b_key | index | scott | test
     public | test_idx_a | index | scott | test
    (2 rows)
    

    so again. This table has two indexes, and a unique constraint. Are we to try to guess that one of those indexes is really a side effect of one of the constraints we specified? Would we want to see that in a simple tool like PgAdmin? Or would we want to see exactly what the database tells us?

    This is part of the reason there's been no hurry to put UniqueConstraint into the Table def on reflection. The database backends don't really give us a picture that matches how the "CREATE TABLE" was set up in the first place, that information is somewhat lost.

  7. Mike Bayer repo owner

    Which anyway is where we get back into Alembic's logic. Alembic actually does try to guess that same-named indexes and unique constraints might be the same thing. It's comparing to hand-rendered metadata so it has a lot of problems here, but as far as this enahncement, if we are to actually produce the UniqueConstraint() object present with Table metadata, e.g. an operation where we do have the chance to reconcile Index and UniqueConstraint, maybe this deduping logic should be present on a per-dialect basis as part of this feature. Alembic's dedupe logic is more complicated than we'd need here but you can see that at https://bitbucket.org/zzzeek/alembic/src/99749650416d096a67e6286b5ea91a67f293a60d/alembic/autogenerate/compare.py?at=master#cl-205.

  8. Johannes Erdfelt reporter

    Doing a pg_dump --schema-only (see the attachment I made) shows no implicit index.

    Maybe postgresql creates an index internally, but it does a good job of not exposing that to the user.

    It looks like the problem with the postgresql driver is that SQLAlchemy is exposing internal implementation details back to the user.

    Also, SQLAlchemy must be doing some filtering of the indexes returned via the "\di" command since my table shows an index for the primary key in my table but SQLAlchemy somehow figures out that it should filter that out of the returned indexes.

    I guess it's time to dig into the postgresql driver. Unless you still think that all of this inconsistency is not a bug then I'll go figure out another solution.

  9. Mike Bayer repo owner

    Maybe postgresql creates an index internally, but it does a good job of not exposing that to the user.

    just type "/di" at your psql prompt. It is exposed right there. Their dump tool does a dedupe, that's great. We view the system tables directly, and they report it, as does their "/di" command.

    Also, SQLAlchemy must be doing some filtering of the indexes returned via the "\di" command since my table shows an index for the primary key in my table but SQLAlchemy somehow figures out that it should filter that out of the returned indexes.

    SQLAlchemy doesnt actually use "\di". It distinguishes the PRIMARY entry from others because that is not an index, that's the primary key constraint.

    I guess it's time to dig into the postgresql driver. Unless you still think that all of this inconsistency is not a bug then I'll go figure out another solution.

    The get_indexes() and get_unique_constraints() methods can attempt to report on potential duplicates as part of their output, and then the Table() reflection part of this can make use of that duplicate info in order to produce a Table() object that doesn't have these elements repeated in terms of Index and UniqueConstraint separately.

  10. Johannes Erdfelt reporter

    If the reflection code sees a duplicate index and unique constraint. Which one does it drop?

  11. Mike Bayer repo owner

    Ok more serious answer. in the case of Postgresql, if we see the index and the unique constraint with the same name, we would keep the unique constraint - because i dont think creating the unique index implies the constraint (e.g. the other direction) (does it?)

    for MySQL, I think it's just a guess. We should there stick with the Index as well since that's what it does now.

  12. Mike Bayer repo owner
    • use provide_metadata for new unique constraint / index tests
    • add a test for PG reflection of unique index without any unique constraint
    • for PG, don't include 'duplicates_constraint' in the entry if the index does not actually mirror a constraint
    • use a distinct method for unique constraint reflection within table
    • catch unique constraint not implemented condition; this may be within some dialects and also is expected to be supported by Alembic tests
    • migration + changelogs for #3184
    • add individual doc notes as well to MySQL, Postgreql fixes #3184

    → <<cset b510b990947c>>

  13. Log in to comment