PostgreSQL Sequence not created by metadata.create_all() from declarative mixin

Issue #3951 closed
Andrew Paulo Robillo
created an issue

I'd been copying this snippet from my old cookbook to add a surrogate key to a declarative table class:

class MixinIntKey(object):
    @declared_attr
    def key(cls):
        return Column(
            pg.INTEGER,
            Sequence(f"sqk_{cls.__tablename__}", 1, 1),
            nullable=False,
            primary_key=True
            )

This code was working before (some 3-4 years ago, I think around version 0.8 ): it generates the Sequence when I build the database objects through metadata.create_all(). The only change I made to the snippet was to update it to use format strings (Python 3.6).

After some googling around and testing (see attached file/link), the Sequence can be generated only if the metadata is passed during Sequence init:

Sequence(f"sqk_{cls.__tablename__}", 1, 1, metadata=metadata)

My environment: psycopg2==2.7.1, SQLAlchemy==1.1.8, Python 3.6 64bit on Windows 8.1 64bit, PostgreSQL 9.6 server

Comments (17)

  1. Michael Bayer repo owner

    havent run your full test yet as it seems to have lots going on that is not related. A simple test passes:

    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base, declared_attr
    from sqlalchemy.dialects import postgresql as pg
    
    Base = declarative_base()
    
    
    class MixinIntKey(object):
        @declared_attr
        def key(cls):
            return Column(
                pg.INTEGER,
                Sequence(f"sqk_{cls.__tablename__}", 1, 1),
                nullable=False,
                primary_key=True
                )
    
    class A(MixinIntKey, Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
    
    e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)
    

    output:

    CREATE SEQUENCE sqk_a INCREMENT BY 1 START WITH 1
    2017-04-01 11:53:03,093 INFO sqlalchemy.engine.base.Engine {}
    2017-04-01 11:53:03,094 INFO sqlalchemy.engine.base.Engine COMMIT
    2017-04-01 11:53:03,098 INFO sqlalchemy.engine.base.Engine 
    CREATE TABLE a (
        id INTEGER NOT NULL, 
        key INTEGER NOT NULL, 
        PRIMARY KEY (id, key)
    )
    
    
    2017-04-01 11:53:03,098 INFO sqlalchemy.engine.base.Engine {}
    2017-04-01 11:53:03,105 INFO sqlalchemy.engine.base.Engine COMMIT
    
  2. Michael Bayer repo owner

    I can't run your attached test. I get:

    sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) data type integer has no default operator class for access method "gist"
    HINT:  You must specify an operator class for the index or define a default operator class for the data type.
    

    the test case has many extraneous details that need to be removed. So far the issue looks like you have "schema" set up for your MetaData and you expect that to take place for the Sequence, which would indicate there's no bug here.

  3. Michael Bayer repo owner

    yup with schema, we get what's expected:

    CREATE SEQUENCE sqk_a INCREMENT BY 1 START WITH 1
    2017-04-01 11:59:43,919 INFO sqlalchemy.engine.base.Engine {}
    2017-04-01 11:59:43,920 INFO sqlalchemy.engine.base.Engine COMMIT
    2017-04-01 11:59:43,925 INFO sqlalchemy.engine.base.Engine 
    CREATE TABLE test_schema.a (
        id INTEGER NOT NULL, 
        key INTEGER NOT NULL, 
        PRIMARY KEY (id, key)
    )
    
    
    2017-04-01 11:59:43,925 INFO sqlalchemy.engine.base.Engine {}
    2017-04-01 11:59:43,933 INFO sqlalchemy.engine.base.Engine COMMIT
    

    that is, the sequence is created without a schema, because it does not specify one.

    Sequence needs to be associated with the MetaData() in order to get metadata.schema involved, this is documented, or specify schema to Sequence directly. When Sequence is present inside of Table(), the Sequence is not associated with the metadata, this is not implicit.

  4. Michael Bayer repo owner

    Please confirm your test does create the sequence, just not in the schema you intended due to the lack of association of Sequence with MetaData (again, not a bug). I can't run your test from here unless you can simplify it.

  5. Andrew Paulo Robillo reporter

    Sorry for the late reply on this and for reopening as I was travelling.

    I did several test runs to check if the issue is due to timing of object creation. It appears to be so. As you have recommended, I strip down the code to bare minimum for a PostgreSQL "best practice" use case: tables in non-public schema, and use of explicit sequences instead of serial.

    The mixin approach works when both Table and Sequence are without a schema. This is probably due to both being created on the public schema.

    from sqlalchemy import create_engine, MetaData, Column, Sequence
    from sqlalchemy.dialects.postgresql import INTEGER, TEXT
    from sqlalchemy.ext.declarative import declarative_base, declared_attr
    
    
    ENGINE_STR = "postgresql+psycopg2://testuser:correcthorse@192.168.159.129:5432/test"
    engine = create_engine(ENGINE_STR, echo=True)
    Base = declarative_base()
    
    
    class MixinIntKey(object):
        @declared_attr
        def key(cls):
            return Column(
                INTEGER,
                Sequence(f"sqk_{cls.__tablename__}", 1, 1),
                nullable=False,
                primary_key=True
                )
    
    
    class TableA(Base, MixinIntKey):
        __tablename__ = "table_a"
        column_a = Column(TEXT)
    
    
    Base.metadata.create_all(bind=engine)
    
    
    """
    2017-04-04 19:14:11,164 INFO sqlalchemy.engine.base.Engine CREATE SEQUENCE sqk_table_a INCREMENT BY 1 START WITH 1
    2017-04-04 19:14:11,164 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:14:11,164 INFO sqlalchemy.engine.base.Engine COMMIT
    2017-04-04 19:14:11,164 INFO sqlalchemy.engine.base.Engine
    CREATE TABLE table_a (
            column_a TEXT,
            key INTEGER NOT NULL,
            PRIMARY KEY (key)
    )
    
    
    2017-04-04 19:14:11,164 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:14:11,164 INFO sqlalchemy.engine.base.Engine COMMIT
    """
    

    Now, when a schema needs to be specified the typical approach (AFAIK) would be to instantiate declarative Base with a MetaData object with the schema name. The Table inheriting from Base would be created in the schema but the Sequence will not be created at all.

    SCHEMA = "testschema"
    ENGINE_STR = "postgresql+psycopg2://testuser:correcthorse@192.168.159.129:5432/test"
    engine = create_engine(ENGINE_STR, echo=True)
    Base = declarative_base(metadata=MetaData(schema=SCHEMA))
    
    
    class MixinIntKey(object):
        @declared_attr
        def key(cls):
            return Column(
                INTEGER,
                Sequence(f"sqk_{cls.__tablename__}", 1, 1),
                nullable=False,
                primary_key=True
                )
    
    
    class TableA(Base, MixinIntKey):
        __tablename__ = "table_a"
    
        column_a = Column(TEXT)
    
    
    Base.metadata.create_all(bind=engine)
    
    
    """
    2017-04-04 19:19:04,462 INFO sqlalchemy.engine.base.Engine select version()
    2017-04-04 19:19:04,462 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:19:04,462 INFO sqlalchemy.engine.base.Engine select current_schema()
    2017-04-04 19:19:04,462 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:19:04,462 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
    2017-04-04 19:19:04,462 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=%(schema)s and relname=%(name)s
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine {'schema': 'testschema', 'name': 'table_a'}
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine SELECT relname FROM pg_class c join pg_namespace n on n.oid=c.relnamespace where relkind='S' and n.nspname=current_schema() and relname=%(name)s
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine {'name': 'sqk_table_a'}
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine
    CREATE TABLE testschema.table_a (
            column_a TEXT,
            key INTEGER NOT NULL,
            PRIMARY KEY (key)
    )
    
    
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine {}
    2017-04-04 19:19:04,478 INFO sqlalchemy.engine.base.Engine COMMIT
    """
    

    The correct approach to make things work is to explicitly specify medata on the Sequence mixin.

    class MixinIntKey(object):
        @declared_attr
        def key(cls):
            return Column(
                INTEGER,
                Sequence(f"sqk_{cls.__tablename__}", 1, 1, metadata=cls.metadata),
                nullable=False,
                primary_key=True
                )
    

    I believe the bug case would be the mixin working when no schema is specified: it relied on PostgreSQL's behavior of creating everything in public for it to work. A couple of possible fix:

    • make a Sequence mixin work like ForeignKey (feature request?): create it under the same schema of the Column's Table when metadata is not passed. This may be problematic due to sequences in Postgres, unlike foreign keys and indices, are owned by the schema and not the table, and that sequences may be shared across tables in different schema.
    • make schema argument mandatory for Postgres by raising an error: this should catch attention and not allow the case of public sequence in public table from working. That use case is just bound to surprise people as they progress into the more magical realm of mixins.
  6. Michael Bayer repo owner

    This may be problematic due to sequences in Postgres, unlike foreign keys and indices, are owned by the schema and not the table, and that sequences may be shared across tables in different schema.

    yes, that's why we can't really do that. At best there'd be a flag like, "inherit_schema=True", but then, you're passing a flag, might as well make it "metadata = Base.metadata" ;) this was why "metadata" was added as an argument to Sequence, because someone contributed the whole MetaData(schema='foo') feature, which I also was a little uncomfortable with due to side effects like these.

    make schema argument mandatory for Postgres by raising an error: this should catch attention and not allow the case of public sequence in public table from working. That use case is just bound to surprise people as they progress into the more magical realm of mixins.

    use of "schema" is not necessarily a widespread practice, and Postgresql itself (as is the case with all other databases) never require a "schema" argument, there's a default in place. So even if this weren't a huge backwards-incompatible change, it still wouldn't be appropriate.

    As you have recommended, I strip down the code to bare minimum for a PostgreSQL "best practice" use case: tables in non-public schema, and use of explicit sequences instead of serial.

    I'm not familiar with what's wrong with SERIAL? If your column is a simple integer primary key, using SERIAL is definitely the simplest approach...which is a large reason why this issue is IMO not too critical.

    Having to specify "metadata" to the Sequence is a little bit more awkward than I'd prefer, but this is not a common use case and I really can't think of any alternative that is worth it.

  7. Michael Bayer repo owner

    not to mention, I'm not familiar with why you need to have tables in a "non-public" schema as some kind of best practice. The "public" schema is already local to a database that itself is a "local" namespace.

  8. Michael Bayer repo owner

    also going back to the original use case. We're talking about a declarative mixin. Just do this:

    class MixinIntKey(object):
        @declared_attr
        def key(cls):
            return Column(
                pg.INTEGER,
                Sequence(f"sqk_{cls.__tablename__}", 1, 1, metadata=cls.metadata),
                nullable=False,
                primary_key=True
                )
    

    the Sequence also will receive the "after_parent_attach" event on behalf of the Column to which it is associated. This can be used to link to the Table association and set the "schema" after the fact (setting the MetaData on an existing Sequence is not public API right now):

    @event.listens_for(Sequence, "after_parent_attach")
    def _assoc_seq_col(seq_element, parent):
        @event.listens_for(parent, "after_parent_attach")
        def _assoc_seq_table(col_element, parent):
            seq_element.schema = parent.schema
    
  9. Andrew Paulo Robillo reporter

    The schema in Postgres is essentially a namespace. Explicitly defining a schema avoids namespace collisions and makes a large project manageable. Although the Zen of Python agrees to that as best practice, I understand that it's not the responsibility of this project (or anyone for that matter, unless you're an Inquisitor) to enforce "rightness".

    Perhaps, the path of least resistance would be to document this use case on the Postgres dialect section? Or is it more appropriate to be in the wiki/usage recipe?

  10. Andrew Paulo Robillo reporter

    True, SERIAL is the no-nonsense autoincrement "datatype": Postgres implements it as a sequence that's linked to the table (gets dropped with table). However, an explicit stand-alone sequence is a cheap way of generating unique IDs across partitioned tables without the bloat of UUID.

  11. Log in to comment