Oracle reflection for get_indexes returns an index representing the primary_key if the column name is lowercase in Oracle

Issue #1867 resolved
Former user created an issue

Apparently, Oracle's columns are not exclusively stored in uppercase.
I have a column named 'group'. In ALL_IND_COLUMNS (and, I imagine, in all other Oracle views/tables) it is stored as lower-case, while all other column names are upper-case. Presumably this is because 'group' is a keyword.

SQL> select column_name from all_tab_columns where table_name = 'SITEGROUPS';

COLUMN_NAME
------------------------------
group
DESCRIPTION
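
For context, here is a rough paraphrase of the case convention the dialect relies on (a simplified sketch, not the actual 0.6 source; normalize_name/denormalize_name are real dialect methods, but the bodies below are condensed). A name like 'group', stored lower-case because it had to be quoted, falls outside the all-upper-case assumption, which is what breaks the primary key comparison in get_indexes:

# simplified sketch of the Oracle dialect's case convention (paraphrased)
def normalize_name(name):
    # Oracle stores unquoted identifiers in upper case, so an all-upper-case
    # catalog name is treated as case-insensitive and lower-cased for Python
    if name is None:
        return None
    if name.upper() == name:
        return name.lower()
    # a lower/mixed-case catalog name (like "group") was quoted at CREATE
    # time, so it is kept verbatim
    return name

def denormalize_name(name):
    # the reverse direction: plain lower-case Python names go back to upper case
    if name is None:
        return None
    if name.lower() == name:
        return name.upper()
    return name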

Here is the patch to fix the particular reflection problem I was having: get_indexes was returning a redundant index representing the primary key constraint, even though reflection already reports the primary key separately as a PrimaryKeyConstraint:

--- sqlalchemy/dialects/oracle/base.py  2010-08-03 06:37:52.000000000 -0400
+++ ../SQLAlchemy-0.6.1.4/sqlalchemy/dialects/oracle/base.py    2010-08-03 06:36:49.000000000 -0400
@@ -875,21 +875,21 @@
         last_index_name = None
         pkeys = self.get_primary_keys(connection, table_name, schema,
                                       resolve_synonyms=resolve_synonyms,
                                       dblink=dblink,
                                       info_cache=kw.get('info_cache'))
         uniqueness = dict(NONUNIQUE=False, UNIQUE=True)

         oracle_sys_col = re.compile(r'SYS_NC\d+\$', re.IGNORECASE)
         for rset in rp:
             # don't include the primary key columns
-            if rset.column_name in [s.upper() for s in pkeys]:
+            if rset.column_name.upper() in [s.upper() for s in pkeys]:
                 continue
             if rset.index_name != last_index_name:
                 index = dict(name=self.normalize_name(rset.index_name), column_names=[])
                 indexes.append(index)
             index['unique'] = uniqueness.get(rset.uniqueness, False)

             # filter out Oracle SYS_NC names.  could also do an outer join
             # to the all_tab_columns table and check for real col names there.
             if not oracle_sys_col.match(rset.column_name):
                 index['column_names'].append(self.normalize_name(rset.column_name))

The bigger issue

I imagine this is a can of worms since the assumption was probably universally made that oracle stores its column names in UPPER case... (so I will probably rename my columns to avoid problems).

  • Kent

Comments (17)

  1. Former user Account Deleted

    I see now that SQLAlchemy has requested that this be in lower-case by surrounding it in quotes:

    CREATE TABLE sitegroups (
            "group" NVARCHAR2(3) NOT NULL,
            description NVARCHAR2(40),
            PRIMARY KEY ("group")
    )
    

    Again, because it is a keyword, no doubt.
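
    For what it's worth, the quoting decision can be reproduced from the dialect's identifier preparer (a quick sketch; it only assumes OracleDialect and its preparer are importable as shown):

    from sqlalchemy import Column, Unicode
    from sqlalchemy.dialects.oracle.base import OracleDialect

    preparer = OracleDialect().identifier_preparer

    # reserved words are quoted, so Oracle stores them exactly as written (lower case)
    print preparer.format_column(Column("group", Unicode(3)))         # "group"
    # ordinary lower-case names are emitted unquoted, so Oracle folds them to
    # upper case in its catalog views (DESCRIPTION in all_tab_columns)
    print preparer.format_column(Column("description", Unicode(40)))  # description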

  2. Former user Account Deleted

    Found another issue with oracle reflection (postgres reflection processes this correctly):

    If there is more than one index on a column (for example, the primary key spans two columns and there is another index on one of those primary key columns), then reflection ignores ''both'' indexes instead of just the primary key index.

    Attached is a .py script demonstrating.
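
    Since the script itself isn't reproduced here, its setup is roughly the following (a sketch only; the connection URL is a placeholder and the printing differs from the script's actual output below):

    from sqlalchemy import MetaData, Table, Column, Unicode, create_engine
    from sqlalchemy.engine import reflection

    engine = create_engine('oracle://scott:tiger@localhost/xe')   # placeholder DSN
    metadata = MetaData(engine)

    # two-column primary key, plus a separate unique index on one of the pk columns
    Table('sometable', metadata,
        Column('id_a', Unicode(255), primary_key=True),
        Column('id_b', Unicode(255), primary_key=True, unique=True),
    )
    metadata.create_all()

    insp = reflection.Inspector.from_engine(engine)
    print insp.get_primary_keys('sometable')
    # before the patch this prints [] -- every column of both indexes is a
    # primary key column, so the pk's backing index and the id_b index are
    # both skipped
    print insp.get_indexes('sometable')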

    Script output before patch below:

    type: PrimaryKeyConstraint
    name: sys_c0049060
    columns: [u'id_a', u'id_b']
    
    (u'SYS_C0049060', u'ID_A', u'UNIQUE')
    (u'SYS_C0049060', u'ID_B', u'UNIQUE')
    (u'SYS_C0049061', u'ID_B', u'UNIQUE')
    

    Correct script output after patch below:

    type: PrimaryKeyConstraint
    name: sys_c0049060
    columns: [u'id_a', u'id_b']
    
    type: Index
    name: sys_c0049061
    columns: [u'id_b']
    
    (u'SYS_C0049060', u'ID_A', u'UNIQUE')
    (u'SYS_C0049060', u'ID_B', u'UNIQUE')
    (u'SYS_C0049061', u'ID_B', u'UNIQUE')
    

    This patch makes the patch in the description of this ticket ''(above)'' obsolete:

    --- lib/sqlalchemy/dialects/oracle/base.py      2010-07-02 01:25:26.000000000 -0400
    +++ ../SQLAlchemy-0.6.1.6kb/lib/sqlalchemy/dialects/oracle/base.py      2010-08-03 14:42:52.000000000 -0400
    @@ -873,34 +873,46 @@
                                     schema=self.denormalize_name(schema))
             indexes = []
             last_index_name = None
             pkeys = self.get_primary_keys(connection, table_name, schema,
                                           resolve_synonyms=resolve_synonyms,
                                           dblink=dblink,
                                           info_cache=kw.get('info_cache'))
             uniqueness = dict(NONUNIQUE=False, UNIQUE=True)
    
             oracle_sys_col = re.compile(r'SYS_NC\d+\$', re.IGNORECASE)
    +
    +        def sort_and_upper(names):
    +            return sorted([i.upper() for i in names])
    +
    +        pk_names = sort_and_upper(pkeys)
    +
    +        def remove_if_primary_key(index):
    +            # don't include the primary key index
    +            if index is not None and \
    +               sort_and_upper(index['column_names']) == pk_names:
    +                indexes.pop()
    +
    +        index = None
             for rset in rp:
    -            # don't include the primary key columns
    -            if rset.column_name in [s.upper() for s in pkeys]:
    -                continue
                 if rset.index_name != last_index_name:
    +                remove_if_primary_key(index)
                     index = dict(name=self.normalize_name(rset.index_name), column_names=[])
                     indexes.append(index)
                 index['unique'] = uniqueness.get(rset.uniqueness, False)
    
                 # filter out Oracle SYS_NC names.  could also do an outer join
                 # to the all_tab_columns table and check for real col names there.
                 if not oracle_sys_col.match(rset.column_name):
                     index['column_names'].append(self.normalize_name(rset.column_name))
                 last_index_name = rset.index_name
    +        remove_if_primary_key(index)
             return indexes
    
         @reflection.cache
         def _get_constraint_data(self, connection, table_name, schema=None,
                                 dblink='', **kw):
    
             rp = connection.execute(
                 sql.text("""SELECT
                  ac.constraint_name,
                  ac.constraint_type,
    

    (Could get fancier and stop checking for the primary key index once it is found...)

    Thanks,

    Kent

    P.S. This ticket is closely related to #1868 (same file is patched)

  3. Mike Bayer repo owner
    • changed milestone to 0.6.4

    do you think you might try working these tests, for here and #1868, into test/dialect/test_oracle.py? there are a lot of details here that you seem to have under your belt.

  4. Former user Account Deleted

    Sure, but I'd like your input:

    I found out oracle won't allow this:

    CREATE TABLE sometable (
            id_a NVARCHAR2(255) NOT NULL,
            id_b NVARCHAR2(255) NOT NULL,
            col_c NVARCHAR2(255),
            PRIMARY KEY (id_a, id_b),
            UNIQUE (id_a, id_b)
    )
            UNIQUE (id_a, id_b)
            *
    ERROR at line 6:
    ORA-02261: such unique or primary key already exists in the table
    

    However, it does allow it if the order is changed (I wonder if this is unintended by oracle):

    CREATE TABLE sometable (
            id_a NVARCHAR2(255) NOT NULL,
            id_b NVARCHAR2(255) NOT NULL,
            col_c NVARCHAR2(255),
            PRIMARY KEY (id_a, id_b),
            UNIQUE (id_b, id_a)
    )
    
    Table created.
    

    Upon inspection, oracle has created two constraints, a 'U' and a 'P', but both use the same index.

    SQL> select CONSTRAINT_NAME, CONSTRAINT_TYPE,  STATUS, INDEX_NAME from user_constraints;
    
    CONSTRAINT_NAME                C STATUS   INDEX_NAME
    ------------------------------ - -------- ------------------------------
    SYS_C0049092                   C ENABLED
    SYS_C0049093                   C ENABLED
    SYS_C0049094                   P ENABLED  SYS_C0049094
    SYS_C0049095                   U ENABLED  SYS_C0049094
    
    
    
    SQL> select INDEX_NAME, TABLE_NAME, UNIQUENESS from user_indexes;
    
    INDEX_NAME                     TABLE_NAME                     UNIQUENES
    ------------------------------ ------------------------------ ---------
    SYS_C0049094                   SOMETABLE                      UNIQUE
    

    The issue is that we only reflect constraints where "constraint_type IN ('R','P')", and we use indexes to determine our UniqueConstraints for reflection...

    After my patch, reflection finds the PrimaryKeyConstraint

    type: PrimaryKeyConstraint
    name: sys_c0049094
    columns: [u'id_a', u'id_b']
    

    but technically there should also be a UniqueConstraint listed with a different order of columns (but Oracle isn't really honoring the order of columns anyway, since it is using the same index).

    Do we even want to attempt "correct" reflection support for this scenario?

    -Kent

  5. Former user Account Deleted

    Also, I'm only familiar with cvs and svn (familiar with hg conceptually), and my Oracle base.py is from 0.6.1. If I submit patches like the ones here, is that fine, or should I download an hg client, etc.?

  6. Former user Account Deleted

    Replying to guest:

    but technically there should also be a UniqueConstraint listed with a different order of columns (but Oracle isn't really honoring the order of columns anyway, since it is using the same index).

    Do we even want to attempt "correct" reflection support for this scenario?

    Wait a second, we don't even reflect UniqueConstraints, do we? Only Indexes, so maybe this ambiguity goes away. (Of course, it brings up whether we ''should'' reflect UniqueConstraints, but that would be a different topic.)

    Input?

  7. Mike Bayer repo owner

    Replying to guest:

    Also, I'm only familiar with cvs and svn (familiar with hg conceptually), and my Oracle base.py is from 0.6.1. If I submit patches like the ones here, is that fine, or should I download an hg client, etc.?

    you don't have to worry about Mercurial; while I encourage you to learn a DVCS someday, I only need a unified diff file. It's best if it's against the latest tip, which you can download without a VC client: http://hg.sqlalchemy.org/sqlalchemy/archive/default.tar.gz (this is off the downloads page).

  8. Mike Bayer repo owner

    Replying to guest:

    Wait a second, we don't even reflect UniqueConstraints, do we? Only Indexes, so maybe this ambiguity goes away. (Of course it brings up whether we ''should'' reflect uniqueconstraints, but that would be a different topic.

    at the moment, unique constraints aren't part of what the inspector interface pulls in. But certainly it's something the inspector would have eventually - the point of the inspector is that it's a place for pretty much everything about a database to be introspected. We'd probably add a method called "get_constraints" to the inspector that pulls them all in. But that's a new feature and wouldn't have to be part of this bug report.
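
    (Purely for illustration - nothing like this exists yet, and the name and return shape below are invented - such a method might eventually look like:)

    # hypothetical future Inspector method -- does not exist in 0.6
    def get_constraints(self, table_name, schema=None, **kw):
        """Return primary key, unique, foreign key and check constraints
        for ``table_name`` as a list of dictionaries."""
        raise NotImplementedError()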

  9. Former user Account Deleted

    Replying to guest:

    Do we even want to attempt "correct" reflection support for this scenario?

    There is the notion that the inspector should only reflect the pk constraint, not also its index. (I can agree with that.) However, the issue is that, as I've shown, one index can enforce several constraints. It would be nice for the inspector to report the index for a pk if it isn't strictly for the pk (i.e. it is ''also'' for another constraint). Also, it is possible for two indexes to have the same columns, and then we don't know which index to ignore (and Oracle 8.0's all_constraints doesn't have index_name).

    Should we make it simple and ignore all indexes that have the same set of columns as the pk? (Even if that misses indexes it shouldn't?) (And ignore the implications this has on the future plans for the inspector?) Any other ideas?
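
    For concreteness, the simple rule being floated here is just the following (a sketch of the idea only; the helper name is made up):

    def looks_like_pk_index(index_column_names, pk_column_names):
        # drop a reflected index only when its column set is exactly the
        # primary key's column set, compared case-insensitively
        return set(c.upper() for c in index_column_names) == \
               set(c.upper() for c in pk_column_names)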

  10. Mike Bayer repo owner

    so to let it take in all the indexes:

    diff -r 9d6cb72791502e7a704a244093d6daca9697dd3c lib/sqlalchemy/dialects/oracle/base.py
    --- a/lib/sqlalchemy/dialects/oracle/base.py    Fri Aug 06 11:03:05 2010 -0400
    +++ b/lib/sqlalchemy/dialects/oracle/base.py    Sat Aug 07 13:05:09 2010 -0400
    @@ -888,8 +888,8 @@
             oracle_sys_col = re.compile(r'SYS_NC\d+\$', re.IGNORECASE)
             for rset in rp:
                 # don't include the primary key columns
    -            if rset.column_name in [s.upper() for s in pkeys]:
    -                continue
    +#            if rset.column_name in [s.upper() for s in pkeys]:
    +#                continue
                 if rset.index_name != last_index_name:
                     index = dict(name=self.normalize_name(rset.index_name), column_names=[])
                     indexes.append(index)
    

    Pretty much just one test fails for real, reflecting the table and then trying to recreate it - "ORA-01408: such column list already indexed". Another fails because a warning is raised regarding the unexpected index but that is harmless and the test can be made to ignore the warning.

    So reflecting the additional index here is not a big deal, just slightly less compatible with how all the other databases do it, and it means you can't issue create_all() on a set of reflected metadata unless we figure something out (which....we probably should).
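
    (The "round trip" in question is just: reflect everything into a fresh MetaData and recreate it elsewhere; a sketch with placeholder URLs:)

    from sqlalchemy import MetaData, create_engine

    src = create_engine('oracle://scott:tiger@localhost/xe')     # placeholder
    dest = create_engine('oracle://other:tiger@localhost/xe')    # placeholder

    m = MetaData()
    m.reflect(bind=src)
    # fails with ORA-01408 if a reflected Index duplicates the pk's column list
    m.create_all(bind=dest)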

  11. Mike Bayer repo owner

    so let me just add, then, regarding the original patch here which addresses this: I only care about the "round trip" capability. If your patch removes only indexes that have the identical columns, then that is probably fine. We'd need a "round trip" test added to test/dialect/test_oracle.py.

    class RoundTripTest(TestBase):
        __only_on__ = 'oracle'
    
        @testing.provide_metadata
        def test_basic(self):
            t1 = Table('t1', metadata,
                Column('id1', Integer, primary_key=True),
                Column('id2', Integer, primary_key=True),
                Column('c3', Integer),
            )
    
            t2 = Table('t2', metadata,
                Column('id1', Integer, primary_key=True),
                Column('id2', Integer, primary_key=True),
                Column('c3', Integer),
            )
            # index that covers a subset of PKs, or 
            # whatever it is we need to test here
            Index('some_index', t2.c.id2, t2.c.c3)
    
            metadata.create_all()
    
            m2 = MetaData(testing.db)
            m2.reflect()
    
            t1_reflected = m2.tables['t1']
            t2_reflected = m2.tables['t2']
    
            # do some assertions here, that 
            # there's an index, etc.
            # assert xyz in t1_reflected.indexes
    
            m2.drop_all()
            m2.create_all()
    
  12. Former user Account Deleted

    Patch

    Disclaimer

    The inspector knowingly may not reflect all indexes if an index shares the exact same columns as the primary key constraint's index. This should be very rare... I can only imagine a use case with some optimization where having two indexes in a different column order affects some binary tree search or something, but it is not a normal situation...

    --- sqlalchemy-default/lib/sqlalchemy/dialects/oracle/base.py   2010-08-05 12:28:01.000000000 -0400
    +++ SQLAlchemy-0.6.3.1kb/lib/sqlalchemy/dialects/oracle/base.py 2010-08-07 18:02:23.000000000 -0400
    @@ -886,11 +886,22 @@
             uniqueness = dict(NONUNIQUE=False, UNIQUE=True)
    
             oracle_sys_col = re.compile(r'SYS_NC\d+\$', re.IGNORECASE)
    +
    +        def upper_name_set(names):
    +            return set([i.upper() for i in names])
    +
    +        pk_names = upper_name_set(pkeys)
    +
    +        def remove_if_primary_key(index):
    +            # don't include the primary key index
    +            if index is not None and \
    +               upper_name_set(index['column_names']) == pk_names:
    +                indexes.pop()
    +
    +        index = None
             for rset in rp:
    -            # don't include the primary key columns
    -            if rset.column_name in [s.upper() for s in pkeys]:
    -                continue
                 if rset.index_name != last_index_name:
    +                remove_if_primary_key(index)
                     index = dict(name=self.normalize_name(rset.index_name), column_names=[])
                     indexes.append(index)
                 index['unique'] = uniqueness.get(rset.uniqueness, False)
    @@ -900,6 +911,7 @@
                 if not oracle_sys_col.match(rset.column_name):
                     index['column_names'].append(self.normalize_name(rset.column_name))
                 last_index_name = rset.index_name
    +        remove_if_primary_key(index)
             return indexes
    
         @reflection.cache
    @@ -932,7 +944,6 @@
             constraint_data = rp.fetchall()
             return constraint_data
    
    -    @reflection.cache
         def get_primary_keys(self, connection, table_name, schema=None, **kw):
             """
    
    @@ -943,7 +954,10 @@
                 dblink
    
             """
    +        return self._get_primary_keys(connection, table_name, schema=None, **kw)[0]
    
    +    @reflection.cache
    +    def _get_primary_keys(self, connection, table_name, schema=None, **kw):
             resolve_synonyms = kw.get('oracle_resolve_synonyms', False)
             dblink = kw.get('dblink', '')
             info_cache = kw.get('info_cache')
    @@ -953,6 +967,7 @@
                                               resolve_synonyms, dblink,
                                               info_cache=info_cache)
             pkeys = []
    +        constraint_name = None
             constraint_data = self._get_constraint_data(connection, table_name,
                                             schema, dblink,
                                             info_cache=kw.get('info_cache'))
    @@ -962,8 +977,18 @@
                 (cons_name, cons_type, local_column, remote_table, remote_column, remote_owner) = \
                     row[0:2] + tuple([self.normalize_name(x) for x in row[2:6]])
                 if cons_type == 'P':
    +                if constraint_name is None:
    +                    constraint_name = self.normalize_name(cons_name)
                     pkeys.append(local_column)
    -        return pkeys
    +        return pkeys, constraint_name
    +
    +    def get_pk_constraint(self, connection, table_name, schema=None, **kw):
    +        cols, name = self._get_primary_keys(connection, table_name, schema=schema, **kw)
    +
    +        return {
    +            'constrained_columns':cols,
    +            'name':name
    +        }
    
         @reflection.cache
         def get_foreign_keys(self, connection, table_name, schema=None, **kw):
    diff -u -r sqlalchemy-default/test/dialect/test_oracle.py SQLAlchemy-0.6.3.1kb/test/dialect/test_oracle.py
    --- sqlalchemy-default/test/dialect/test_oracle.py      2010-08-05 12:28:01.000000000 -0400
    +++ SQLAlchemy-0.6.3.1kb/test/dialect/test_oracle.py    2010-08-07 18:58:47.000000000 -0400
    @@ -1168,4 +1168,64 @@
                 t.select(for_update=True).limit(2).offset(3).execute
             )
    
    -
    +
    +class RoundTripIndexTest(TestBase):
    +    __only_on__ = 'oracle'
    +
    +    def test_basic(self):
    +        engine = testing.db
    +        metadata = MetaData(engine)
    +
    +        table=Table("sometable", metadata,
    +            Column("id_a", Unicode(255), primary_key=True),
    +            Column("id_b", Unicode(255), primary_key=True, unique=True),
    +            Column("group", Unicode(255), primary_key=True),
    +            Column("col", Unicode(255)),
    +            UniqueConstraint('col','group'),
    +        )
    +
    +        normalind = Index('tableind', table.c.id_b, table.c.group) # "group" is a keyword, so lower case
    +
    +        # create
    +        metadata.create_all()
    +        try:
    +            # round trip, create from reflection
    +            mirror = MetaData(engine)
    +            mirror.reflect()
    +            metadata.drop_all()
    +            mirror.create_all()
    +
    +            # inspect the reflected creation
    +            inspect = MetaData(engine)
    +            inspect.reflect()
    +
    +            def obj_definition(obj):
    +            return (obj.__class__, tuple([c.name for c in obj.columns]), getattr(obj, 'unique', None))
    +
    +            # find what the primary k constraint name should be
    +            primaryconsname = engine.execute(
    +               text("""SELECT constraint_name
    +                   FROM all_constraints
    +                   WHERE table_name = :table_name
    +                   AND owner = :owner
    +                   AND constraint_type = 'P' """),
    +               table_name=table.name.upper(),
    +               owner=engine.url.username.upper()).fetchall()[0][0]
    +
    +            reflectedtable = inspect.tables[table.name]
    +
    +            # make a dictionary of the reflected objects:
    +            reflected = dict([(obj_definition(i), i) for i in reflectedtable.indexes | reflectedtable.constraints])
    +
    +            # assert we got primary key constraint and its name, Error if not in dict
    +            assert reflected[(PrimaryKeyConstraint, ('id_a','id_b','group',), None)].name.upper() == primaryconsname.upper()
    +            # Error if not in dict
    +            assert reflected[(Index, ('id_b','group',), False)].name == normalind.name
    +            assert (Index, ('id_b',), True) in reflected
    +            assert (Index, ('col','group',), True) in reflected
    +            assert len(reflectedtable.constraints) == 1
    +            assert len(reflectedtable.indexes) == 3
    +
    +        finally:
    +            metadata.drop_all()
    +
    

    You may need to change the test to get it to run correctly (for example, owner=engine.url.username.upper() might not work). In the future, I'd like to figure out how to kick off the test suite properly instead of running the Python script manually.

    Note that several of the asserts, as well as the create_all(), failed before the patch; the patch also fixes and tests #1868.

  13. Former user Account Deleted

    Important

    I made a mistake when I pasted line 957:

    return self._get_primary_keys(connection, table_name, schema=None, **kw)[0]
    

    should be:

    --- SQLAlchemy-0.6.3.1kb/lib/sqlalchemy/dialects/oracle/base.py 2010-08-07 18:02:23.000000000 -0400
    +++ /home/rarch/site-packages/SQLAlchemy-0.6.3.1dev-py2.6-linux-x86_64.egg/sqlalchemy/dialects/oracle/base.py   2010-08-08 10:15:06.000000000 -0400
    @@ -954,7 +954,7 @@
                 dblink
    
             """
    -        return self._get_primary_keys(connection, table_name, schema=None, **kw)[0]
    +        return self._get_primary_keys(connection, table_name, schema, **kw)[0]
    
         @reflection.cache
         def _get_primary_keys(self, connection, table_name, schema=None, **kw):
    

    Oops. Sorry!

  14. Mike Bayer repo owner

    I'll try to have a look in the next couple of days; it's a little busy right now. Thanks for the effort you've put in here!
