MetaData.reflect with "only" keyword fails due to wrong capitalization [Oracle]

Issue #3181 resolved
Ivan Smirnov created an issue
CREATE TABLE "FOO"."BAR" (
...
)

This works as expected:

>>> engine.has_table('BAR', schema='FOO')
True
>>> engine.has_table('bar', schema='FOO')
True

However this doesn't:

>>> meta.reflect(schema='FOO', only=['BAR'])
InvalidRequestError: Could not reflect: requested table(s) not available in oracle+cx_oracle://***:***@*** schema 'FOO': (BAR)

which only seems to works if names in only are manually lowercased:

>>> meta.reflect(schema='FOO', only=['bar'])

It looks like available set in the reflect method happens to be an OrderedSet consisting of lowercased table names.

Comments (25)

  1. Mike Bayer repo owner

    this is likely not a bug, as you need to use all lower case names in order to indicate a case insensitive name. if these tables were not created with quoted "FOO", then they are case insensitive. you need to use all lower case on the SQLA side.

  2. Ivan Smirnov reporter

    But why does the .has_table work fine with both upper and lowercase then?

    I just thought be logical to expect the opposite, if the tables are case insensitive, any particular capitalization in only shouldn't matter at all?

  3. Ivan Smirnov reporter

    I see, that makes sense, thanks! It's indeed undefined.

    Closing this, should definitely rtfm more :)

  4. Mike Bayer repo owner

    this is terribly documented, there really needs to be full paragraphs on the case sensitivity thing as it is a pretty deep rabbit hole. the "surprise" of the upper/lower part only applies to Oracle and Firebird so it doesn't come up much. dont worry about the missing the docs on this one :)

  5. Joris Van den Bossche

    @zzzeek We had recently a report about something that looks similar on the pandas issue tracker, but with mysql: https://github.com/pydata/pandas/issues/7815

    The case was:

    • a tabel 'Foo' was created by writing a new DataFrame with df.to_sql("Foo", engine)
    • this table is written again with df.to_sql("Foo", engine), but this now fails because:
      • the table is saved as 'foo' in MetaData
      • engine.has_table('Foo') returns True, but if we then do meta.reflect(only=['Foo']), this fails as it does not find 'Foo' (there is only 'foo')

    So when creating the table and checking if it exists, the name is lowercased under the hood, but when you try to get the table or reflect it, it does not lowercase it automatically, leading to errors.

  6. Mike Bayer repo owner

    also, if Oracle has a table that is actually named as quoted "Foo" with capital F, and it is matching on a match for all lower case "foo" in a "SELECT FROM ALL_TABLES" query, Oracle's then getting involved with case insensitive collations too. it's a losing game to emulate the whole DB's case collation behavior on the Python side for things like "only" and such.

  7. Joris Van den Bossche

    I don't directly see how the quoted_name can solve this issue. In this specific case, the problem was not that has_table was case insensitive (it was, so checking for 'Foo' resulted in True), but that reflect(only=[]) was case sensitive: not being able to get 'Foo' because the table creation was case insensitive and 'Foo' was saved as 'foo'.

  8. Mike Bayer repo owner

    I wasn't aware that there's an issue other than, "it's confusing and needs to be better documented". are you saying using all lower case names is still causing problems ?

  9. Ivan Smirnov reporter

    I guess one of the points that @jorisvandenbossche is trying to raise is that there's an inconsistency between behaviour of only in MetaData#reflect and Engine#has_table?

  10. Joris Van den Bossche

    Confusing it certainly is :-)

    No, using all lower case names (as a user) does not cause problems. The problem I am talking about was with MySQL, when creating a table 'Foo' (Table("Foo", ...)), but reflect(only=['Foo']) not finding this table.

    The docs on the table name (http://docs.sqlalchemy.org/en/rel_0_9/core/metadata.html#sqlalchemy.schema.Table.params.name):

    Names which contain no upper case characters will be treated as case insensitive names, and will not be quoted unless they are a reserved word or contain special characters. A name with any number of upper case characters is considered to be case sensitive, and will be sent as quoted.

    So I would concluse from this that if I create a table "Foo" this is quoted (which is also the case), and so this is stored as case sensitive 'Foo'. But it seems this is actually stored as 'foo'. This is not a problem for has_table, but it is for reflect(only=[]).

  11. Mike Bayer repo owner

    this goes back to my saying, "the behavior is undefined". What is happening here is that the SELECT we send to Oracle is comparing the name "FOO" to "foo" using case insensitive collation - it matches. But in Python-land, we are comparing the names in "get tables" to the name in "only" using case sensitive collation - so "FOO" will not match "foo" on that side. Two kinds of comparison in play.

    if the confusion is, "why is the name lowercased??" - then forget about Oracle. Try it on MySQL, where we get lower case names back for non case-sensitive names, and we don't change the casing of anything. You will get the same issue if you send the wrong casing for "only".

    then if the confusion is, "but on MySQL I wouldn't use FOO, I'd know I'm on MySQL so I'd use "foo", but because I'm on Oracle I'm very accustomed to using FOO, so why is this the wrong casing for 'only'", that again is SQLAlchemy has to standardize on a casing convention, so that code which wishes to say 'has_table("foo")' against an arbitrary backend can work. Oracle and Firebird are the outliers, so we went with lowercase as case insensitive.

    Then, if the confusion is, "why is case insensitivity decided based on the case of the identifier?", the answer to that is that in the usual case except for Oracle and Firebird, the lower case name implicitly matches what all other databases give to us in a case insensitive context from their schema tables.

  12. Mike Bayer repo owner

    if you create a Table object with "Foo", and emit the CREATE TABLE, it will be created with the name quoted and will be the case sensitive name "Foo". However, within this issue we are only talking about reflection - nothing is being "created". The reflection process gives us the name "foo", e.g. case insensitive.

    There is a very simple way to explain this. When you say: metadata.reflect(only=['FOO']), and you get nothing back, the answer is correct: there is no table in your database that has the name "FOO" in a case sensitive fashion.

  13. Mike Bayer repo owner

    so actually, when i said the behavior is undefined, I was only referring to has_table(). In that case, Oracle's case-insensitive collation on the comparison is giving us a misleading answer, but SQLA doesn't know on a backend-by-backend basis which ones have case sensitive or insensitive comparisons.

  14. Joris Van den Bossche

    Sorry, I was also talking about creation, and in fact the inconsistency between creation and reflection. With a MySQL engine, I create a table specified with the name 'Foo', but it is created as 'foo' (also in the MySQL database, problem was on Windows). I want, later on, to reflect this table that I think is 'Foo' from the database, but this fails, as it does not find 'Foo'.

    This happens with:

    df.to_sql("Foo", engine)   # -> creates a table 'Foo' (the user thinks, but this is actually 'foo')
    df.to_sql("Foo", engine, if_exists='replace')   # -> checks if the table exists with has_table -> this is the case -> 
                                                    # reflect the table (to drop it) -> this fails, because it does not find 'Foo'
    

    It is surprising to a user that this fails. But I can understand that there is nothing to do about this, and that it is just up to the user to not use upper case characters, and to know that on their system all table names are converted to lower case, so they should use lower case.

    In any case, thanks for your time explaining!

  15. Mike Bayer repo owner

    Heh OK not to keep going on, but in MySQL you have extra special surprises going on, which is that their case sensitivity changes based on operating system: http://dev.mysql.com/doc/refman/5.0/en/identifier-case-sensitivity.html

    for SQLAlchemy's part, if you name it "Foo", you definitely get the quoting, and that is the best SQLAlchemy can do:

    from sqlalchemy import create_engine, Table, MetaData, Column, Integer
    
    e = create_engine("mysql://scott:tiger@localhost/test", echo=True)
    m = MetaData()
    t = Table("CaseSensitive", m, Column('x', Integer))
    m.create_all(e)
    
    CREATE TABLE `CaseSensitive` (
        x INTEGER
    )
    

    So if on the MySQL side you see weird behavior, that's MySQL :).

  16. Joris Van den Bossche

    yes, but because it's MySQL, and it's on Windows ... the quoting does not have any effect and the table is called 'casesensitive' anyway. But then, that's something up to the user to be aware of.

    I was just thinking, to prevent potential confusion of people that are not aware of this, we could in pandas in principle check if the provided table name (eg 'CaseSensitive') is found in the database after creating it. And if not, warning that this is the case, probably due to case sensitivity issues and they should consider using lower case. This sounds maybe as too much, but the problem is that we otherwise get confusing error messages (about a table that is not found, but your just writing it ..).

  17. Mike Bayer repo owner

    I was thinking, if metadata.reflect with "only" finds tables missing, we actually look and see those names were present case-insensitively, and warn about it ("though there is a table 'foo', did you mean 'foo' and not 'FOO'?"). what do you think of that? if metadata.reflect() is really the path here we're concerned about.

  18. Joris Van den Bossche

    I think that could be a useful error message. Eg in the if missing close differentiate on whether a lower case version is present to have a different message.

  19. Log in to comment