- edited description
MetaData.reflect with "only" keyword fails due to wrong capitalization [Oracle]
CREATE TABLE "FOO"."BAR" (
...
)
This works as expected:
>>> engine.has_table('BAR', schema='FOO')
True
>>> engine.has_table('bar', schema='FOO')
True
However this doesn't:
>>> meta.reflect(schema='FOO', only=['BAR'])
InvalidRequestError: Could not reflect: requested table(s) not available in oracle+cx_oracle://***:***@*** schema 'FOO': (BAR)
which only seems to works if names in only
are manually lowercased:
>>> meta.reflect(schema='FOO', only=['bar'])
It looks like available
set in the reflect
method happens to be an OrderedSet
consisting of lowercased table names.
Comments (25)
-
reporter -
repo owner this is likely not a bug, as you need to use all lower case names in order to indicate a case insensitive name. if these tables were not created with quoted "FOO", then they are case insensitive. you need to use all lower case on the SQLA side.
-
repo owner -
reporter But why does the
.has_table
work fine with both upper and lowercase then?I just thought be logical to expect the opposite, if the tables are case insensitive, any particular capitalization in
only
shouldn't matter at all? -
repo owner I dont know because the behavior is essentially undefined.
-
reporter I see, that makes sense, thanks! It's indeed undefined.
Closing this, should definitely rtfm more :)
-
reporter - changed status to resolved
-
repo owner this is terribly documented, there really needs to be full paragraphs on the case sensitivity thing as it is a pretty deep rabbit hole. the "surprise" of the upper/lower part only applies to Oracle and Firebird so it doesn't come up much. dont worry about the missing the docs on this one :)
-
@zzzeek We had recently a report about something that looks similar on the pandas issue tracker, but with mysql: https://github.com/pydata/pandas/issues/7815
The case was:
- a tabel 'Foo' was created by writing a new DataFrame with
df.to_sql("Foo", engine)
- this table is written again with
df.to_sql("Foo", engine)
, but this now fails because:- the table is saved as 'foo' in MetaData
engine.has_table('Foo')
returns True, but if we then dometa.reflect(only=['Foo'])
, this fails as it does not find 'Foo' (there is only 'foo')
So when creating the table and checking if it exists, the name is lowercased under the hood, but when you try to get the table or reflect it, it does not lowercase it automatically, leading to errors.
- a tabel 'Foo' was created by writing a new DataFrame with
-
reporter Yea it's very confusing for Oracle :)
-
repo owner we need some docs. but just so you all know, we standardize on lower case as case insensitive so that code can work agnostically across databases. you can force case sensitivity as well using the quoted_name object: http://docs.sqlalchemy.org/en/rel_0_9/core/sqlelement.html?highlight=quoted#sqlalchemy.sql.elements.quoted_name
-
repo owner also, if Oracle has a table that is actually named as quoted "Foo" with capital F, and it is matching on a match for all lower case "foo" in a "SELECT FROM ALL_TABLES" query, Oracle's then getting involved with case insensitive collations too. it's a losing game to emulate the whole DB's case collation behavior on the Python side for things like "only" and such.
-
I don't directly see how the quoted_name can solve this issue. In this specific case, the problem was not that
has_table
was case insensitive (it was, so checking for 'Foo' resulted in True), but thatreflect(only=[])
was case sensitive: not being able to get 'Foo' because the table creation was case insensitive and 'Foo' was saved as 'foo'. -
repo owner I wasn't aware that there's an issue other than, "it's confusing and needs to be better documented". are you saying using all lower case names is still causing problems ?
-
reporter I guess one of the points that @jorisvandenbossche is trying to raise is that there's an inconsistency between behaviour of
only
inMetaData#reflect
andEngine#has_table
? -
Confusing it certainly is :-)
No, using all lower case names (as a user) does not cause problems. The problem I am talking about was with MySQL, when creating a table 'Foo' (
Table("Foo", ...)
), butreflect(only=['Foo'])
not finding this table.The docs on the table name (http://docs.sqlalchemy.org/en/rel_0_9/core/metadata.html#sqlalchemy.schema.Table.params.name):
Names which contain no upper case characters will be treated as case insensitive names, and will not be quoted unless they are a reserved word or contain special characters. A name with any number of upper case characters is considered to be case sensitive, and will be sent as quoted.
So I would concluse from this that if I create a table
"Foo"
this is quoted (which is also the case), and so this is stored as case sensitive 'Foo'. But it seems this is actually stored as 'foo'. This is not a problem forhas_table
, but it is forreflect(only=[])
. -
repo owner this goes back to my saying, "the behavior is undefined". What is happening here is that the SELECT we send to Oracle is comparing the name "FOO" to "foo" using case insensitive collation - it matches. But in Python-land, we are comparing the names in "get tables" to the name in "only" using case sensitive collation - so "FOO" will not match "foo" on that side. Two kinds of comparison in play.
if the confusion is, "why is the name lowercased??" - then forget about Oracle. Try it on MySQL, where we get lower case names back for non case-sensitive names, and we don't change the casing of anything. You will get the same issue if you send the wrong casing for "only".
then if the confusion is, "but on MySQL I wouldn't use FOO, I'd know I'm on MySQL so I'd use "foo", but because I'm on Oracle I'm very accustomed to using FOO, so why is this the wrong casing for 'only'", that again is SQLAlchemy has to standardize on a casing convention, so that code which wishes to say 'has_table("foo")' against an arbitrary backend can work. Oracle and Firebird are the outliers, so we went with lowercase as case insensitive.
Then, if the confusion is, "why is case insensitivity decided based on the case of the identifier?", the answer to that is that in the usual case except for Oracle and Firebird, the lower case name implicitly matches what all other databases give to us in a case insensitive context from their schema tables.
-
repo owner if you create a Table object with "Foo", and emit the CREATE TABLE, it will be created with the name quoted and will be the case sensitive name "Foo". However, within this issue we are only talking about reflection - nothing is being "created". The reflection process gives us the name "foo", e.g. case insensitive.
There is a very simple way to explain this. When you say: metadata.reflect(only=['FOO']), and you get nothing back, the answer is correct: there is no table in your database that has the name "FOO" in a case sensitive fashion.
-
repo owner so actually, when i said the behavior is undefined, I was only referring to has_table(). In that case, Oracle's case-insensitive collation on the comparison is giving us a misleading answer, but SQLA doesn't know on a backend-by-backend basis which ones have case sensitive or insensitive comparisons.
-
Sorry, I was also talking about creation, and in fact the inconsistency between creation and reflection. With a MySQL engine, I create a table specified with the name 'Foo', but it is created as 'foo' (also in the MySQL database, problem was on Windows). I want, later on, to reflect this table that I think is 'Foo' from the database, but this fails, as it does not find 'Foo'.
This happens with:
df.to_sql("Foo", engine) # -> creates a table 'Foo' (the user thinks, but this is actually 'foo') df.to_sql("Foo", engine, if_exists='replace') # -> checks if the table exists with has_table -> this is the case -> # reflect the table (to drop it) -> this fails, because it does not find 'Foo'
It is surprising to a user that this fails. But I can understand that there is nothing to do about this, and that it is just up to the user to not use upper case characters, and to know that on their system all table names are converted to lower case, so they should use lower case.
In any case, thanks for your time explaining!
-
repo owner Heh OK not to keep going on, but in MySQL you have extra special surprises going on, which is that their case sensitivity changes based on operating system: http://dev.mysql.com/doc/refman/5.0/en/identifier-case-sensitivity.html
for SQLAlchemy's part, if you name it "Foo", you definitely get the quoting, and that is the best SQLAlchemy can do:
from sqlalchemy import create_engine, Table, MetaData, Column, Integer e = create_engine("mysql://scott:tiger@localhost/test", echo=True) m = MetaData() t = Table("CaseSensitive", m, Column('x', Integer)) m.create_all(e)
CREATE TABLE `CaseSensitive` ( x INTEGER )
So if on the MySQL side you see weird behavior, that's MySQL :).
-
yes, but because it's MySQL, and it's on Windows ... the quoting does not have any effect and the table is called 'casesensitive' anyway. But then, that's something up to the user to be aware of.
I was just thinking, to prevent potential confusion of people that are not aware of this, we could in
pandas
in principle check if the provided table name (eg 'CaseSensitive') is found in the database after creating it. And if not, warning that this is the case, probably due to case sensitivity issues and they should consider using lower case. This sounds maybe as too much, but the problem is that we otherwise get confusing error messages (about a table that is not found, but your just writing it ..). -
repo owner I was thinking, if metadata.reflect with "only" finds tables missing, we actually look and see those names were present case-insensitively, and warn about it ("though there is a table 'foo', did you mean 'foo' and not 'FOO'?"). what do you think of that? if metadata.reflect() is really the path here we're concerned about.
-
I think that could be a useful error message. Eg in the
if missing
close differentiate on whether a lower case version is present to have a different message. -
repo owner I'll review a PR with this if it has tests
- Log in to comment