result_map get confused when there is a period in the label due to special-case processing for sqlite
There is special-case processing in ResultProxy._init_metadata of labels that contain a '.' to work around some issue with SQlite. This breaks when a label does contain a '.' (which will happen if there is a period in any part of the object name i.e. schema, table or column).
Comments (15)
-
reporter -
reporter - assigned issue to
Pasting in edited transcript from IRC channel
[17:49](17:49) <zzzeek> pjjH: that bug you have, it is a product of the difference between the name of the column as it is spelled out in SQL, i.e. foo.bar.bat, versus what comes back in cursor.description in the result set. [17:49](17:49) <zzzeek> pjjH: so the answer lies in how the column label is rendered by the ORM, and that the same label comes back [17:49](17:49) <pjjH> ok. good. where should I look to fix it? [17:50](17:50) <zzzeek> pjjH: so you may have to change how columns get labeled when multi dotted schema is used like that [17:50](17:50) <zzzeek> pjjH: and possibly, you may have to change how the compiler talks to "result_map" - you'd see that in compiler.py [17:50](17:50) <zzzeek> that is the round trip essentially [17:51](17:51) <pjjH> awesome! I have been looking for hours for this piece of code. [17:53](17:53) <zzzeek> pjjH: id also note that the unit test for this would not use the ORM. its SQL expression stuff [17:53](17:53) <zzzeek> pjjH: probbly in test/sql/query.py [17:56](17:56) <pjjH> OK. Good. I (incorrectly) thought that once the SQL-level tests were passing that everything would be OK. [17:57](17:57) <zzzeek> if you add new functionality, like schema names as "x.y.z", then new tests are needed [17:57](17:57) <zzzeek> theres a lot of issues here - we make column labels like tablename_colname [17:57](17:57) <zzzeek> in this case we probably want token1_token2_token3_schemaname_tablename_colname [17:58](17:58) <pjjH> explain token1_token2.... [17:58](17:58) <zzzeek> the compiler truncates them but keys them [17:58](17:58) <zzzeek> if you are dbo.schema.mytable.foobar [17:58](17:58) <zzzeek> select(use_labels=True) would say select dbo.schema.mytable.foobar as dbo_schema_mytable_foobar [18:00](18:00) <pjjH> very good. I just changed the working select statement to use_labels=True and it breaks. This should help quite a bit in tracking down the problem. Thanks.
-
reporter this looks like it: in ResultProxy, there is some code that diddles the colname if it finds a period in it.
if '.' in colname: # sqlite will in some circumstances prepend table name to colnames, so strip origname = colname colname = colname.split('.')[-1](-1)
Possible workarounds: . only do this if the dialect is SQLite (or add in a property to the mssql and sybase dialects such as 'support_multi_part_schema' . change the label-generation code to substitute the '.' in the schema to 'dot' or '_' or whatever
It doesn't look like the cursor.description labels are 'parsed' in any way: they appear to be used solely for keying into the result_map
-
reporter I have re-read the documentation for use_labels=True and now understand that the expected behavior is that the original columns names no longer work as indexes into the rowproxy. Since we like indexing rowproxies directly in Python, it follows that the key cannot contain a '.' so we should replace periods with something innocuous -- to Python -- like an '_'
pjjH
-
reporter - attached unit_test.patch
some weakish unit-tests for dealing with '.' in generated labels
-
reporter What a flip-flopper I am today! after writing the unit tests, I see that there are a bunch of additional checks that we would have to make to ensure that the generated label is valid as a Python identifier. It seems much more reasonable to me to try and fix the bug with SQLite prepending the table-name to the column name in the cursor description that to go to any great lengths to avoid '.' appearing in any component of the object name ... if the identifier has been correctly quoted/escaped in the SQL then why should we care what form the names are in the result-set?
So I propose the patch below but realize that it may break if the column names contain a period e.g.
sqlite> create table foo (am not dotty even I do have a '.' in my name integer) ...> ; sqlite> .schema foo CREATE TABLE foo (am not dotty even I do have a '.' in my name integer);
pjjH
Index: lib/sqlalchemy/engine/base.py
--- lib/sqlalchemy/engine/base.py (revision 6022) +++ lib/sqlalchemy/engine/base.py (working copy) @@ -1450,7 +1450,7 @@ if self.dialect.description_encoding: colname = colname.decode(self.dialect.description_encoding)
- if '.' in colname:
- if '.' in colname and self.dialect.name == 'sqlite': # sqlite will in some circumstances prepend table name to colnames, so strip origname = colname colname = colname.split('.')-1
-
reporter - changed title to result_map get confused when there is a period in the label due to special-case processing for sqlite
-
reporter - attached fix_for_ORM_generated_labels.diff
crude workaround for problem with '.' in column labels
-
repo owner - changed milestone to 0.5.xx
the part confusing me is if we are using use_labels=True, the column name shouldn't have any dots in it. can you run your failing test with echo='debug' and paste that so I can see exactly what's happening ?
the patch is fine, that step should probably be moved somehow to the sqlite dialect. do all tests pass if you run against mysql/postgres ?
-
reporter the column name shouldn't have any dots in it. I don't agree: If use_labels is set to true, the labels will be some concatenation of the schema + table + column ... if any of these components contains a '.' then the label will contain a '.' and the result_map will be confused. There is nothing in the quoting code to remove periods so the patch is just displacing the problem to sqlite.
I have only run the tests against mssql thus far but will try it on a couple more platforms shortly.
-
repo owner Replying to phrrngtn:
the column name shouldn't have any dots in it. I don't agree: If use_labels is set to true, the labels will be some concatenation of the schema + table + column ... if any of these components contains a '.' then the label will contain a '.' and the result_map will be confused. There is nothing in the quoting code to remove periods so the patch is just displacing the problem to sqlite.
I have only run the tests against mssql thus far but will try it on a couple more platforms shortly.
ah, right. well guess what - I think the use_labels logic should replace the "." in the schema name with an underscore "_". that would allow the multi-token schema name to work like the other tokens in the column's identifier without the need to affect our sqlite workaround for now.
-
repo owner - changed milestone to 0.6.0
-
repo owner - changed component to sql
-
repo owner - changed status to resolved
-
repo owner - removed milestone
Removing milestone: 0.6.0 (automated comment)
- Log in to comment
Forgot to add that the schema name is not set until after the table is created as the mssql introspection code does not work yet for multi-part schemas (I have a patched version which does) so the table-creation/dropping code fails unless the schema is blank or has a single component.
pjjH