result_map get confused when there is a period in the label due to special-case processing for sqlite

Issue #1428 resolved
Paul Harrington created an issue

There is special-case processing in ResultProxy._init_metadata of labels that contain a '.' to work around some issue with SQlite. This breaks when a label does contain a '.' (which will happen if there is a period in any part of the object name i.e. schema, table or column).

Comments (15)

  1. Paul Harrington reporter

    Forgot to add that the schema name is not set until after the table is created as the mssql introspection code does not work yet for multi-part schemas (I have a patched version which does) so the table-creation/dropping code fails unless the schema is blank or has a single component.

    pjjH

  2. Paul Harrington reporter
    • assigned issue to

    Pasting in edited transcript from IRC channel

    [17:49](17:49) <zzzeek> pjjH: that bug you have, it is a product of the difference between the name of the column as it is spelled out in SQL, i.e. foo.bar.bat, versus what comes back in cursor.description in the result set.
    [17:49](17:49) <zzzeek> pjjH: so the answer lies in how the column label is rendered by the ORM, and that the same label comes back
    [17:49](17:49) <pjjH> ok. good. where should I look to fix it?
    [17:50](17:50) <zzzeek> pjjH: so you may have to change how columns get labeled when multi dotted schema is used like that
    [17:50](17:50) <zzzeek> pjjH: and possibly, you may have to change how the compiler talks to "result_map" - you'd see that in compiler.py
    [17:50](17:50) <zzzeek> that is the round trip essentially
    [17:51](17:51) <pjjH> awesome! I have been looking for hours for this piece of code.
    [17:53](17:53) <zzzeek> pjjH: id also note that the unit test for this would not use the ORM.  its SQL expression stuff
    [17:53](17:53) <zzzeek> pjjH: probbly in test/sql/query.py
    [17:56](17:56) <pjjH> OK. Good. I (incorrectly) thought that once the SQL-level tests were passing that everything would be OK.
    [17:57](17:57) <zzzeek> if you add new functionality, like schema names as "x.y.z", then new tests are needed
    [17:57](17:57) <zzzeek> theres a lot of issues here - we make column labels like tablename_colname
    [17:57](17:57) <zzzeek> in this case we probably want token1_token2_token3_schemaname_tablename_colname
    [17:58](17:58) <pjjH> explain token1_token2....
    [17:58](17:58) <zzzeek> the compiler truncates them but keys them
    [17:58](17:58) <zzzeek> if you are dbo.schema.mytable.foobar
    [17:58](17:58) <zzzeek> select(use_labels=True) would say select dbo.schema.mytable.foobar as dbo_schema_mytable_foobar
    [18:00](18:00) <pjjH> very good. I just changed the working select statement to use_labels=True and it breaks. This should help quite a bit in tracking down the problem. Thanks.
    
  3. Paul Harrington reporter

    this looks like it: in ResultProxy, there is some code that diddles the colname if it finds a period in it.

            if '.' in colname:
                # sqlite will in some circumstances prepend table name to colnames, so strip
                origname = colname
                colname = colname.split('.')[-1](-1)
    

    Possible workarounds: . only do this if the dialect is SQLite (or add in a property to the mssql and sybase dialects such as 'support_multi_part_schema' . change the label-generation code to substitute the '.' in the schema to 'dot' or '_' or whatever

    It doesn't look like the cursor.description labels are 'parsed' in any way: they appear to be used solely for keying into the result_map

  4. Paul Harrington reporter

    I have re-read the documentation for use_labels=True and now understand that the expected behavior is that the original columns names no longer work as indexes into the rowproxy. Since we like indexing rowproxies directly in Python, it follows that the key cannot contain a '.' so we should replace periods with something innocuous -- to Python -- like an '_'

    pjjH

  5. Paul Harrington reporter

    What a flip-flopper I am today! after writing the unit tests, I see that there are a bunch of additional checks that we would have to make to ensure that the generated label is valid as a Python identifier. It seems much more reasonable to me to try and fix the bug with SQLite prepending the table-name to the column name in the cursor description that to go to any great lengths to avoid '.' appearing in any component of the object name ... if the identifier has been correctly quoted/escaped in the SQL then why should we care what form the names are in the result-set?

    So I propose the patch below but realize that it may break if the column names contain a period e.g.

    sqlite> create table foo (am not dotty even I do have a '.' in my name integer) ...> ; sqlite> .schema foo CREATE TABLE foo (am not dotty even I do have a '.' in my name integer);

    pjjH

    Index: lib/sqlalchemy/engine/base.py

    --- lib/sqlalchemy/engine/base.py (revision 6022) +++ lib/sqlalchemy/engine/base.py (working copy) @@ -1450,7 +1450,7 @@ if self.dialect.description_encoding: colname = colname.decode(self.dialect.description_encoding)

    • if '.' in colname:
    • if '.' in colname and self.dialect.name == 'sqlite': # sqlite will in some circumstances prepend table name to colnames, so strip origname = colname colname = colname.split('.')-1
  6. Mike Bayer repo owner

    the part confusing me is if we are using use_labels=True, the column name shouldn't have any dots in it. can you run your failing test with echo='debug' and paste that so I can see exactly what's happening ?

    the patch is fine, that step should probably be moved somehow to the sqlite dialect. do all tests pass if you run against mysql/postgres ?

  7. Paul Harrington reporter

    the column name shouldn't have any dots in it. I don't agree: If use_labels is set to true, the labels will be some concatenation of the schema + table + column ... if any of these components contains a '.' then the label will contain a '.' and the result_map will be confused. There is nothing in the quoting code to remove periods so the patch is just displacing the problem to sqlite.

    I have only run the tests against mssql thus far but will try it on a couple more platforms shortly.

  8. Mike Bayer repo owner

    Replying to phrrngtn:

    the column name shouldn't have any dots in it. I don't agree: If use_labels is set to true, the labels will be some concatenation of the schema + table + column ... if any of these components contains a '.' then the label will contain a '.' and the result_map will be confused. There is nothing in the quoting code to remove periods so the patch is just displacing the problem to sqlite.

    I have only run the tests against mssql thus far but will try it on a couple more platforms shortly.

    ah, right. well guess what - I think the use_labels logic should replace the "." in the schema name with an underscore "_". that would allow the multi-token schema name to work like the other tokens in the column's identifier without the need to affect our sqlite workaround for now.

  9. Log in to comment