.key is inconsistently used for bound param names (inserts, but not expressions?)

Issue #3245 resolved
Joris Van den Bossche created an issue

As the title says, when trying to insert data into a table with a column name containing parentheses, you get a KeyError. Small example:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer

engine = create_engine('postgresql://postgres@localhost:5432/test')
meta = MetaData(engine)
table = Table('test_parentheses', meta,
    Column('CT Volume (cc)', Integer)
)
table.create()

i = table.insert()
i.execute({'CT Volume (cc)': 1})

gives:

....
/home/joris/miniconda/envs/dev/lib/python2.7/site-packages/sqlalchemy/engine/default.pyc in do_execute(self, cursor, statement, parameters, context)
    322 
    323     def do_execute(self, cursor, statement, parameters, context=None):
--> 324         cursor.execute(statement, parameters)
    325 
    326     def do_execute_no_params(self, cursor, statement, context=None):

KeyError: 'CT Volume (cc'

This happens with postgresql (not with sqlite), and was reported at the pandas issue tracker (https://github.com/pydata/pandas/issues/8762), but I also saw references to the same issue here: https://github.com/onyxfish/csvkit/issues/317 and http://stackoverflow.com/questions/25917741/sqlalchemy-with-postgres-insert-into-a-table-whose-columns-have-parentheses.

Is this possibly a sqlalchemy bug, or has it do to with the psycopg2 driver?

Comments (8)

  1. Mike Bayer repo owner

    SQLAlchemy is quoting the column name appropriately:

    CREATE TABLE test_parentheses (
        "CT Volume (cc)" INTEGER
    )
    
    INSERT INTO test_parentheses ("CT Volume (cc)") VALUES (%(CT Volume (cc))s)
    

    however it's possible that (cc) inside of the %(...)s is not supported by psycopg2. so lets try that:

    import psycopg2
    
    conn = psycopg2.connect(user='scott', password='tiger', host='localhost', database='test')
    cursor = conn.cursor()
    cursor.execute("""CREATE TABLE test_parentheses (
        "CT Volume (cc)" INTEGER
    )""")
    
    cursor.execute("""
    INSERT INTO test_parentheses ("CT Volume (cc)") VALUES (%(CT Volume (cc))s)
    """, {'CT Volume (cc)': 1})
    

    and it's the bound parameter name that fails:

    Traceback (most recent call last): File "test2.py", line 11, in <module> """, {'CT Volume (cc)': 1}) KeyError: 'CT Volume (cc'

    because this works:

    cursor.execute("""
    INSERT INTO test_parentheses ("CT Volume (cc)") VALUES (%(CT Volume cc)s)
    """, {'CT Volume cc': 1})
    

    so....it's a not entirely surprising limitation on the psycopg2 side but the kind of thing SQLAlchemy often needs to work around. For the insert, this can be worked around by giving it a "key":

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer
    
    engine = create_engine('postgresql://scott:tiger@localhost:5432/test', echo=True)
    meta = MetaData(engine)
    table = Table('test_parentheses', meta,
        Column('CT Volume (cc)', Integer, key='foob')
    )
    table.create()
    
    i = table.insert()
    i.execute({'foob': 1})
    

    which emits:

    INSERT INTO test_parentheses ("CT Volume (cc)") VALUES (%(foob)s)
    2014-11-10 15:30:46,631 INFO sqlalchemy.engine.base.Engine {'foob': 1}
    

    for other expressions, it is not yet using the ".key", such as table.select().where(table.c.foob == 5), so unfortunately .key is not yet a general solution.

  2. Mike Bayer repo owner

    i'd want to get some background on how prevalent this issue is. having .key work is one area however it's also possible that the bound parameter keys can be transparently escaped in any case. see #3246

  3. Joris Van den Bossche reporter

    @zzzeek Thanks for the quick and elaborate response!
    So if we want, we could 'fix' this in pandas by always using keys for inserting (always converting all column names to 'safe' keys), as we only use the insert method (and no expressions). But it seems rather burdensome for just this.

  4. Joris Van den Bossche reporter

    I don't really know how prevalent this issue is. We just got that issue report, and I found two other references mentioning the issue that I linked to above.

    From the pandas side, I think it happens quite a lot that people use parentheses in their column names, certainly eg from reading in csv or excel data.
    On the other hand, once you start working with databases, it is really not a very good idea to keep such column names.

  5. Mike Bayer repo owner

    well here is the problem. It is extremely inefficient for SQLAlchemy to unconditionally run translation through the parameter dictionary, so I don't know that I want a totally transparent system, this is what happens:

    conn.execute(table.insert(), [{"Col Value (one)": 1}, {"Col Value (two)": 2}, ... <for 10000 rows>)

    Above, if the statement renders the parameter as "Col Value one", that indicates that for every one of 10000 parameter sets we need to make copies of each dictionary with the new parameter name within it. So a program might find that it suddenly runs dramatically slower once a column of a certain name is added to a table, and that tends to be the kind of issue that makes people very upset.

    So two other approaches that can work here are:

    one, don't use "pyformat" paramstyle with Pandas. Use "format". In fact I think this is probably what you should do, as this solves everything:

    engine = create_engine('postgresql://scott:tiger@localhost:5432/test', echo=True, paramstyle='format')
    

    this is a positional style, which does incur the translation of parameter dictionaries into tuples, but at least here you are telling it to do so, and the whole program runs fine:

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer
    
    engine = create_engine('postgresql://scott:tiger@localhost:5432/test', echo=True, paramstyle='format')
    meta = MetaData(engine)
    table = Table('test_parentheses', meta,
        Column('CT Volume (cc)', Integer) #, key='foob')
    )
    meta.create_all()
    
    i = table.insert()
    i.execute({'CT Volume (cc)': 1})
    
    print engine.execute(table.select().where(table.c['CT Volume (cc)'] == 5))
    

    or, the compiler could escape bound parameter names that are obviously in conflict with %(pyformat)s style, and that would just be a convention, any "%" or "()" character needs to be an underscore, or something like that.

  6. Mike Bayer repo owner
    • The :attr:.Column.key attribute is now used as the source of anonymous bound parameter names within expressions, to match the existing use of this value as the key when rendered in an INSERT or UPDATE statement. This allows :attr:.Column.key to be used as a "substitute" string to work around a difficult column name that doesn't translate well into a bound parameter name. Note that the paramstyle is configurable on :func:.create_engine in any case, and most DBAPIs today support a named and positional style. fixes #3245

    → <<cset a19b2f419cd8>>

  7. Mike Bayer repo owner

    OK I think for the time being, pass "paramstyle='format'" to your create_engine(). That will solve the problem without any intrusive changes to how SQLAlchemy renders parameters or anything like that. I've pushed up some documentation to this effect.

  8. Log in to comment