.key is inconsistently used for bound param names (inserts, but not expressions?)
As the title says, when trying to insert data into a table with a column name containing parentheses, you get a KeyError. Small example:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer
engine = create_engine('postgresql://postgres@localhost:5432/test')
meta = MetaData(engine)
table = Table('test_parentheses', meta,
Column('CT Volume (cc)', Integer)
)
table.create()
i = table.insert()
i.execute({'CT Volume (cc)': 1})
gives:
....
/home/joris/miniconda/envs/dev/lib/python2.7/site-packages/sqlalchemy/engine/default.pyc in do_execute(self, cursor, statement, parameters, context)
322
323 def do_execute(self, cursor, statement, parameters, context=None):
--> 324 cursor.execute(statement, parameters)
325
326 def do_execute_no_params(self, cursor, statement, context=None):
KeyError: 'CT Volume (cc'
This happens with postgresql (not with sqlite), and was reported at the pandas issue tracker (https://github.com/pydata/pandas/issues/8762), but I also saw references to the same issue here: https://github.com/onyxfish/csvkit/issues/317 and http://stackoverflow.com/questions/25917741/sqlalchemy-with-postgres-insert-into-a-table-whose-columns-have-parentheses.
Is this possibly a sqlalchemy bug, or has it do to with the psycopg2 driver?
Comments (8)
-
repo owner -
repo owner - changed title to .key is inconsistently used for bound param names (inserts, but not expressions?)
- changed milestone to 1.0
- changed component to sql
-
repo owner i'd want to get some background on how prevalent this issue is. having .key work is one area however it's also possible that the bound parameter keys can be transparently escaped in any case. see
#3246 -
reporter @zzzeek Thanks for the quick and elaborate response!
So if we want, we could 'fix' this in pandas by always using keys for inserting (always converting all column names to 'safe' keys), as we only use the insert method (and no expressions). But it seems rather burdensome for just this. -
reporter I don't really know how prevalent this issue is. We just got that issue report, and I found two other references mentioning the issue that I linked to above.
From the pandas side, I think it happens quite a lot that people use parentheses in their column names, certainly eg from reading in csv or excel data.
On the other hand, once you start working with databases, it is really not a very good idea to keep such column names. -
repo owner well here is the problem. It is extremely inefficient for SQLAlchemy to unconditionally run translation through the parameter dictionary, so I don't know that I want a totally transparent system, this is what happens:
conn.execute(table.insert(), [{"Col Value (one)": 1}, {"Col Value (two)": 2}, ... <for 10000 rows>)
Above, if the statement renders the parameter as "Col Value one", that indicates that for every one of 10000 parameter sets we need to make copies of each dictionary with the new parameter name within it. So a program might find that it suddenly runs dramatically slower once a column of a certain name is added to a table, and that tends to be the kind of issue that makes people very upset.
So two other approaches that can work here are:
one, don't use "pyformat" paramstyle with Pandas. Use "format". In fact I think this is probably what you should do, as this solves everything:
engine = create_engine('postgresql://scott:tiger@localhost:5432/test', echo=True, paramstyle='format')
this is a positional style, which does incur the translation of parameter dictionaries into tuples, but at least here you are telling it to do so, and the whole program runs fine:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer engine = create_engine('postgresql://scott:tiger@localhost:5432/test', echo=True, paramstyle='format') meta = MetaData(engine) table = Table('test_parentheses', meta, Column('CT Volume (cc)', Integer) #, key='foob') ) meta.create_all() i = table.insert() i.execute({'CT Volume (cc)': 1}) print engine.execute(table.select().where(table.c['CT Volume (cc)'] == 5))
or, the compiler could escape bound parameter names that are obviously in conflict with %(pyformat)s style, and that would just be a convention, any "%" or "()" character needs to be an underscore, or something like that.
-
repo owner - changed status to resolved
- The :attr:
.Column.key
attribute is now used as the source of anonymous bound parameter names within expressions, to match the existing use of this value as the key when rendered in an INSERT or UPDATE statement. This allows :attr:.Column.key
to be used as a "substitute" string to work around a difficult column name that doesn't translate well into a bound parameter name. Note that the paramstyle is configurable on :func:.create_engine
in any case, and most DBAPIs today support a named and positional style. fixes#3245
→ <<cset a19b2f419cd8>>
-
repo owner OK I think for the time being, pass "paramstyle='format'" to your create_engine(). That will solve the problem without any intrusive changes to how SQLAlchemy renders parameters or anything like that. I've pushed up some documentation to this effect.
- Log in to comment
SQLAlchemy is quoting the column name appropriately:
however it's possible that (cc) inside of the %(...)s is not supported by psycopg2. so lets try that:
and it's the bound parameter name that fails:
Traceback (most recent call last): File "test2.py", line 11, in <module> """, {'CT Volume (cc)': 1}) KeyError: 'CT Volume (cc'
because this works:
so....it's a not entirely surprising limitation on the psycopg2 side but the kind of thing SQLAlchemy often needs to work around. For the insert, this can be worked around by giving it a "key":
which emits:
for other expressions, it is not yet using the ".key", such as
table.select().where(table.c.foob == 5)
, so unfortunately .key is not yet a general solution.