Unicode strings are not properly rendered as SQL parameters

Issue #86 resolved
Former user created an issue

Brief Description: Unicode strings assigned to mapped variables are not properly rendered as strings in the resulting UPDATE SQL command. Instead, they are passed without formatting.

Environment: PostgreSQL 8.1, SQLAlchemy from SVN on 2006.02.28, Python 2.4, wxPython 2.6, Cygwin

Defect Details: By settings a string-type variable to the results of a unicode function, a SQLError(ProgrammingError) exception is thrown, though possibly other exceptions may be thrown as well. The resulting output looks something like this:

sqlalchemy.exceptions.SQLError: (ProgrammingError) ERROR:  column "bcds24" does not exist

UPDATE equipment.types SET long=BCDs24 WHERE types.type_id = 2

As you can see, the value "BCDs24" is not escaped properly within the UPDATE SQL. The parameters as dumped from engine.echo = 1 show the proper unicode string prefix:

UPDATE equipment.types SET long=%(long)s WHERE types.type_id = %(types_type_id)s
[2, 'long': u'BCDs24'}]({'types_type_id':)

Additionally, given the following sequence:

print 'Prior:', type(self.item.long)
self.item.long = self.long_desc.GetValue()
print 'Post:', type(self.item.long)

This generates the following output:

Prior: <type 'str'>
Post: <type 'unicode'>

Notes: If possible, universal support for Unicode would be preferable, since strings returning from the database may be in unicode format as well.

Comments (14)

  1. Mike Bayer repo owner

    there is a Unicode type that handles this translation....it converts u'' strings to raw bytes going into the database, and raw bytes coming out of the database back to Unicode.

    so you definitely do not want the Unicode type if there is a scenario where postgres/psycopg is dealing with unicode properly, since we are blowing away the u'' aspect of the string going in. if its agreed that psycopg/postgres just doesnt deal with unicode at all, then we can maybe put a translation in PGString/PGChar to handle this. but using Unicode explicitly means that all strings coming back from the db are converted to u'', which is not always desired. it seems to me that while we may be able to fix the ingoing side of unicode, if you want unicode in your result set I dont see a way around explicitly specifying the Unicode type.

    Column types can be overridden when using reflection, i.e.:

       t = Table('mytable', db, 
           Column('myunicodecol', Unicode(30)),
           autoload=True
       )
    
  2. Mike Bayer repo owner

    added "convert_unicode" option to SQLEngine in changeset:1068. so you can either specify your engine to do Unicode which will convert in/out in all cases, or you can use Unicode on a column-by-column basis (i guess next ppl will want it at the table level too...).

  3. Valentino Volonghi
    • changed status to open
    • removed status

    convert_unicode still seems to be broken if the string to convert is actually None when retrieved (which could happen of course).

  4. Former user Account Deleted

    As you can see, I've added a test case and the resulting log. I'm reopening this ticket because it still doesn't work in revision 1215.

    Now, this may be due to my needing to set a psycopg1 setting on my end, but browsing through the psycopg source it looks like it is set to UTF-8 encoding by default.

  5. Mike Bayer repo owner

    gambit -

    your test doesnt make any use of the unicode translation features of SA....right now, its sending your u'' strings straight to psycopg1 which appears to be mis-handling them (i dont have psycopg1 to test here). you should have either:

    engine = create_engine(..., convert_unicode=True)
    

    or, load your table like this:

    types_table = DB.Table('types', engine, 
       Column('long', Unicode(50)),
    autoload=True, schema='equipment')
    

    try that out. also try the unit test:

       python test/testtypes.py --db_uri='postgres://my_db...'
    

    this unit test passes completely with psycopg2.

    if all of this doenst work, we'll reopen.

  6. Former user Account Deleted

    The test case works, though I had to do a bit of hackery to get it to work (it doesn't seem to recognize any of the command line parameters...?)

    Looking over the actual test case, I don't think this actually tests the unicode support in a similarly blind fashion to my usage (which, of course, may mean I'm using it wrong :D)

    To be honest, if what you say is true, then I'm not sure where this is failing. Suffice to say, if I change the {{{Unicode(250)}}} in the unicode unittest to {{{String(250)}}} (which is how I believe it would be represented when I reflect) things fail spectacularly. Within the unittest framework I get all sorts of interesting memory errors, but if I use a tightened version of the file I previously uploaded (this one stands on it's own and cleans up after itself) you can see the failure occur.

    It's my position that, given a unicode database, detection by reflection should either create Unicode() column types, or explicitly created (or reflected) String() column types should behave appropriately when passed unicode strings (and convert up to utf-8 for non-unicode strings).

    Further anlysis, including a psycopg specific test case, shows that this is not related to SA. I'm not sure if the {{{convert_unicode=True}}} flag should be automatically set on for UTF8 databases or not, but setting it on does seem to solve the problem, with unicode strings both coming out and going into the database persisting correctly.

    I've attached a stand-alone psycopg1 test script, and a standalone SA script for comparison and testing. As the initd site is down at the moment I can't test this against psycopg2, but does psycopg2 work correctly without the convert_unicode=True engine tag?

  7. Former user Account Deleted

    To make the fix explicit:

    Psycopg1 does not handle unicode properly. Upgrade to psycopg2. This error shows itself by the unicode string being rendered to the actual SQL without surrounding quotations, resulting in a "<string> is not a valid column name" style error.

  8. Log in to comment