Postgres no longer supports convert_unicode=True

Issue #1792 resolved
Former user created an issue

Apparently with 0.6 I can no longer use unicode with postgres. Apparently the convert_unicode parameter to create_engine is ignored or something.

The following setup does not work: http://github.com/mitsuhiko/flask/blob/website/flask_website/database.py

Fails with UnicodeEncodeError when the session commits and object. Unicode strings are used throughout the application, bytestrings only if they are literals that are limited to ASCII.

Comments (11)

  1. Former user Account Deleted

    I just downgraded to 0.5.6 and it works, so I'm pretty confident now that this is something that broke new in 0.6. Probably some of the Python 3 unicode fixes cause that.

  2. Mike Bayer repo owner

    psycopg2 handles Python unicode objects natively, and if for some reason your psycopg2 isn't, this is detected upon first connect. This is described at http://www.sqlalchemy.org/trac/wiki/06Migration#NativeUnicodeMode . The convert_unicode flag is still honored by the String type both from a local setting as well as per-dialect, but my test script below works without it just as well. Note that its also good idea to use the Unicode type, not the String type, for columns that expect to handle unicode - that way you don't need a per-engine setting. Note also need to be using a PG database configured for utf-8.

    I'm using SQLA 0.6 in production for many months now with unicode + pg and so are others. Do you have a self-contained test script that actually runs ? Below is mine:

    #!coding: utf-8
    
    from sqlalchemy import *
    
    engine = create_engine('postgresql://scott:tiger@localhost/test',echo=True, convert_unicode=True)
    metadata = MetaData()
    
    foo = Table("foob", metadata,
        Column("data", String)
    )
    metadata.drop_all(engine)
    metadata.create_all(engine)
    
    data = u'drôle de petit voix m’a réveillé.'
    
    engine.execute(foo.insert(), data=data)
    print engine.execute(foo.select().where(foo.c.data==data)).fetchall()
    assert engine.execute(foo.select().where(foo.c.data==data)).scalar() == data
    

    output:

    z-eeks-Computer-3:sqlalchemy classic$ python test.py
    2010-05-03 14:06:22,881 INFO sqlalchemy.engine.base.Engine.0x...7a90 select version()
    2010-05-03 14:06:22,881 INFO sqlalchemy.engine.base.Engine.0x...7a90 {}
    2010-05-03 14:06:22,885 INFO sqlalchemy.engine.base.Engine.0x...7a90 select current_schema()
    2010-05-03 14:06:22,885 INFO sqlalchemy.engine.base.Engine.0x...7a90 {}
    2010-05-03 14:06:22,888 INFO sqlalchemy.engine.base.Engine.0x...7a90 select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s
    2010-05-03 14:06:22,889 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'name': u'foob'}
    2010-05-03 14:06:22,893 INFO sqlalchemy.engine.base.Engine.0x...7a90 
    DROP TABLE foob
    2010-05-03 14:06:22,893 INFO sqlalchemy.engine.base.Engine.0x...7a90 {}
    2010-05-03 14:06:22,895 INFO sqlalchemy.engine.base.Engine.0x...7a90 COMMIT
    2010-05-03 14:06:22,898 INFO sqlalchemy.engine.base.Engine.0x...7a90 select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s
    2010-05-03 14:06:22,899 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'name': u'foob'}
    2010-05-03 14:06:22,900 INFO sqlalchemy.engine.base.Engine.0x...7a90 
    CREATE TABLE foob (
        data VARCHAR
    )
    
    
    2010-05-03 14:06:22,901 INFO sqlalchemy.engine.base.Engine.0x...7a90 {}
    2010-05-03 14:06:22,905 INFO sqlalchemy.engine.base.Engine.0x...7a90 COMMIT
    2010-05-03 14:06:22,907 INFO sqlalchemy.engine.base.Engine.0x...7a90 INSERT INTO foob (data) VALUES (%(data)s)
    2010-05-03 14:06:22,907 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'data': u'dr\xf4le de petit voix m\u2019a r\xe9veill\xe9.'}
    2010-05-03 14:06:22,908 INFO sqlalchemy.engine.base.Engine.0x...7a90 COMMIT
    2010-05-03 14:06:22,910 INFO sqlalchemy.engine.base.Engine.0x...7a90 SELECT foob.data 
    FROM foob 
    WHERE foob.data = %(data_1)s
    2010-05-03 14:06:22,910 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'data_1': u'dr\xf4le de petit voix m\u2019a r\xe9veill\xe9.'}
    [de petit voix m\u2019a r\xe9veill\xe9.',)]((u'dr\xf4le)
    2010-05-03 14:06:22,913 INFO sqlalchemy.engine.base.Engine.0x...7a90 SELECT foob.data 
    FROM foob 
    WHERE foob.data = %(data_1)s
    2010-05-03 14:06:22,913 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'data_1': u'dr\xf4le de petit voix m\u2019a r\xe9veill\xe9.'}
    
  3. jek

    I've seen this too, specifically against PG databases with SQL_ASCII encoding, a special unprocessed PG encoding that can store any bytestring encodings.

    It seems like previously the conversion to bytestrings honored the charset= setting on the engine, and now it's falling down to the encoding the cursor is picking up from the db.

  4. Former user Account Deleted

    psycopg2 handles Python unicode objects natively, and if for some reason your psycopg2 isn't, this is detected upon first connect.

    I tried multiple psycopg2 versions, non works.

    Note that its also good idea to use the Unicode type, not the String type, for columns that expect to handle unicode - that way you don't need a per-engine setting.

    I replaced String with Unicode for testing, same issue.

    Note also need to be using a PG database configured for utf-8.

    The schemas created by SQLAlchemy are SQL_ASCII, all databases on that system have this encoding set.

  5. Mike Bayer repo owner

    Then turn the native_unicode setting off on your create_engine. It's in the docs and should also be in that migration entry.

  6. Former user Account Deleted

    If that setting would work, I would do that. But the documented setting gives me a "not a valid option". Furthermore this requires modification of every person using postgres with SQLAlchemy. How would a setup have to look like that the current solution works with unicode?

  7. Mike Bayer repo owner

    Replying to guest:

    If that setting would work, I would do that. But the documented setting gives me a "not a valid option".

    Sorry, I can't reproduce that. I set the "SQL_ASCII" client setting of my PG database to reproduce the original error successfully (FYI you should be able to set your client encoding to utf-8 on any existing database), and the setting works as advertised:

    #!coding: utf-8
    
    from sqlalchemy import *
    
    engine = create_engine(
                    'postgresql://scott:tiger@localhost/test',
                    echo=True, convert_unicode=True)
    
    engine2 = create_engine(
                    'postgresql://scott:tiger@localhost/test',echo=True, 
                    convert_unicode=True, use_native_unicode=False)
    
    stmt = text("select :foo as foo", 
                    bindparams=[type_=Unicode)](bindparam('foo',), 
                    typemap={'foo':Unicode})
    
    data = u'drôle de petit voix m’a réveillé.'
    
    try:
        engine.execute(stmt, foo=data)
    except UnicodeEncodeError:
        print "raised error as expected"
    
    engine2.execute(stmt, foo=data).scalar()
    assert engine2.execute(stmt, foo=data).scalar() == data
    

    Furthermore this requires modification of every person using postgres with SQLAlchemy.

    In the first case, most people have their PG databases set up with the encoding they'd like to use. I'm fairly certain that the Django ORM only works with PG unicode in this mode (i.e. they don't do any unicode conversion of their own), so that's a very large installed base who aren't using multibyte characters with SQL_ASCII client encoding.

    In the second case, SQLAlchemy major number releases (i.e. 0.X -> 0.Y) always feature enhancements and improvements with some backwards compatibility. This is why the 06Migration document exists and why it should be consulted carefully before assuming a bug.

    Thirdly, those who are using SQL_ASCII only need add this one flag to their create_engine() to get the old behavior.

    How would a setup have to look like that the current solution works with unicode?

    see the above test script

  8. Mike Bayer repo owner

    note that the reason I said "use the native_unicode flag ; its in the docs and in the migration document" are because i was not able to check the exact name of the option during that reply and was suggesting that those documents be consulted. it is documented in both of those sources as "use_native_unicode".

  9. Mike Bayer repo owner

    assuming after our twitter convos you got this working. There is the issue that create_engine() accepts two kinds of arguments, those for the DBAPI and those for the dialect, and only the former is accepted as part of the URL. It would be nice to figure out a fix for that but thats a separate issue from this.

  10. Log in to comment