Postgres no longer supports convert_unicode=True
Apparently with 0.6 I can no longer use unicode with postgres. Apparently the convert_unicode parameter to create_engine is ignored or something.
The following setup does not work: http://github.com/mitsuhiko/flask/blob/website/flask_website/database.py
Fails with UnicodeEncodeError when the session commits and object. Unicode strings are used throughout the application, bytestrings only if they are literals that are limited to ASCII.
Comments (11)
-
Account Deleted -
repo owner - marked as critical
psycopg2 handles Python unicode objects natively, and if for some reason your psycopg2 isn't, this is detected upon first connect. This is described at http://www.sqlalchemy.org/trac/wiki/06Migration#NativeUnicodeMode . The convert_unicode flag is still honored by the String type both from a local setting as well as per-dialect, but my test script below works without it just as well. Note that its also good idea to use the Unicode type, not the String type, for columns that expect to handle unicode - that way you don't need a per-engine setting. Note also need to be using a PG database configured for utf-8.
I'm using SQLA 0.6 in production for many months now with unicode + pg and so are others. Do you have a self-contained test script that actually runs ? Below is mine:
#!coding: utf-8 from sqlalchemy import * engine = create_engine('postgresql://scott:tiger@localhost/test',echo=True, convert_unicode=True) metadata = MetaData() foo = Table("foob", metadata, Column("data", String) ) metadata.drop_all(engine) metadata.create_all(engine) data = u'drôle de petit voix m’a réveillé.' engine.execute(foo.insert(), data=data) print engine.execute(foo.select().where(foo.c.data==data)).fetchall() assert engine.execute(foo.select().where(foo.c.data==data)).scalar() == data
output:
z-eeks-Computer-3:sqlalchemy classic$ python test.py 2010-05-03 14:06:22,881 INFO sqlalchemy.engine.base.Engine.0x...7a90 select version() 2010-05-03 14:06:22,881 INFO sqlalchemy.engine.base.Engine.0x...7a90 {} 2010-05-03 14:06:22,885 INFO sqlalchemy.engine.base.Engine.0x...7a90 select current_schema() 2010-05-03 14:06:22,885 INFO sqlalchemy.engine.base.Engine.0x...7a90 {} 2010-05-03 14:06:22,888 INFO sqlalchemy.engine.base.Engine.0x...7a90 select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s 2010-05-03 14:06:22,889 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'name': u'foob'} 2010-05-03 14:06:22,893 INFO sqlalchemy.engine.base.Engine.0x...7a90 DROP TABLE foob 2010-05-03 14:06:22,893 INFO sqlalchemy.engine.base.Engine.0x...7a90 {} 2010-05-03 14:06:22,895 INFO sqlalchemy.engine.base.Engine.0x...7a90 COMMIT 2010-05-03 14:06:22,898 INFO sqlalchemy.engine.base.Engine.0x...7a90 select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s 2010-05-03 14:06:22,899 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'name': u'foob'} 2010-05-03 14:06:22,900 INFO sqlalchemy.engine.base.Engine.0x...7a90 CREATE TABLE foob ( data VARCHAR ) 2010-05-03 14:06:22,901 INFO sqlalchemy.engine.base.Engine.0x...7a90 {} 2010-05-03 14:06:22,905 INFO sqlalchemy.engine.base.Engine.0x...7a90 COMMIT 2010-05-03 14:06:22,907 INFO sqlalchemy.engine.base.Engine.0x...7a90 INSERT INTO foob (data) VALUES (%(data)s) 2010-05-03 14:06:22,907 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'data': u'dr\xf4le de petit voix m\u2019a r\xe9veill\xe9.'} 2010-05-03 14:06:22,908 INFO sqlalchemy.engine.base.Engine.0x...7a90 COMMIT 2010-05-03 14:06:22,910 INFO sqlalchemy.engine.base.Engine.0x...7a90 SELECT foob.data FROM foob WHERE foob.data = %(data_1)s 2010-05-03 14:06:22,910 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'data_1': u'dr\xf4le de petit voix m\u2019a r\xe9veill\xe9.'} [de petit voix m\u2019a r\xe9veill\xe9.',)]((u'dr\xf4le) 2010-05-03 14:06:22,913 INFO sqlalchemy.engine.base.Engine.0x...7a90 SELECT foob.data FROM foob WHERE foob.data = %(data_1)s 2010-05-03 14:06:22,913 INFO sqlalchemy.engine.base.Engine.0x...7a90 {'data_1': u'dr\xf4le de petit voix m\u2019a r\xe9veill\xe9.'}
-
I've seen this too, specifically against PG databases with SQL_ASCII encoding, a special unprocessed PG encoding that can store any bytestring encodings.
It seems like previously the conversion to bytestrings honored the charset= setting on the engine, and now it's falling down to the encoding the cursor is picking up from the db.
-
Account Deleted psycopg2 handles Python unicode objects natively, and if for some reason your psycopg2 isn't, this is detected upon first connect.
I tried multiple psycopg2 versions, non works.
Note that its also good idea to use the Unicode type, not the String type, for columns that expect to handle unicode - that way you don't need a per-engine setting.
I replaced String with Unicode for testing, same issue.
Note also need to be using a PG database configured for utf-8.
The schemas created by SQLAlchemy are SQL_ASCII, all databases on that system have this encoding set.
-
repo owner Then turn the native_unicode setting off on your create_engine. It's in the docs and should also be in that migration entry.
-
Account Deleted If that setting would work, I would do that. But the documented setting gives me a "not a valid option". Furthermore this requires modification of every person using postgres with SQLAlchemy. How would a setup have to look like that the current solution works with unicode?
-
repo owner Replying to guest:
If that setting would work, I would do that. But the documented setting gives me a "not a valid option".
Sorry, I can't reproduce that. I set the "SQL_ASCII" client setting of my PG database to reproduce the original error successfully (FYI you should be able to set your client encoding to utf-8 on any existing database), and the setting works as advertised:
#!coding: utf-8 from sqlalchemy import * engine = create_engine( 'postgresql://scott:tiger@localhost/test', echo=True, convert_unicode=True) engine2 = create_engine( 'postgresql://scott:tiger@localhost/test',echo=True, convert_unicode=True, use_native_unicode=False) stmt = text("select :foo as foo", bindparams=[type_=Unicode)](bindparam('foo',), typemap={'foo':Unicode}) data = u'drôle de petit voix m’a réveillé.' try: engine.execute(stmt, foo=data) except UnicodeEncodeError: print "raised error as expected" engine2.execute(stmt, foo=data).scalar() assert engine2.execute(stmt, foo=data).scalar() == data
Furthermore this requires modification of every person using postgres with SQLAlchemy.
In the first case, most people have their PG databases set up with the encoding they'd like to use. I'm fairly certain that the Django ORM only works with PG unicode in this mode (i.e. they don't do any unicode conversion of their own), so that's a very large installed base who aren't using multibyte characters with SQL_ASCII client encoding.
In the second case, SQLAlchemy major number releases (i.e. 0.X -> 0.Y) always feature enhancements and improvements with some backwards compatibility. This is why the 06Migration document exists and why it should be consulted carefully before assuming a bug.
Thirdly, those who are using SQL_ASCII only need add this one flag to their create_engine() to get the old behavior.
How would a setup have to look like that the current solution works with unicode?
see the above test script
-
repo owner note that the reason I said "use the native_unicode flag ; its in the docs and in the migration document" are because i was not able to check the exact name of the option during that reply and was suggesting that those documents be consulted. it is documented in both of those sources as "use_native_unicode".
-
repo owner - changed status to wontfix
assuming after our twitter convos you got this working. There is the issue that
create_engine()
accepts two kinds of arguments, those for the DBAPI and those for the dialect, and only the former is accepted as part of the URL. It would be nice to figure out a fix for that but thats a separate issue from this. -
repo owner added some docs for this in 54820b2a64be35d79c927ce231986b0b1236f961
-
repo owner - removed milestone
Removing milestone: 0.6.1 (automated comment)
- Log in to comment
I just downgraded to 0.5.6 and it works, so I'm pretty confident now that this is something that broke new in 0.6. Probably some of the Python 3 unicode fixes cause that.