oracle 8 unicode support
waiting on kb for information on whether this patch works with Oracle 8
Comments (25)
-
Account Deleted -
reporter new patch attached. needs testing.
-
Account Deleted Now metadata.create_all() appears to work nicely. I still have problems in my user code where I've specified a unicode string for a bind param. For Oracle 9 or Postgresql, these bind parameters are converted nicely for me. Can we make them auto-convert for Oracle 8 or do I need to avoid using unicode in sqla land?
(We considered having the ability to support unicode for Postgres while having the same code function on Oracle 8, though we will likely abandon this idea. This means I *could*, and probably *will*, go through all our code and move away from unicode, but it would be nice for this to automatically convert unicode bind params on Oracle 8, agreed?)
```
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-12704: character set mismatch
 'INSERT INTO systemparameters (name, type, stringval, numberval, dateval) VALUES (:name, :type, :stringval, :numberval, :dateval)'
 {'stringval': u'L', 'type': u's', 'name': u'invcost_fifo_lifo', 'dateval': None, 'numberval': None}
```
-
reporter if those are String-typed, and you're using at least text() or any SQL construct to define the statement, they will be encoded from unicode to string. Otherwise I'm not sure how a unicode object is getting sent through as a bind param with my most recent patch, you'd have to illustrate an example.
-
Account Deleted Would you expect this to automatically convert from unicode?
```python
from sqlalchemy import *
from sqlalchemy.orm import *

engine = create_engine('oracle://usr:pass@1.2.3.4:1521/live', echo=True)
engine.connect()
print "*****************************"
print "server_version_info: ", engine.dialect.server_version_info
print "_supports_char_length: ", engine.dialect._supports_char_length
print "supports_unicode_binds: ", engine.dialect.supports_unicode_binds
print "use_ansi:", engine.dialect.use_ansi
print "*****************************"

metadata = MetaData()
Session = sessionmaker(bind=engine)
DBSession = Session()

table = Table("tabl", metadata,
    Column("col", String(255), primary_key=True)
)

class Record(object):
    pass

mapper(Record, table)

r = Record()
r.col = u'Unicode String'

metadata.create_all(engine)
DBSession.add(r)
DBSession.flush()
```
Here is the output:
```
2010-05-25 04:55:21,725 INFO sqlalchemy.engine.base.Engine.0x...ce90 SELECT USER FROM DUAL
2010-05-25 04:55:21,729 INFO sqlalchemy.engine.base.Engine.0x...ce90 {}
*****************************
server_version_info:  (8, 1, 7, 4, 0)
_supports_char_length:  False
supports_unicode_binds:  False
use_ansi: False
*****************************
2010-05-25 04:55:23,981 INFO sqlalchemy.engine.base.Engine.0x...ce90 SELECT table_name FROM all_tables WHERE table_name = :name AND owner = :schema_name
2010-05-25 04:55:23,981 INFO sqlalchemy.engine.base.Engine.0x...ce90 {'name': 'TABL', 'schema_name': 'ARC'}
2010-05-25 04:55:24,098 INFO sqlalchemy.engine.base.Engine.0x...ce90 CREATE TABLE tabl (
    col VARCHAR(255) NOT NULL,
    PRIMARY KEY (col)
)
2010-05-25 04:55:24,099 INFO sqlalchemy.engine.base.Engine.0x...ce90 {}
2010-05-25 04:55:24,237 INFO sqlalchemy.engine.base.Engine.0x...ce90 COMMIT
2010-05-25 04:55:24,429 INFO sqlalchemy.engine.base.Engine.0x...ce90 BEGIN
2010-05-25 04:55:24,429 INFO sqlalchemy.engine.base.Engine.0x...ce90 INSERT INTO tabl (col) VALUES (:col)
2010-05-25 04:55:24,429 INFO sqlalchemy.engine.base.Engine.0x...ce90 {'col': u'Unicode String'}
2010-05-25 04:55:24,629 INFO sqlalchemy.engine.base.Engine.0x...ce90 ROLLBACK
Traceback (most recent call last):
  File "ora8.py", line 32, in <module>
    DBSession.flush()
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1348, in flush
    self._flush(objects)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1429, in _flush
    flush_context.execute()
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 293, in execute
    rec.execute(self)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 426, in execute
    uow
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/mapper.py", line 1559, in _save_obj
    c = connection.execute(statement.values(value_params), params)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1109, in execute
    return Connection.executors[c](self, object, multiparams, params)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1186, in _execute_clauseelement
    return self.__execute_context(context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1215, in __execute_context
    context.parameters[0], context=context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1284, in _cursor_execute
    self._handle_dbapi_exception(e, statement, parameters, cursor, context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1282, in _cursor_execute
    self.dialect.do_execute(cursor, statement, parameters, context=context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 282, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-12704: character set mismatch
 'INSERT INTO tabl (col) VALUES (:col)' {'col': u'Unicode String'}
```
-
Account Deleted By the way, if I change String(255) to Unicode(255) then this works fine. It used to be that all the strings were Unicode, and I recently switched them to String when upgrading to 0.6.0 because I didn't want to use NVARCHAR where the existing columns were VARCHAR.
-
Account Deleted Another note: the exact same script that fails on Oracle 8 succeeds on higher versions of Oracle:
```
*****************************
server_version_info:  (10, 2, 0, 1, 0)
_supports_char_length:  True
supports_unicode_binds:  True
use_ansi: True
*****************************
2010-05-16 12:23:15,701 INFO sqlalchemy.engine.base.Engine.0x...7410 SELECT table_name FROM all_tables WHERE table_name = :name AND owner = :schema_name
2010-05-16 12:23:15,701 INFO sqlalchemy.engine.base.Engine.0x...7410 {'name': u'TABL', 'schema_name': u'ARC'}
2010-05-16 12:23:15,706 INFO sqlalchemy.engine.base.Engine.0x...7410 CREATE TABLE tabl (
    col VARCHAR(255 CHAR) NOT NULL,
    PRIMARY KEY (col)
)
2010-05-16 12:23:15,706 INFO sqlalchemy.engine.base.Engine.0x...7410 {}
2010-05-16 12:23:15,901 INFO sqlalchemy.engine.base.Engine.0x...7410 COMMIT
2010-05-16 12:23:15,902 INFO sqlalchemy.engine.base.Engine.0x...7410 BEGIN
2010-05-16 12:23:15,903 INFO sqlalchemy.engine.base.Engine.0x...7410 INSERT INTO tabl (col) VALUES (:col)
2010-05-16 12:23:15,903 INFO sqlalchemy.engine.base.Engine.0x...7410 {'col': u'Unicode String'}
```
So, it seems inconsistent somewhere....
-
reporter Replying to guest:
Would you expect this to automatically convert from unicode?
OK, so here you need either the Unicode() type (or similarly convert_unicode=True with String), or as a shortcut you can add "convert_unicode=True" to create_engine() which will turn on the unicode->utf-8 coercion for all String types.
If that solves the issue then we can commit this patch, which is the main thing holding up the 0.6.1 release at this point (shooting for this weekend).
-
Account Deleted Replying to zzzeek:
Replying to guest:
Would you expect this to automatically convert from unicode?
OK, so here you need either the Unicode() type (or similarly convert_unicode=True with String), or as a shortcut you can add "convert_unicode=True" to create_engine() which will turn on the unicode->utf-8 coercion for all String types.
If that solves the issue then we can commit this patch, which is the main thing holding up the 0.6.1 release at this point (shooting for this weekend).
Yes, when I checked with Unicode(), it apparently works, so the patch is good.
I am still curious, however: why do I need to use Unicode() for it to work with Oracle 8, while on Oracle 10 unicode strings are automatically converted even though I use String()?
-
reporter the default behavior of `String` is to do nothing with bind parameters. the value is passed from your code straight to cx_oracle. (I had somewhat forgotten this detail in my response of yesterday.) cx_oracle 5 allows Python unicodes as bind parameters directly, but apparently not when Oracle 8 is in use (even if the object is just ASCII, which seems to be what you're observing). The usage of `Unicode()`, `String(convert_unicode=True)` or `create_engine(..., convert_unicode=True)` means the `String` type will intercept the `u'xxx'` string and encode it first to the dialect encoding (defaults to utf-8). I'm still a little suspicious that your Oracle 8 install might be configurable to work here, but it's not worth it if turning on the "encode to string" flag allows things to work anyway.
-
Account Deleted The usage of `Unicode()`, `String(convert_unicode=True)` or `create_engine(..., convert_unicode=True)` means the `String` type will intercept the `u'xxx'` string and encode it first to the dialect encoding (defaults to utf-8).
Suppose I want the same table definition to work whether the connection string is Postgres, Oracle 8 or Oracle 10. If I use Unicode(), that is great for Postgres. But if I am connecting to Oracle 8 (or 10) and the underlying data is still stored as ASCII, will that incur pointless string-processing overhead, converting from ASCII to unicode for Python and then back to ASCII? It seems this would be the case for Unicode() or String(convert_unicode=True). What about setting convert_unicode=True on the engine: does it only convert when it needs to, or will it always convert incoming data from ASCII to unicode for Python and then back from unicode to ASCII?
I'm still a little suspicious that your Oracle 8 install might be potentially configurable to work here, but its not worth it if turning on the "encode to string" flag allows things to work anyway.
Unfortunately, we won't have control over this legacy database (in other words, I can't change the Oracle setup to see if it is configurable, even if I knew what to change as it may impact the legacy software using the database)...
-
reporter You really should use `Unicode` if you're sending in Python unicode objects (i.e. u'xxx'). The "pointless overhead" on the SQLA side applies only when you use Oracle 8, which is the only scenario where SQLA does the encoding itself. In the postgresql and cx_oracle + Oracle 10 scenarios, the driver does the unicode conversion, since those accept Python unicode objects directly. SQLA 0.6 only does something when the underlying driver can't. As it turns out, in all of the above cases, native encoders (assuming you use the SQLA C extensions) are doing the work here. Overhead is minimal and should take a backseat to cross-compatibility. The overhead associated with bind parameters is typically minimal; it's result-set processing where most of the overhead usually lies.
If you stick with String, the cx_oracle and PG drivers are still doing the conversion to Python unicode anyway in result sets. The pg dialect has an option to turn this off, but the cx_oracle one at the moment does not.
When SQLalchemy does the unicode conversion itself, it only encodes Python unicode objects and only decodes Python string objects.
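That rule could be sketched like this — hypothetical helper names, with Python 3 `str`/`bytes` standing in for Python 2 `unicode`/`str`:

```python
def coerce_bind_param(value, encoding="utf-8"):
    # Encode only text values (Python 2: unicode objects); everything
    # else, including already-encoded byte strings, passes through.
    if isinstance(value, str):
        return value.encode(encoding)
    return value

def coerce_result_value(value, encoding="utf-8"):
    # Decode only raw byte strings (Python 2: str) coming back from
    # the driver; text and non-string values pass through untouched.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

So a value that is already in the "target" form is never double-converted: NULLs, numbers, and already-encoded strings go through unchanged.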
-
Account Deleted One more thought on this: as part of the patch, I'd consider adding `self.convert_unicode = True`, since apparently cx_Oracle doesn't convert the unicode for Oracle versions before 9:

```python
def initialize(self, connection):
    super(OracleDialect_cx_oracle, self).initialize(connection)
    if self.server_version_info < (9, ):
        self.supports_unicode_binds = False
        self.convert_unicode = True
```
Thoughts on that? (Or are you under the impression that there are ways of configuring Oracle 8 for some of these features, which I don't have set up...?)
-
reporter convert_unicode is an end user setting. if you don't have it set, it means you want SQLA to have no interaction with your strings.
-
reporter that is, no interaction with strings that aren't otherwise individually set to accept unicode. the setting should be able to handle being left on True safely.
-
Account Deleted I'm asking myself: *what changed since 0.5.8?* There were none of these issues in 0.5.8, so it must not be a cx_Oracle difference between Oracle 8 and 9i? Or did sqla change, now deferring unicode conversion to the driver?
-
reporter this is all at 06Migration#NativeUnicodeMode
-
Account Deleted I assume you would support this change to go along with the other changes in your patch?
```diff
--- base.py.changed 2010-05-26 16:17:23.000000000 -0400
+++ base.py 2010-05-26 16:16:06.000000000 -0400
@@ -301,7 +301,10 @@
         return self.visit_CLOB(type_)
 
     def visit_unicode_text(self, type_):
-        return self.visit_NCLOB(type_)
+        if self.dialect._supports_nchar:
+            return self.visit_NCLOB(type_)
+        else:
+            return self.visit_CLOB(type_)
 
     def visit_large_binary(self, type_):
         return self.visit_BLOB(type_)
```
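The behavior that diff encodes, reduced to a toy sketch (`FakeDialect` is a made-up stand-in; `_supports_nchar` is the dialect flag the patch tests, which would be False on Oracle 8):

```python
class FakeDialect(object):
    # Stand-in for the Oracle dialect; real code consults the dialect's
    # _supports_nchar flag, set from the server version at connect time.
    def __init__(self, supports_nchar):
        self._supports_nchar = supports_nchar

def visit_unicode_text(dialect):
    # Emit NCLOB only when national character types are available;
    # otherwise fall back to plain CLOB so DDL works on Oracle 8.
    return "NCLOB" if dialect._supports_nchar else "CLOB"

assert visit_unicode_text(FakeDialect(True)) == "NCLOB"
assert visit_unicode_text(FakeDialect(False)) == "CLOB"
```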
-
reporter ah yes that applies too, thanks.
-
reporter - changed status to resolved
cleaned it up a bit in b086f9a81556250ac6352e092a36e53757f36477. While I added mocked-up tests, let me know whether it still works on your end with a real ORA 8 database.
-
Account Deleted Are you asking me to verify the codeset still works now on ORA 8 or that your tests work?
-
Account Deleted -
reporter um, try out the tip; if it's broken, reopen this ticket.
-
Account Deleted Very good. Seems to still check out.
-
reporter - removed milestone
Removing milestone: 0.6.1 (automated comment)
Patch allows the Oracle connection to succeed.
However, now I get problems during the metadata.create_all() statement. SQLA supplies unicode bind parameters to the select statement instead of converting them to plain ANSI characters:
Traceback: