oracle 8 unicode support
waiting on kb for information on whether this patch works with Oracle 8
Comments (25)
-
Account Deleted -
reporter new patch attached. needs testing.
-
Account Deleted Now metadata.create_all() appears to work nicely. I still have problems in my user code where I've specified a unicode string for a bind param. For Oracle 9 or Postgresql, these bind parameters are converted nicely for me. Can we make them auto-convert for Oracle 8 or do I need to avoid using unicode in sqla land?
(We considered having the ability to support unicode for Postgres while having the same code function on Oracle 8, though we will likely abandon this idea. This means I *could*, and probably *will*, go through all our code and move away from unicode, but it would be nice for this to automatically convert unicode bind params on Oracle 8, agreed?)
```
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-12704: character set mismatch
 'INSERT INTO systemparameters (name, type, stringval, numberval, dateval) VALUES (:name, :type, :stringval, :numberval, :dateval)'
 {'stringval': u'L', 'type': u's', 'name': u'invcost_fifo_lifo', 'dateval': None, 'numberval': None}
```
-
reporter if those are String-typed, and you're using at least text() or any SQL construct to define the statement, they will be encoded from unicode to string. Otherwise I'm not sure how a unicode object is getting sent through as a bind param with my most recent patch, you'd have to illustrate an example.
-
Account Deleted Would you expect this to automatically convert from unicode?
```python
from sqlalchemy import *
from sqlalchemy.orm import *

engine = create_engine('oracle://usr:pass@1.2.3.4:1521/live', echo=True)
engine.connect()
print "*****************************"
print "server_version_info: ", engine.dialect.server_version_info
print "_supports_char_length: ", engine.dialect._supports_char_length
print "supports_unicode_binds: ", engine.dialect.supports_unicode_binds
print "use_ansi:", engine.dialect.use_ansi
print "*****************************"

metadata = MetaData()
Session = sessionmaker(bind=engine)
DBSession = Session()

table = Table("tabl", metadata,
    Column("col", String(255), primary_key=True)
)

class Record(object):
    pass

mapper(Record, table)

r = Record()
r.col = u'Unicode String'

metadata.create_all(engine)
DBSession.add(r)
DBSession.flush()
```
Here is the output:
```
2010-05-25 04:55:21,725 INFO sqlalchemy.engine.base.Engine.0x...ce90 SELECT USER FROM DUAL
2010-05-25 04:55:21,729 INFO sqlalchemy.engine.base.Engine.0x...ce90 {}
*****************************
server_version_info:  (8, 1, 7, 4, 0)
_supports_char_length:  False
supports_unicode_binds:  False
use_ansi: False
*****************************
2010-05-25 04:55:23,981 INFO sqlalchemy.engine.base.Engine.0x...ce90 SELECT table_name FROM all_tables WHERE table_name = :name AND owner = :schema_name
2010-05-25 04:55:23,981 INFO sqlalchemy.engine.base.Engine.0x...ce90 {'name': 'TABL', 'schema_name': 'ARC'}
2010-05-25 04:55:24,098 INFO sqlalchemy.engine.base.Engine.0x...ce90 CREATE TABLE tabl (
    col VARCHAR(255) NOT NULL,
    PRIMARY KEY (col)
)
2010-05-25 04:55:24,099 INFO sqlalchemy.engine.base.Engine.0x...ce90 {}
2010-05-25 04:55:24,237 INFO sqlalchemy.engine.base.Engine.0x...ce90 COMMIT
2010-05-25 04:55:24,429 INFO sqlalchemy.engine.base.Engine.0x...ce90 BEGIN
2010-05-25 04:55:24,429 INFO sqlalchemy.engine.base.Engine.0x...ce90 INSERT INTO tabl (col) VALUES (:col)
2010-05-25 04:55:24,429 INFO sqlalchemy.engine.base.Engine.0x...ce90 {'col': u'Unicode String'}
2010-05-25 04:55:24,629 INFO sqlalchemy.engine.base.Engine.0x...ce90 ROLLBACK
Traceback (most recent call last):
  File "ora8.py", line 32, in <module>
    DBSession.flush()
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1348, in flush
    self._flush(objects)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1429, in _flush
    flush_context.execute()
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 293, in execute
    rec.execute(self)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 426, in execute
    uow
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/orm/mapper.py", line 1559, in _save_obj
    c = connection.execute(statement.values(value_params), params)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1109, in execute
    return Connection.executors[c](self, object, multiparams, params)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1186, in _execute_clauseelement
    return self.__execute_context(context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1215, in __execute_context
    context.parameters[0], context=context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1284, in _cursor_execute
    self._handle_dbapi_exception(e, statement, parameters, cursor, context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1282, in _cursor_execute
    self.dialect.do_execute(cursor, statement, parameters, context=context)
  File "/home/rarch/tg2env/lib/python2.6/site-packages/SQLAlchemy-0.6.0.2-py2.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 282, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (DatabaseError) ORA-12704: character set mismatch
 'INSERT INTO tabl (col) VALUES (:col)' {'col': u'Unicode String'}
```
-
Account Deleted By the way, if I change String(255) to Unicode(255) then this works fine. It used to be that all the strings were Unicode, and I recently switched them to String when upgrading to 0.6.0 because I didn't want to use NVARCHAR where the existing columns were VARCHAR.
-
Account Deleted Another note: the exact same script that fails on Oracle 8 succeeds on higher versions of Oracle:
```
*****************************
server_version_info:  (10, 2, 0, 1, 0)
_supports_char_length:  True
supports_unicode_binds:  True
use_ansi: True
*****************************
2010-05-16 12:23:15,701 INFO sqlalchemy.engine.base.Engine.0x...7410 SELECT table_name FROM all_tables WHERE table_name = :name AND owner = :schema_name
2010-05-16 12:23:15,701 INFO sqlalchemy.engine.base.Engine.0x...7410 {'name': u'TABL', 'schema_name': u'ARC'}
2010-05-16 12:23:15,706 INFO sqlalchemy.engine.base.Engine.0x...7410 CREATE TABLE tabl (
    col VARCHAR(255 CHAR) NOT NULL,
    PRIMARY KEY (col)
)
2010-05-16 12:23:15,706 INFO sqlalchemy.engine.base.Engine.0x...7410 {}
2010-05-16 12:23:15,901 INFO sqlalchemy.engine.base.Engine.0x...7410 COMMIT
2010-05-16 12:23:15,902 INFO sqlalchemy.engine.base.Engine.0x...7410 BEGIN
2010-05-16 12:23:15,903 INFO sqlalchemy.engine.base.Engine.0x...7410 INSERT INTO tabl (col) VALUES (:col)
2010-05-16 12:23:15,903 INFO sqlalchemy.engine.base.Engine.0x...7410 {'col': u'Unicode String'}
```
So, it seems inconsistent somewhere....
-
reporter Replying to guest:
Would you expect this to automatically convert from unicode?
OK, so here you need either the Unicode() type (or similarly convert_unicode=True with String), or as a shortcut you can add "convert_unicode=True" to create_engine() which will turn on the unicode->utf-8 coercion for all String types.
If that solves the issue then we can commit this patch, which is the main thing holding up the 0.6.1 release at this point (shooting for this weekend).
-
Account Deleted Replying to zzzeek:
Replying to guest:
Would you expect this to automatically convert from unicode?
OK, so here you need either the Unicode() type (or similarly convert_unicode=True with String), or as a shortcut you can add "convert_unicode=True" to create_engine() which will turn on the unicode->utf-8 coercion for all String types.
If that solves the issue then we can commit this patch, which is the main thing holding up the 0.6.1 release at this point (shooting for this weekend).
Yes, when I checked with Unicode(), it apparently works, so the patch is good.
I am still curious, however: why do I need to use Unicode() for it to work with Oracle 8, while on Oracle 10 unicode strings are automatically converted even though I use String()?
-
reporter the default behavior of `String` is to do nothing with bind parameters. the value is passed from your code straight to cx_oracle. (I had somewhat forgotten this detail in my response of yesterday.) cx_oracle 5 allows Python unicodes as bind parameters directly, but apparently not when Oracle 8 is in use (even if the object is just ASCII, which seems to be what you're observing). The usage of `Unicode()`, `String(convert_unicode=True)` or `create_engine(..., convert_unicode=True)` means the `String` type will intercept the `u'xxx'` string and encode it first to the dialect encoding (defaults to utf-8). I'm still a little suspicious that your Oracle 8 install might be configurable to work here, but it's not worth it if turning on the "encode to string" flag allows things to work anyway.
-
Account Deleted The usage of `Unicode()`, `String(convert_unicode=True)` or `create_engine(..., convert_unicode=True)` means the `String` type will intercept the `u'xxx'` string and encode it first to the dialect encoding (defaults to utf-8).
Suppose I want the same table definition to work whether the connection string is Postgres, Oracle 8 or Oracle 10. If I use Unicode(), that is great for Postgres. But if I am connecting to Oracle 8 (or 10) and the underlying data is still stored as ASCII, will that incur pointless string-processing overhead, converting from ASCII to unicode for Python and then back to ASCII? It seems this would be the case for Unicode() or String(convert_unicode=True). What about setting convert_unicode=True on the engine: does it only convert when it needs to, or will it always convert incoming data from ASCII to unicode for Python and then back from unicode to ASCII?
I'm still a little suspicious that your Oracle 8 install might be potentially configurable to work here, but its not worth it if turning on the "encode to string" flag allows things to work anyway.
Unfortunately, we won't have control over this legacy database (in other words, I can't change the Oracle setup to see if it is configurable, even if I knew what to change as it may impact the legacy software using the database)...
-
reporter You really should use `Unicode` if you're sending in Python unicode objects (i.e. u'xxx'). The "pointless overhead" on the SQLA side applies only when you use Oracle 8, which is the only scenario where SQLA does the encoding itself. In the postgresql and cx_oracle + Oracle 10 scenarios, the driver does the unicode conversion, since those accept Python unicode objects directly. SQLA 0.6 only does something when the underlying driver can't. As it turns out, in all of the above cases, native encoders (assuming you use the SQLA C extensions) are doing the work here. Overhead is minimal and should take a backseat to cross-compatibility. The overhead associated with bind parameters is typically minimal; it's result-set processing where most of the overhead usually lies.
If you stick with String, the cx_oracle and PG drivers are still doing the conversion to Python unicode anyway in result sets. The pg dialect has an option to turn this off, but the cx_oracle one at the moment does not.
When SQLalchemy does the unicode conversion itself, it only encodes Python unicode objects and only decodes Python string objects.
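That rule could be sketched like this — hypothetical helper names, with Python 3 `str`/`bytes` standing in for Python 2 `unicode`/`str`:

```python
def coerce_bind_param(value, encoding="utf-8"):
    # Encode only text values (Python 2: unicode objects); everything
    # else, including already-encoded byte strings, passes through.
    if isinstance(value, str):
        return value.encode(encoding)
    return value

def coerce_result_value(value, encoding="utf-8"):
    # Decode only raw byte strings (Python 2: str) coming back from
    # the driver; text and non-string values pass through untouched.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

So a value that is already in the "target" form is never double-converted: NULLs, numbers, and already-encoded strings go through unchanged.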
-
Account Deleted One more thought on this: as part of the patch, I'd consider adding `self.convert_unicode = True`, since apparently cx_Oracle doesn't convert the unicode for Oracle versions before 9:

```python
def initialize(self, connection):
    super(OracleDialect_cx_oracle, self).initialize(connection)
    if self.server_version_info < (9, ):
        self.supports_unicode_binds = False
        self.convert_unicode = True
```
Thoughts on that? (Or are you under the impression that there are ways of configuring Oracle 8 for some of these features, which I don't have set up...?)
-
reporter convert_unicode is an end user setting. if you don't have it set, it means you want SQLA to have no interaction with your strings.
-
reporter that is, no interaction with strings that aren't otherwise individually set to accept unicode. the setting should be able to handle being left on True safely.
-
Account Deleted I'm asking myself: *what changed since 0.5.8?* There were none of these issues in 0.5.8, so it must not be a cx_Oracle difference between Oracle 8 and 9i? Or did sqla change, now deferring unicode conversion to the driver?
-
reporter this is all at 06Migration#NativeUnicodeMode
-
Account Deleted I assume you would support this change to go along with the other changes in your patch?
```diff
--- base.py.changed 2010-05-26 16:17:23.000000000 -0400
+++ base.py 2010-05-26 16:16:06.000000000 -0400
@@ -301,7 +301,10 @@
         return self.visit_CLOB(type_)
 
     def visit_unicode_text(self, type_):
-        return self.visit_NCLOB(type_)
+        if self.dialect._supports_nchar:
+            return self.visit_NCLOB(type_)
+        else:
+            return self.visit_CLOB(type_)
 
     def visit_large_binary(self, type_):
         return self.visit_BLOB(type_)
```
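The behavior that diff encodes, reduced to a toy sketch (`FakeDialect` is a made-up stand-in; `_supports_nchar` is the dialect flag the patch tests, which would be False on Oracle 8):

```python
class FakeDialect(object):
    # Stand-in for the Oracle dialect; real code consults the dialect's
    # _supports_nchar flag, set from the server version at connect time.
    def __init__(self, supports_nchar):
        self._supports_nchar = supports_nchar

def visit_unicode_text(dialect):
    # Emit NCLOB only when national character types are available;
    # otherwise fall back to plain CLOB so DDL works on Oracle 8.
    return "NCLOB" if dialect._supports_nchar else "CLOB"

assert visit_unicode_text(FakeDialect(True)) == "NCLOB"
assert visit_unicode_text(FakeDialect(False)) == "CLOB"
```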
-
reporter ah yes that applies too, thanks.
-
reporter - changed status to resolved
cleaned it up a bit in b086f9a81556250ac6352e092a36e53757f36477. While I added mocked-up tests, let me know whether it still works on your end with a real ORA 8 database.
-
Account Deleted Are you asking me to verify the codeset still works now on ORA 8 or that your tests work?
-
Account Deleted -
reporter um, try out the tip; if it's broken, reopen this ticket.
-
Account Deleted Very good. Seems to still check out.
-
reporter - removed milestone
Removing milestone: 0.6.1 (automated comment)
Patch allows the Oracle connection to succeed.
However, now I get problems during the metadata.create_all() statement. SQLA supplies unicode bind parameters to the select statement instead of converting them to plain ANSI characters:
Traceback: