Binding strings in Oracle is very slow

Issue #4035 resolved

Anthony Tuininga created an issue 2017-07-28

See this issue that was raised in cx_Oracle: https://github.com/oracle/python-cx_Oracle/issues/58 and the analysis there. It also contains the test case that demonstrates the problem. Let me know if you require any additional information.

Comments (7)

Mike Bayer repo owner
So ~~#3937~~ is where we got hit with this, because on my end I could find no other way of detecting if cx_Oracle was built with the ancient "WITH_UNICODE" flag unless I noticed that cx_Oracle.UNICODE was missing. Fast forward a few years to 5.3 where you removed cx_Oracle.UNICODE entirely, and everyone's SQLAlchemy broke. So apparently I read that as, "unicode mode is switched on unconditionally", when really it meant, "there is no more unicode mode" :). which is better !

The flag for WITH_UNICODE is now blocked if cx_Oracle version >= (5, 3) in https://gerrit.sqlalchemy.org/461.

in particular if you can confirm this comment:
```
        # cx_Oracle WITH_UNICODE mode.  *only* python
        # unicode objects accepted for anything.  This
        # mode of operation is implicit for Python 3,
        # however under Python 2 it existed as a never-used build-time
        # option for cx_Oracle 5.0 - 5.2 and was removed as of 5.3.
```
Basically when 5.3 is present, we go in full "python 2" operation like WITH_UNICODE never existed.
- 2017-07-28T19:08:04+00:00
Mike Bayer repo owner
- changed milestone to 1.0.xx
- changed component to oracle
- 2017-07-28T19:08:25+00:00
Anthony Tuininga reporter
Support for "UNICODE mode" was removed in 5.1 (released March 2011) so I think you can safely assume that it doesn't exist any longer! So it only existed for cx_Oracle 5.0. You could simply add a check for the version being >= 5.1 and raise an exception indicating that anything earlier than that isn't supported any longer. If you don't want to do that, you can also explicitly check that the version is earlier than 5.1 before checking for the UNICODE attribute.

The main issue (which unfortunately can no longer be changed) is that when binding Unicode strings in Python 2.x the driver assumes that you want to bind as NVARCHAR2. This ordinarily doesn't cause a problem but can when using indexes (like noted in the cx_Oracle issue) or attempting to use PL/SQL in certain cases (arrays and overloaded procedures). You can override this by using cursor.setinputsizes() to specify that you want VARCHAR2 instead. You could also use an inputtypehandler to override this behavior.

In contrast, Python 3.x uses Unicode strings all of the time so the default is always VARCHAR2 and you have to explicitly specify you want to use NVARCHAR2 by using cursor.setinputsizes() or an inputtypehandler. And yes, the same issues can occur if you have an index on an NVARCHAR2 column -- unless you bind as NVARCHAR2 the index won't get used.

The old "Unicode mode" made sure that strings were always returned as Unicode as well. But that is more effectively accomplished by using an outputtypehandler. You can even build a subclass of Connection that will do this for you automatically, if desired. I can provide help with that, if that is desirable.

Looking at the code in sqlalchemy, supports_unicode_statements should always be True. supports_unicode_binds should always be True. I'm not sure what the OracleExecutionContext_cx_oracle class is intended to do but hopefully the above information will help? If not, let me know what needs more explanation and I'll try to provide it!
- 2017-07-28T19:52:04+00:00
Mike Bayer repo owner
So we do setinputsizes() but specifically not for STRING as it caused problems (the comments claim they were with RETURNING and OUT parameters), have not looked at those problems in some years but we are able to round-trip Python unicode values with non-ascii strings into columns and out again in both Python 2 and 3, so at least as far as getting the characters over the wire, the VARCHAR2 you refer to seems to still work under Python 3. As far as the indexes not matching up, I believe we might have had some users with that issue who might have wound up using CAST() in the statement instead, it seems like these rules may need to be updated so that we do set up setinputsizes() on newer cx_Oracle versions with Python 3.

as for outputtypehandler, we totally used that but we had major complaints about performance as there was no way to limit the conversion taking place only where needed, and in ~~#2911~~ we switched to using our own strategy. in the comment at https://bitbucket.org/zzzeek/sqlalchemy/issues/2911#comment-9003374, you can see some call counts logged by Python cprofile - where you see "200K", that means the test script took 200000 Python function calls, and where you see "254", that's not a typo, that's the same operation taking only 254 Python function calls using our own system (still doing the unicode conversion, just keeping the call overhead within C code).
- 2017-07-28T21:21:52+00:00
Mike Bayer repo owner
- changed status to resolved
Revert cx_Oracle WITH_UNICODE change under > 5.0

Fixed performance regression caused by the fix for 3937 where cx_Oracle as of version 5.3 dropped the .UNICODE symbol from its namespace, which was interpreted as cx_Oracle's "WITH_UNICODE" mode being turned on unconditionally, which invokes functions on the SQLAlchemy side which convert all strings to unicode unconditionally and causing a performance impact. In fact, per cx_Oracle's author the "WITH_UNICODE" mode has been removed entirely as of 5.1, so the expensive unicode conversion functions are no longer necessary and are disabled if cx_Oracle 5.1 or greater is detected under Python 2. The warning against "WITH_UNICODE" mode that was removed under 3937 is also restored.

Change-Id: Iddd38d81a5adb27c953a5ee2eae5529a21da16e1 Fixes: ~~#4035~~

→ <<cset 7997d7fdc363>>
- 2017-07-28T22:06:40+00:00
Mike Bayer repo owner
Revert cx_Oracle WITH_UNICODE change under > 5.0

Fixed performance regression caused by the fix for 3937 where cx_Oracle as of version 5.3 dropped the .UNICODE symbol from its namespace, which was interpreted as cx_Oracle's "WITH_UNICODE" mode being turned on unconditionally, which invokes functions on the SQLAlchemy side which convert all strings to unicode unconditionally and causing a performance impact. In fact, per cx_Oracle's author the "WITH_UNICODE" mode has been removed entirely as of 5.1, so the expensive unicode conversion functions are no longer necessary and are disabled if cx_Oracle 5.1 or greater is detected under Python 2. The warning against "WITH_UNICODE" mode that was removed under 3937 is also restored.

Change-Id: Iddd38d81a5adb27c953a5ee2eae5529a21da16e1 Fixes: ~~#4035~~ (cherry picked from commit 7997d7fdc3634e7dba9fd0113b8b85ef311bfeaa)

→ <<cset 9689bb85f1a9>>
- 2017-07-28T22:07:00+00:00
Mike Bayer repo owner
Revert cx_Oracle WITH_UNICODE change under > 5.0

Fixed performance regression caused by the fix for 3937 where cx_Oracle as of version 5.3 dropped the .UNICODE symbol from its namespace, which was interpreted as cx_Oracle's "WITH_UNICODE" mode being turned on unconditionally, which invokes functions on the SQLAlchemy side which convert all strings to unicode unconditionally and causing a performance impact. In fact, per cx_Oracle's author the "WITH_UNICODE" mode has been removed entirely as of 5.1, so the expensive unicode conversion functions are no longer necessary and are disabled if cx_Oracle 5.1 or greater is detected under Python 2. The warning against "WITH_UNICODE" mode that was removed under 3937 is also restored.

Change-Id: Iddd38d81a5adb27c953a5ee2eae5529a21da16e1 Fixes: ~~#4035~~ (cherry picked from commit 7997d7fdc3634e7dba9fd0113b8b85ef311bfeaa)

→ <<cset edeed24d8727>>
- 2017-07-28T22:07:30+00:00
Log in to comment

Assignee: –

Type: bug

Priority: major

Status: resolved

Component: oracle

Milestone: 1.0.xx

Votes: 0

Watchers: 1