rework of unicode conversion, re: "conditional" as well as cx_oracle
Working on removing the cx_oracle outputtypehandler for "unicode", as it adds overhead to the non-unicode use case. Would like to only do unicode conversion on Oracle when unicode is requested; this requires speeding up the "conditional" check.
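A minimal sketch of "only convert when unicode is requested", assuming a hypothetical per-column lookup keyed on a type name (`"Unicode"` vs. anything else). `hypothetical_result_processor()` and `process_rows()` are made up for illustration and are not SQLAlchemy's actual `result_processor()` API. Columns not typed as unicode get no processor at all, so the non-unicode case pays nothing per value.

```python
import codecs
from typing import Any, Callable, List, Optional, Sequence, Tuple

Processor = Optional[Callable[[Any], Any]]


def hypothetical_result_processor(type_name: str, encoding: str = "utf-8") -> Processor:
    """Return a decoder only for columns where unicode was requested."""
    if type_name != "Unicode":
        # Non-unicode column: no processor, so zero per-value overhead.
        return None
    decoder = codecs.getdecoder(encoding)

    def process(value: Any) -> Any:
        if value is None:
            return None
        # decoder() returns (text, length_consumed); keep only the text.
        return decoder(value)[0]

    return process


def process_rows(
    rows: Sequence[Tuple[Any, ...]], column_types: Sequence[str]
) -> List[Tuple[Any, ...]]:
    # Look processors up once per result set, not once per row.
    processors = [hypothetical_result_processor(t) for t in column_types]
    return [
        tuple(value if proc is None else proc(value) for proc, value in zip(processors, row))
        for row in rows
    ]


rows = [(b"caf\xc3\xa9", b"raw"), (None, b"bytes")]
print(process_rows(rows, ["Unicode", "String"]))
```

The design point is that the "is this unicode?" decision happens when the processor is chosen, not inside the per-value loop.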
Upcoming is a new to_conditional_unicode_processor_factory() in both Python and C. Modify cx_oracle as such:

    @@ -749,8 +752,9 @@ class OracleDialect_cx_oracle(OracleDialect):
                             outconverter=self._detect_decimal,
                             arraysize=cursor.arraysize)
                 # allow all strings to come back natively as Unicode
    -            elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
    -                return cursor.var(util.text_type, size, cursor.arraysize)
    +            #elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
    +            #    return cursor.var(util.text_type, size, cursor.arraysize)
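The pure-Python half of a conditional unicode processor factory might look roughly like this. It is a sketch based on the description in this issue (pass text through untouched, decode anything else), not SQLAlchemy's actual implementation. The compiled module name `_hypothetical_cprocessors` is invented to show the usual "try the C version first, fall back to Python" import pattern.

```python
import codecs
from typing import Any, Callable, Optional


def _py_to_conditional_unicode_processor_factory(
    encoding: str, errors: Optional[str] = None
) -> Callable[[Any], Any]:
    decoder = codecs.getdecoder(encoding)
    errors = errors or "strict"

    def process(value: Any) -> Any:
        if value is None:
            return None
        if isinstance(value, str):
            # Already text (e.g. an NVARCHAR column): the "conditional" part.
            return value
        # Bytes (e.g. a CHAR column): decode and drop the length element.
        return decoder(value, errors)[0]

    return process


try:
    # Hypothetical compiled version; this module name is an assumption.
    from _hypothetical_cprocessors import to_conditional_unicode_processor_factory
except ImportError:
    to_conditional_unicode_processor_factory = _py_to_conditional_unicode_processor_factory


process = to_conditional_unicode_processor_factory("utf-8")
print(process(b"caf\xc3\xa9"), process("café"), process(None))
```

In the Python fallback every value costs a function call plus an isinstance(); a C version can do the same check with no Python-level call per value.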
cx_oracle then returns bytes or unicode (Py2K only) depending on the column type (CHAR or NVARCHAR, etc.). In this case we need "conditional" unicode returns, since we don't know when the user might place Unicode() around a CHAR or NVARCHAR expression. Conditional unicode returns are expensive since they require an isinstance() check on every value.
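To illustrate why the check can't be skipped, here is a small simulation assuming Py2K-era driver behavior: CHAR values arrive as bytes and NVARCHAR values as text, and a Unicode() expression may wrap either. The `fetched` list and both processors are hypothetical stand-ins, not cx_oracle or SQLAlchemy code.

```python
import codecs

# Hypothetical fetched values: CHAR comes back as bytes, NVARCHAR as text.
# Which one a Unicode() expression wraps isn't known when the result
# processor is chosen.
fetched = [b"char value", "nvarchar value", b"caf\xc3\xa9", "d\u00e9j\u00e0"]

utf8_decode = codecs.getdecoder("utf-8")


def unconditional(value):
    # Assumes every value is bytes.
    return utf8_decode(value)[0]


def conditional(value):
    # One isinstance() per value, paid even when nothing needs decoding.
    if isinstance(value, str):
        return value
    return utf8_decode(value)[0]


def try_all(processor, values):
    results = []
    for value in values:
        try:
            results.append(processor(value))
        except TypeError:
            # Decoding a str raises TypeError (a bytes-like object is required).
            results.append("<TypeError>")
    return results


print(try_all(unconditional, fetched))
print(try_all(conditional, fetched))
```

The unconditional version fails on every NVARCHAR value; the conditional one handles both at the cost of the extra type check.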
But with cx_oracle's converter in place, we pay the unicode conversion overhead for all strings, not just those requested as unicode. For whatever reason, on Py2K cx_oracle's decodes all show up as Python function calls; on Py3K they do not, even with that converter in place. So there are some less-than-ideal shenanigans going on inside cx_oracle that make us look bad.
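One way to see how per-value decoding inflates profiled call counts: profile the same decode done through a Python-level callable and through a loop that stays in C. `python_level_convert()`, `c_level_convert()` and `count_calls()` are hypothetical helpers for this sketch; `cProfile.Profile` and `pstats.Stats.total_calls` are standard library. This does not reproduce cx_oracle's Py2K behavior, only the kind of measurement behind it.

```python
import codecs
import cProfile
import pstats

data = [b"some string value"] * 1000
utf8_decode = codecs.getdecoder("utf-8")


def python_level_convert(values):
    # One Python-level call per value, which is how a converter written
    # in Python (or invoked as a Python callable) shows up in a profile.
    def process(value):
        return utf8_decode(value)[0]

    return [process(v) for v in values]


def c_level_convert(values):
    # The per-value loop runs inside map(), in C; the profiler does not
    # record a separate call for each element.
    return list(map(bytes.decode, values))


def count_calls(func, *args):
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    return pstats.Stats(profiler).total_calls


print("python-level:", count_calls(python_level_convert, data))
print("c-level:", count_calls(c_level_convert, data))
```

Both produce the same strings; only the number of profiled calls differs, which is what a call-count based comparison picks up.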
If we instead standardize cx_oracle on "conditional", we pay a price for unicode conversion when the C extensions are not in place; however, when the C extensions are present, the new C "conditional" processor does the check without any function call overhead. Results are as follows:
1. cx_oracle unicode, no C ext, no check, returning unicode - 200K
2. no cx_oracle unicode, no C ext, conditional check, returning unicode - 300K
3. no cx_oracle unicode, no C ext, unconditional check, returning unicode - 250K
4. cx_oracle unicode, no C ext, returning str - 200K
5. no cx_oracle unicode, no C ext, returning str - 100K
6. cx_oracle unicode, C ext, no check, returning unicode - 100K
7. no cx_oracle unicode, C ext, conditional check, returning unicode - 254
8. no cx_oracle unicode, C ext, unconditional check, returning unicode - 254
9. cx_oracle unicode, C ext, returning str - 100K
10. no cx_oracle unicode, C ext, returning str - 236
changed milestone to 1.0.xx
working with this script and changing things: