rework of unicode conversion, re: "conditional" as well as cx_oracle

Issue #2911 resolved
Mike Bayer repo owner created an issue

working on removing the cx_oracle outputtype handler for "unicode" as it adds overhead to the non-unicode use case. would like to only do unicode conversion on oracle when unicode is requested; this requires speeding up the "conditional" check.

Comments (4)

  1. Mike Bayer reporter

    working with this script and changing things:

    from sqlalchemy.testing.profiling import profiled
    from sqlalchemy import create_engine
    
    engine = create_engine('oracle://scott:tiger@localhost/xe')
    
    try:
        engine.execute("DROP TABLE test_stuff")
    except:
        pass
    
    #engine.execute("CREATE TABLE test_stuff (data nvarchar2(30))")
    engine.execute("CREATE TABLE test_stuff (data varchar2(30))")
    
    engine.execute("INSERT INTO test_stuff (data) values (:data)",
            [{"data": "d%d" % i} for i in range(50000)])
    
    
    from sqlalchemy.sql import select, column, cast
    from sqlalchemy import Unicode, String
    
    with engine.connect() as conn:
        @profiled()
        def go():
            #result = conn.execute("select * from test_stuff")
            result = conn.execute(select([cast(column('data', Unicode), Unicode(30)).label('data')]).select_from("test_stuff"))
            #result = conn.execute(select([column('data', String)]).select_from("test_stuff"))
            row = result.fetchone()
            assert isinstance(row['data'], unicode)
            for row in result.fetchall():
                x = row['data']
    
    
        go()
    
  2. Mike Bayer reporter

    upcoming is a new to_conditional_unicode_processor_factory() in both Python and C. modify cx_oracle as such:

    @@ -749,8 +752,9 @@ class OracleDialect_cx_oracle(OracleDialect):
                                 outconverter=self._detect_decimal,
                                 arraysize=cursor.arraysize)
                 # allow all strings to come back natively as Unicode
    -            elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
    -                return cursor.var(util.text_type, size, cursor.arraysize)
    +            #elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
    +            #    return cursor.var(util.text_type, size, cursor.arraysize)
    

    cx_oracle then returns bytes or unicode (py2K only) depending on the column type (CHAR or NVARCHAR, etc.). In this case we seek to do "conditional" unicode returns, since we don't know when the user might be placing Unicode() around a CHAR or NVARCHAR expression. conditional unicode returns are expensive since they require an isinstance().
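
    A minimal sketch of what a "conditional" processor does, in pure Python. This is not the actual to_conditional_unicode_processor_factory() implementation, which is not shown in this thread; the function name below is hypothetical, and it is written for Python 3, where str plays the role that unicode plays on Py2K:

```python
import codecs


def make_conditional_unicode_processor(encoding, errors="strict"):
    """Hypothetical factory: return a function that decodes only when needed."""
    decoder = codecs.getdecoder(encoding)

    def process(value):
        if value is None:
            return None
        # the "conditional" part: the driver may already have handed us text
        # (e.g. for an NVARCHAR column), so check before decoding
        if isinstance(value, str):
            return value
        # otherwise assume bytes; decoder returns (text, length_consumed)
        return decoder(value, errors)[0]

    return process


process = make_conditional_unicode_processor("utf-8")
print(process(b"d1"), process("d2"), process(None))
```

    The isinstance() call runs once per value, which is the per-row cost referred to above.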

    But when we have cx_oracle's converter in place, now we have the unicode conversion overhead for all strings, not just unicode. For whatever reason, cx_oracle on Py2K counts all the decodes as Python function calls; in Py3K it does not, even if you have that converter in place. So there's some less than ideal shenanigans going on inside of cx_oracle making us look bad.

    If we standardize cx_oracle instead on "conditional", we pay a price for unicode conversion when the C extensions are not in place; however, when the C extensions are present, the new one that does "conditional" does the check without any function call overhead. results, as function call counts from the profiled script above, are as follows:

    1. cx_oracle unicode, no C ext, no check, returning unicode - 200K
    
    2. no cx_oracle unicode, no C ext, conditional check, returning unicode - 300K
    
    3. no cx_oracle unicode, no C ext, unconditional check, returning unicode - 250K
    
    4. cx_oracle unicode, no C ext, returning str - 200K
    
    5. no cx_oracle unicode, no C ext, returning str - 100K
    
    6. cx_oracle unicode, C ext, no check, returning unicode - 100K
    
    7. no cx_oracle unicode, C ext, conditional check, returning unicode - 254
    
    8. no cx_oracle unicode, C ext, unconditional check, returning unicode - 254
    
    9. cx_oracle unicode, C ext, returning str - 100K
    
    10. no cx_oracle unicode, C ext, returning str - 236
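
    As a rough illustration of why per-value conversion in pure Python shows up in counts like these, the sketch below uses the standard library's cProfile rather than sqlalchemy.testing.profiling, with hypothetical helper names and in-memory byte strings standing in for fetched rows. It counts function calls for a loop with no conversion, with an unconditional decode, and with a conditional (isinstance-checked) decode:

```python
import cProfile
import codecs
import pstats

decode = codecs.getdecoder("utf-8")


def unconditional(value):
    # always decode; assumes every value is bytes
    return decode(value)[0]


def conditional(value):
    # decode only if the value is not already text
    if isinstance(value, str):
        return value
    return decode(value)[0]


def fetch(rows, processor=None):
    # stand-in for iterating over a result set
    for value in rows:
        x = value if processor is None else processor(value)


def count_calls(fn, *args):
    """Return the total number of function calls cProfile records for fn(*args)."""
    profiler = cProfile.Profile()
    profiler.runcall(fn, *args)
    return pstats.Stats(profiler).total_calls


rows = [("d%d" % i).encode("utf-8") for i in range(1000)]

raw_calls = count_calls(fetch, rows)
unconditional_calls = count_calls(fetch, rows, unconditional)
conditional_calls = count_calls(fetch, rows, conditional)
print(raw_calls, unconditional_calls, conditional_calls)
```

    The absolute numbers will not match the ones above; the point is only that every Python-level call made per value adds to the total, which is the overhead a C implementation of the same check avoids.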
    