Oracle joinedload with limit breaks Decimal() cast

Issue #1840 resolved
Former user created an issue

The following script (also attached) demonstrates that the cast to Decimal() (instead of float) is broken when using a joinedload with a limit on Oracle:

from sqlalchemy import *
from sqlalchemy.orm import *

#engine = create_engine('postgresql://un:pw@localhost:5444/salespylot',echo=True)
engine = create_engine('oracle://un:pw@localhost:1521/xe?use_ansi=False',echo=True)
metadata = MetaData()
Session = sessionmaker(bind=engine)
session = Session()


orders_table = Table("orders", metadata,
    Column("orderid", Unicode(255), primary_key=True)
)


orderdetails_table = Table("orderdetails",metadata,
    Column("orderid", Unicode(255), ForeignKey('orders.orderid'), primary_key=True),
    Column("lineid", Integer, primary_key=True),
    Column("qtyordered", Numeric(20,2), nullable=False),
    Column("saleprice", Numeric(20,2), nullable=False),
)

class Order(object):
    pass

class OrderDetail(object):
    pass

order_mapper = mapper(Order, orders_table,
    properties={'orderdetails': relation(OrderDetail, 
                    cascade='all,delete-orphan', 
                    single_parent=True,
                    lazy=False,
                    backref=backref('parentorder',
                            cascade='refresh-expire,expunge'))
                })

od_alias=orderdetails_table.alias('od__a')
order_mapper.add_property('totalsale',
    # totalsale is an inline view column 
    column_property(cast(
        select([func.sum(od_alias.c.qtyordered * od_alias.c.saleprice)],
            orders_table.c.orderid==od_alias.c.orderid
        ).as_scalar(), Numeric(20,2)).label('totalsale')))

mapper(OrderDetail, orderdetails_table, allow_partial_pks=False)

print "\n\n%r\n\n" % session.query(Order).all()[0](0).totalsale

print "\n\n%r" %  session.query(Order).limit(100).all()[0](0).totalsale

Creates this output:

2010-07-01 14:03:23,667 INFO sqlalchemy.engine.base.Engine.0x...43d0 SELECT USER FROM DUAL
2010-07-01 14:03:23,674 INFO sqlalchemy.engine.base.Engine.0x...43d0 {}
2010-07-01 14:03:23,678 INFO sqlalchemy.engine.base.Engine.0x...43d0 BEGIN
2010-07-01 14:03:23,679 INFO sqlalchemy.engine.base.Engine.0x...43d0 SELECT orders.orderid AS orders_orderid, CAST((SELECT sum(od__a.qtyordered * od__a.saleprice) AS sum_1
FROM orderdetails od__a
WHERE orders.orderid = od__a.orderid) AS NUMERIC(20, 2)) AS totalsale, orderdetails_1.orderid AS orderdetails_1_orderid, orderdetails_1.lineid AS orderdetails_1_lineid, orderdetails_1.qtyordered AS orderdetails_1_qtyordered, orderdetails_1.saleprice AS orderdetails_1_saleprice
FROM orders LEFT OUTER JOIN orderdetails orderdetails_1 ON orders.orderid = orderdetails_1.orderid
2010-07-01 14:03:23,679 INFO sqlalchemy.engine.base.Engine.0x...43d0 {}


Decimal('2499.15')


2010-07-01 14:03:23,705 INFO sqlalchemy.engine.base.Engine.0x...43d0 SELECT anon_1.orders_orderid AS anon_1_orders_orderid, anon_1.totalsale AS anon_1_totalsale, orderdetails_1.orderid AS orderdetails_1_orderid, orderdetails_1.lineid AS orderdetails_1_lineid, orderdetails_1.qtyordered AS orderdetails_1_qtyordered, orderdetails_1.saleprice AS orderdetails_1_saleprice
FROM (SELECT orders_orderid, totalsale
FROM (SELECT orders.orderid AS orders_orderid, CAST((SELECT sum(od__a.qtyordered * od__a.saleprice) AS sum_1
FROM orderdetails od__a
WHERE orders.orderid = od__a.orderid) AS NUMERIC(20, 2)) AS totalsale
FROM orders)
WHERE ROWNUM <= :ROWNUM_1) anon_1 LEFT OUTER JOIN orderdetails orderdetails_1 ON anon_1.orders_orderid = orderdetails_1.orderid
2010-07-01 14:03:23,705 INFO sqlalchemy.engine.base.Engine.0x...43d0 {'ROWNUM_1': 100}


2499.1500000000001

(Broken regardless of use_ansi setting.)

Works fine:

* With postgres
* With lazy=True instead of lazy=False for the joinedload
* Without a limit (as demonstrated above)

Comments (20)

  1. Mike Bayer repo owner

    This is an Oracle issue. We can't fix it, but there are workarounds.

    Here is the Oracle bug:

    import cx_Oracle
    
    conn = cx_Oracle.connect(user='scott', password='tiger', dsn=cx_Oracle.makedsn('localhost', 1521, 'xe'))
    
    def output_type_handler(cursor, name, defaultType, size, precision, scale):
        print name, defaultType, precision, scale
    
    conn.outputtypehandler = output_type_handler
    
    cursor = conn.cursor()
    
    try:
        cursor.execute("""DROP TABLE foo""")
    except:
        pass
    
    cursor.execute("""
    CREATE TABLE foo (
        data NUMERIC(20, 2)
    )
    """)
    
    cursor.execute("INSERT INTO foo (data) VALUES (25)")
    
    cursor.execute("SELECT data FROM foo")
    
    print cursor.fetchall()
    
    cursor.execute("""
            SELECT 
                (SELECT CAST((SELECT data AS sum_1 FROM foo) AS NUMERIC(20, 2)) FROM DUAL) AS totalsale
            FROM dual
    """)
    
    print cursor.fetchall()
    

    As you can see, the type information is lost in the second query.

    Two workarounds:

    1. Turn off native decimal.

      #!python

      engine = create_engine('oracle://scott:tiger@localhost/xe', echo=True)
      engine.connect()
      engine.dialect.supports_native_decimal = False

    2. Don't use joinedload, use subqueryload.

      #!python

      order_mapper = mapper(Order, orders_table,
          properties={'orderdetails': relation(OrderDetail,
                          cascade='all,delete-orphan',
                          single_parent=True,
                          lazy='subquery',
                          backref=backref('parentorder',
                                  cascade='refresh-expire,expunge'))
                      })

  2. Former user Account Deleted

    Does turning off supports_native_decimal mean SQLAlchemy will figure out how to convert the results from floats to Decimal?

    The warning I get is

    SAWarning: Dialect oracle+cx_oracle does *not* support Decimal objects natively, and 
    SQLAlchemy must convert from floating point - rounding errors and other issues may 
    occur. Please consider storing Decimal numbers as strings or integers on this platform 
    for lossless storage.
    

    My question is: Is this warning accurate in this case? Is SQLAlchemy really converting from floating point if supports_native_decimal is off, even when Oracle is returning 2499.15? In other words, if I turn off supports_native_decimal, does cx_Oracle return a float to SQLAlchemy (as opposed to a string, etc.)?

  3. Mike Bayer repo owner
    • changed milestone to 0.6.2

    OK, so two things.

    First of all, if you turn off supports_native_decimal, then yes, cx_oracle is returning a Python floating point number, like 15.756300000000001, which we convert to Decimal using a "%f" string template. This process is known to be lossy for numbers with a large number of significant digits.
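
    For illustration, here is a minimal sketch of why that path is lossy (my own example value, not from the report), assuming the "%f" template described above:

    from decimal import Decimal

    value = 0.1234567891                # e.g. a NUMERIC(20, 10) column value fetched as a float
    print Decimal("%f" % value)         # 0.123457 - digits beyond six decimal places are gone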

    However, the good news is that if I just change our Oracle "output type handler" (the thing that stops cx_oracle from sending us floats) to be more liberal about what we convert to Decimal(), that works absolutely fine as well, since the Float type in SQLAlchemy seems to be passing floats through our "as_float" processor unconditionally. If the SQLAlchemy type really wants a float, the conversion is Oracle string -> cx_oracle -> Decimal -> float(Decimal), which is accurate. It just needs to let integers through without being converted.
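
    As a rough sketch of the kind of handler being described (my own illustration, not SQLAlchemy's actual implementation), an output type handler can ask cx_oracle for each NUMBER column as a string and convert it straight to Decimal; this reuses the conn from the cx_Oracle script in comment 1:

    import cx_Oracle
    from decimal import Decimal

    # sketch only: fetch every NUMBER column as a string and convert it to
    # Decimal, so no float is ever involved
    def output_type_handler(cursor, name, defaultType, size, precision, scale):
        if defaultType == cx_Oracle.NUMBER:
            return cursor.var(cx_Oracle.STRING, 255, cursor.arraysize,
                              outconverter=Decimal)

    conn.outputtypehandler = output_type_handler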

  4. Former user Account Deleted

    Very nice. (It would also work, when on Oracle, to put CAST ... AS NUMERIC(20,2) around the outer-most SELECT columns, whose types SQLAlchemy already knows. But your solution sounds like it is obviously the way to go.) Thanks

  5. Mike Bayer repo owner

    yes, if we copied out the CAST to the top level that would work too, but that would be exceedingly difficult to wrangle into the expression system, so I'm glad this other idea looks promising. It is in 25db56bc0c1ee30fc0ad9166ac48de7dc8328762 and please let me know if it solves your issue.

  6. Former user Account Deleted

    Our fix here caused an infinite loop in an unrelated portion of my code. I was expecting the return of a Sequence to give me an integer. In fact, I was doing integer math with the return:

    dec = DBSession.execute(Sequence('document_sequence_seq'))
    alpha = (dec / 2106) % 26
    

    Now, dec is no longer an integer; it is a Decimal(), which broke the integer division and mod operations and, in my particular case, created an infinite loop.
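
    For what it's worth, here is a small sketch (my own illustration, not the application code) of how the semantics shift once dec is a Decimal, assuming the surrounding code relied on Python 2 integer division:

    from decimal import Decimal

    # with plain ints, "/" truncates in Python 2, so the expression bottoms out at 0
    print (5 / 2106) % 26              # 0
    # with a Decimal, "/" is true division, so the result never reaches 0
    print (Decimal(5) / 2106) % 26     # a small nonzero Decimal (~0.00237...)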

    Needless to say, that could be a problem for others as well who might be relying on the Sequence returning an int type. I really hope you don't need to remove this fix... Obviously I can fix my code, but you can't release this as is....any ideas?

    (If cx_oracle returns a Decimal() that is an integer, we can't really cast it to an integer; that would break the other case, where a Decimal is expected.)

  7. Former user Account Deleted

    Are there other cases (columns/expressions) that will behave the same as the sequence (namely, that the fix here will change what used to be an int return into a Decimal), or was the sequence unique in that regard?

  8. Mike Bayer repo owner

    the cases where this breaks down have two aspects. One is that Oracle is sending us ambiguous information about the numeric - it has a precision of 0 and a scale of -127. This is what sequence.nextval(), bizarrely, sends, even though it's plainly an integer. The other aspect is that we have no typing information about the expression on the clause side. When those two conditions occur, we put out a Decimal(), which is in fact the most robust and accurate numeric type we can produce, however inappropriate it is in some cases.

    So the sequence case here appears to be an outlier. In virtually all other cases there is typing information associated with a clause construct, and for all trivial cases (i.e. no heavy subqueries) Oracle is sending us meaningful enough information about the type anyway that we can let the data pass through cx_oracle without the Decimal conversion.

    The awkwardness here is that we have to intercept integers at all; usually integers are well separated from floats and such, but Oracle seems to mix the two up quite a bit.
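
    Purely as an illustration of that first aspect (my own sketch, not SQLAlchemy code), an output type handler can show what Oracle reports for a sequence's nextval; this reuses the conn from the cx_Oracle script in comment 1 and assumes a sequence named some_seq exists:

    import cx_Oracle

    def show_number_types(cursor, name, defaultType, size, precision, scale):
        # per the description above, a bare NUMBER such as sequence.nextval
        # reports precision 0 and scale -127, which tells us nothing useful
        print name, defaultType, precision, scale

    conn.outputtypehandler = show_number_types
    cursor = conn.cursor()
    cursor.execute("SELECT some_seq.nextval FROM dual")
    cursor.fetchall()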

  9. Mike Bayer repo owner
    • changed milestone to 0.6.3
    • removed status
    • changed status to open

    So now folks are getting Decimals for ints in the same situation. I want to add special behavior for when we can see that scale is 0 and we really don't know if we're getting ints or floats back. I'm pretty sure this is the better approach, so the commit is coming soon.

  10. Mike Bayer repo owner

    10db67be5b783b94b59dd7cf039f8c322cf837c8 attempts to make this better. The loser here is speed, as we convert any number without a decimal place into an int. This means that if you store "45.0" into a numeric with plenty of scale, and query it with a subquery that destroys the extended type information, we get "45" back and have no choice but to make it into an int. So the decimal converters now need to check every value. This is preferable to having Decimal come back for plain ints, and we're now behaving more like cx_oracle does, in that it converts to "float" based on the presence of a decimal point (as far as I can tell).
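
    As a rough per-value sketch of the check being described (my own illustration, not the actual converter code in the commit), assuming the ambiguous values arrive as strings:

    from decimal import Decimal

    # sketch: when Oracle reports no usable precision/scale, decide int vs Decimal
    # from the presence of a decimal point in the value itself
    def convert_ambiguous_number(value):
        if value is None:
            return None
        return Decimal(value) if '.' in value else int(value)

    print convert_ambiguous_number("45")      # 45
    print convert_ambiguous_number("45.0")    # 45.0 (a Decimal)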

    I need to know from "barnett at ploughman-analytics.com" as well as "kb at retailarchitects.com" that this latest approach works.

  11. Former user Account Deleted

    10db67be5b783b94b59dd7cf039f8c322cf837c8 works for both my original code and the changed version that uses the typemap argument to explicitly specify the type, though I suppose the latter case is pretty much a given.

    For what it's worth, I'd just as soon have the speed back and use the typemap argument, though I suppose it's probably a minor difference, and keeping ease of use as a high priority is certainly a good thing.

    Thanks much.

    Rodney Barnett

  12. Mike Bayer repo owner

    the speed hit is mostly if you're fetching lots of Decimals, which now incur an extra isinstance() check (though if you look at the constructor for Decimal, they are already calling isinstance() gratuitously). The overhead for ints is slightly less with this approach, in that ints which come back from "ambiguous type" rows don't get converted into a Decimal and back to int. So we're mostly going off of "decimals are expensive anyway, ints should be returned directly".

  13. Former user Account Deleted
    • Are you taking the sequence changes from earlier back out, since they are redundant now?
    • Is cx_oracle casting everything from a string from Oracle, and can you tell cx_oracle to just pass the string through? Just brainstorming: then you could do the conversion just once, straight to Decimal or int, and regain the lost performance by making cx_oracle return faster?
  14. Mike Bayer repo owner

    Replying to guest:

    • Are you taking the sequence changes from earlier back out, since they are redundant now?

    Oh, forgot about that. I feel like leaving it in for the time being; it's just an int() call.

    • Is cx_oracle casting everything from a string from Oracle, and can you tell cx_oracle to just pass the string through? Just brainstorming: then you could do the conversion just once, straight to Decimal or int, and regain the lost performance by making cx_oracle return faster?

    I'm shooting for two things here. One is that you get semi-reasonable behavior when you aren't using SQLA types:

    engine.execute("select * from (select x, y, z, ...))")
    

    Above, ints should come back as ints, and everything "numeric" and not "float" would come back as Decimal. This is "native decimal", and a DBAPI like psycopg2 does this perfectly. So that's why I wouldn't want to send the strings out.

    The other thing I'm shooting for is that ints are favored in the above scheme in the face of ambiguity. When Oracle gives us all zeroes for the type information and the value is "37", we now assume it's an int and not a Decimal - that saves us one unnecessary Decimal conversion, and I'd say ints are more prevalent than Decimals, particularly given the sequence behavior.

    The approach that's taken right now isn't doing two Decimal conversions - for a decimal value, it's doing a Decimal, then an extra isinstance() (this procedure could also be rolled into the "processors" module so that it would occur in C code). For an int, it goes straight through. Floats come out from cx_oracle as floats in normal circumstances, and we are calling float() around them one extra time, using the processor package, which can be installed as the C extension. If the "ambiguous" situation arises, which only happens in some subquery scenarios, then there's an extra isinstance() happening.

  15. Mike Bayer repo owner

    OK 0.6.3 has to go out, I'm good with how this is for now. See you back for the next go around ... :)
