- changed milestone to 0.6.xx
Oracle joinedload with limit breaks Decimal() cast
The following script (also attached) demonstrates that the cast to Decimal() (instead of float) is broken when a joinedload is combined with a limit on Oracle:
from sqlalchemy import *
from sqlalchemy.orm import *

#engine = create_engine('postgresql://un:pw@localhost:5444/salespylot', echo=True)
engine = create_engine('oracle://un:pw@localhost:1521/xe?use_ansi=False', echo=True)
metadata = MetaData()
Session = sessionmaker(bind=engine)
session = Session()

orders_table = Table("orders", metadata,
    Column("orderid", Unicode(255), primary_key=True)
)
orderdetails_table = Table("orderdetails", metadata,
    Column("orderid", Unicode(255), ForeignKey('orders.orderid'), primary_key=True),
    Column("lineid", Integer, primary_key=True),
    Column("qtyordered", Numeric(20,2), nullable=False),
    Column("saleprice", Numeric(20,2), nullable=False),
)
class Order(object):
    pass

class OrderDetail(object):
    pass

order_mapper = mapper(Order, orders_table,
    properties={'orderdetails': relation(OrderDetail,
        cascade='all,delete-orphan',
        single_parent=True,
        lazy=False,
        backref=backref('parentorder',
            cascade='refresh-expire,expunge'))
    })
od_alias = orderdetails_table.alias('od__a')
order_mapper.add_property('totalsale',
    # totalsale is an inline view column
    column_property(cast(
        select([func.sum(od_alias.c.qtyordered * od_alias.c.saleprice)],
            orders_table.c.orderid == od_alias.c.orderid
        ).as_scalar(), Numeric(20, 2)).label('totalsale')))

mapper(OrderDetail, orderdetails_table, allow_partial_pks=False)

print "\n\n%r\n\n" % session.query(Order).all()[0].totalsale
print "\n\n%r" % session.query(Order).limit(100).all()[0].totalsale
Creates this output:
2010-07-01 14:03:23,667 INFO sqlalchemy.engine.base.Engine.0x...43d0 SELECT USER FROM DUAL
2010-07-01 14:03:23,674 INFO sqlalchemy.engine.base.Engine.0x...43d0 {}
2010-07-01 14:03:23,678 INFO sqlalchemy.engine.base.Engine.0x...43d0 BEGIN
2010-07-01 14:03:23,679 INFO sqlalchemy.engine.base.Engine.0x...43d0 SELECT orders.orderid AS orders_orderid, CAST((SELECT sum(od__a.qtyordered * od__a.saleprice) AS sum_1
FROM orderdetails od__a
WHERE orders.orderid = od__a.orderid) AS NUMERIC(20, 2)) AS totalsale, orderdetails_1.orderid AS orderdetails_1_orderid, orderdetails_1.lineid AS orderdetails_1_lineid, orderdetails_1.qtyordered AS orderdetails_1_qtyordered, orderdetails_1.saleprice AS orderdetails_1_saleprice
FROM orders LEFT OUTER JOIN orderdetails orderdetails_1 ON orders.orderid = orderdetails_1.orderid
2010-07-01 14:03:23,679 INFO sqlalchemy.engine.base.Engine.0x...43d0 {}
Decimal('2499.15')
2010-07-01 14:03:23,705 INFO sqlalchemy.engine.base.Engine.0x...43d0 SELECT anon_1.orders_orderid AS anon_1_orders_orderid, anon_1.totalsale AS anon_1_totalsale, orderdetails_1.orderid AS orderdetails_1_orderid, orderdetails_1.lineid AS orderdetails_1_lineid, orderdetails_1.qtyordered AS orderdetails_1_qtyordered, orderdetails_1.saleprice AS orderdetails_1_saleprice
FROM (SELECT orders_orderid, totalsale
FROM (SELECT orders.orderid AS orders_orderid, CAST((SELECT sum(od__a.qtyordered * od__a.saleprice) AS sum_1
FROM orderdetails od__a
WHERE orders.orderid = od__a.orderid) AS NUMERIC(20, 2)) AS totalsale
FROM orders)
WHERE ROWNUM <= :ROWNUM_1) anon_1 LEFT OUTER JOIN orderdetails orderdetails_1 ON anon_1.orders_orderid = orderdetails_1.orderid
2010-07-01 14:03:23,705 INFO sqlalchemy.engine.base.Engine.0x...43d0 {'ROWNUM_1': 100}
2499.1500000000001
(Broken regardless of the use_ansi setting.)

Works fine:
* With postgres
* With lazy=True instead of lazy=False for the joinedload
* Without limit (as demonstrated)
Comments (20)
-
repo owner -
Account Deleted Turning off supports_native_decimal means sqlalchemy will figure out how to cast the results from floats to decimal? The warning I get is:

SAWarning: Dialect oracle+cx_oracle does *not* support Decimal objects natively, and SQLAlchemy must convert from floating point - rounding errors and other issues may occur. Please consider storing Decimal numbers as strings or integers on this platform for lossless storage.

My question is: is this warning accurate in this case? Is sqlalchemy really converting from floating point if supports_native_decimal is off, even if oracle is returning 2499.15? In other words, if I turn off supports_native_decimal, does cx_Oracle return a float to sqlalchemy (as opposed to a string, etc.)?
-
repo owner - changed milestone to 0.6.2
OK, so two things.
First of all, if you turn off supports_native_decimal, then yes, cx_oracle is returning a Python floating point number, like 15.756300000000001, which we convert to decimal using a "%f" string template. This process is known to be lossy for numbers with a large amount of significant digits.

However, the good news is that, if I just change our oracle "output type handler", the thing that disables cx_oracle from sending us floats, to be more liberal about what we convert to Decimal(), that works absolutely fine as well - as the Float type in SQLAlchemy seems to be passing floats through our "as_float" processor unconditionally. If the SQLAlchemy type really wants a float, the conversion is oracle string -> cx_oracle -> Decimal -> float(Decimal), which is accurate. It just needs to let integers through without being converted.
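The lossiness of the "%f" template mentioned above is easy to see in plain Python. This is a minimal illustration of the failure mode, not SQLAlchemy's actual conversion code:

```python
from decimal import Decimal

f = 15.756300000000001      # the kind of float cx_oracle hands back
s = "%f" % f                # "%f" formats with only six decimal places
print(s)                    # digits beyond six decimal places are lost
print(Decimal(s))
```

Here the `"%f"` formatting truncates the value to `'15.756300'`, so any significant digits past the sixth decimal place never make it into the resulting Decimal.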
-
Account Deleted Very nice. (It would also work, when using oracle, to put CAST ... AS NUMERIC(20,2) around the outer-most SELECT columns, whose type SQLAlchemy already knows. But your solution sounds like it is obviously the way to go.) Thanks
-
repo owner yes, if we copied out the CAST to the top level that would work too, but that would be exceedingly difficult to wrangle into the expression system, so I'm glad this other idea looks promising. It is in 25db56bc0c1ee30fc0ad9166ac48de7dc8328762 and please let me know if it solves your issue.
-
Account Deleted Thank you, this patch seems to fix the problem.
-
repo owner - changed status to resolved
-
Account Deleted Our fix here caused an infinite loop in an unrelated portion of my code. I was expecting the return of a Sequence to give me an integer. In fact, I was doing integer math with the return:
dec = DBSession.execute(Sequence('document_sequence_seq'))
alpha = (dec / 2106) % 26

Now, dec is no longer an integer; it is a Decimal(), which broke the integer division and mod function, etc., and created an infinite loop in my particular case. Needless to say, that could be a problem for others as well who might be relying on the Sequence returning an int type. I really hope you don't need to remove this fix... Obviously I can fix my code, but you can't release this as is... any ideas?
(If cx_oracle returns a Decimal() that is an integer, we can't really cast it as an integer, it would break the other case when a Decimal is expected.)
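The divergence described above can be reproduced without a database. This is an illustrative sketch with a made-up sequence value; the original code ran under Python 2, where `/` between ints is floor division, so `//` is used here to show the integer behavior the code relied on:

```python
from decimal import Decimal

seq_value = 5000                           # hypothetical value from the sequence

# What the code expected: integer (floor) division, then mod
as_int = (seq_value // 2106) % 26          # a plain int

# What the fix started producing: Decimal true division, then mod
as_dec = (Decimal(seq_value) / 2106) % 26  # a fractional Decimal, not an int

print(as_int)
print(as_dec)
```

Once `dec` comes back as a Decimal, the division no longer truncates, so code that assumed a bounded integer result (for example, to index into the alphabet) can loop forever.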
-
repo owner -
Account Deleted Are there other cases (columns/expressions) that will behave the same as sequence (namely, cases where the fix here will change what used to be an int return into a Decimal), or was sequence unique in that regard?
-
repo owner the cases where this breaks down have two aspects. One is that oracle is sending us ambiguous information about the numeric - it has a precision of 0 and a scale of -127. This is what the sequence.nextval(), bizarrely, sends, even though its plainly an integer. The other aspect is that we have no typing information about the expression on the clause side. When those two conditions occur, we put out a Decimal(), which is in fact the most robust and accurate numeric we can, however inappropriate it is in some cases.
So the sequence case here appears to be an outlier. In virtually all other cases there is typing information associated with a clause construct, and for all trivial cases (i.e. no heavy subqueries) Oracle is sending us meaningful information about the type anyway enough so that we can let the data pass through cx_oracle without the Decimal conversion.
the awkardness here is that we have to intercept integers at all, usually integers are well separated from floats and such but Oracle seems to mix the two up quite a bit.
-
repo owner so now folks are getting Decimals for ints in the same situation. I want to add special behavior for when we can see that scale is 0, and we really dont know if we're getting ints or floats back. im pretty sure this is the better approach so the commit is coming soon.
-
repo owner 10db67be5b783b94b59dd7cf039f8c322cf837c8 attempts to make this better. The loser here is speed, as we convert any number without a decimal place into an int. This means if you store "45.0" into a numeric with plenty of scale, query it with a subquery that destroys extended type information, we get "45" back, and have no choice but to make it into an int. So the decimal converters now need to check every value. This is preferable to having
Decimal
come back for plain ints, and we're now behaving more like cx_oracle does in that it converts to "float" based on the presence of a decimal point (as far as I can tell).I need to know from "barnett at ploughman-analytics.com" as well as "kb at retailarchitects.com" that this latest approach works.
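The per-value strategy described above can be sketched roughly as follows. This is a simplified stand-in for the dialect's actual converter; the function name and the string-based inspection are illustrative assumptions, not SQLAlchemy internals:

```python
from decimal import Decimal

def convert_ambiguous_number(value):
    # Hypothetical sketch: when Oracle reports precision 0 / scale -127,
    # we cannot tell ints from decimals up front, so decide per value -
    # a decimal point means Decimal, otherwise pass the value through
    # as a plain int.
    s = str(value)
    if "." in s:
        return Decimal(s)
    return int(s)

print(convert_ambiguous_number("37"))     # a plain int
print(convert_ambiguous_number("45.17"))  # a Decimal
```

The cost is exactly the one described in the comment: every value from an ambiguously-typed column pays a per-value check, but plain ints avoid a round trip through Decimal.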
-
Account Deleted 10db67be5b783b94b59dd7cf039f8c322cf837c8 works for both my original code and the changed version that uses the typemap argument to explicitly specify the type, though I suppose the latter case is pretty much a given.
For what it's worth, I'd just as soon have the speed back and use the typemap argument, though I suppose it's probably a minor difference, and keeping ease of use as a high priority is certainly a good thing.
Thanks much.
Rodney Barnett
-
repo owner the speed hit is mostly if you're fetching lots of Decimals, which now incur an extra isinstance() check (though if you look at the constructor for
Decimal
, they are already callingisinstance()
gratuitously). The overhead for ints is slightly less with this approach, in that ints which come back from "ambiguous type" rows don't get converted into a Decimal and back to int. So we're mostly going off of "decimals are expensive anyway, ints should be returned directly". -
Account Deleted Give me a little time to get to this... Thanks, Kent
-
Account Deleted - Are you taking the sequence changes from earlier back out since they are redundant now?
- Is oracle_cx casting everything from a string from Oracle, and can you tell oracle_cx to just pass the string? Just brainstorming that then you could do the conversion just once, straight to Decimal or int, and gain the lost performance by making cx_oracle return faster?
-
repo owner Replying to guest:
- Are you taking the sequence changes from earlier back out, since they are redundant now?
oh, forgot about that. I feel like leaving it in for the time being, it's just an int() call.
- Is cx_oracle casting everything from a string from Oracle, and can you tell cx_oracle to just pass the string? Just brainstorming: then you could do the conversion just once, straight to Decimal or int, and regain the lost performance by making cx_oracle return faster.
I'm shooting for two things here. One is that you get semi-reasonable behavior when you aren't using SQLA types:
engine.execute("select * from (select x, y, z, ...))")
above, ints should come back as ints, and everything "numeric" and not "float" would come back as Decimal. This is "native decimal" and a DBAPI like psycopg2 does this perfectly. So that's why I wouldn't want to send the strings out.
The other thing I'm shooting for is that ints are favored in the above scheme in the face of ambiguity. When oracle gives us all zeroes for the type information and the value is "37", we now assume it's an int and not a Decimal - that saves us one unnecessary Decimal conversion, and I'd say ints are more prevalent than Decimals, particularly with the sequence behavior.
The approach that's taken right now isn't doing two Decimal conversions - for a decimal value, it's doing a Decimal, then an extra isinstance() (this procedure could also be rolled into the "processors" module so that it would occur in C code). For an int, it goes straight through. Floats come out from cx_oracle as floats in normal circumstances, and we are calling float() around them one extra time, using the processor package, which can be installed as the C extension.
If the "ambiguous" situation arises, which only happens in some subquery scenarios, then there's an extra isinstance() happening.
-
repo owner - changed status to resolved
OK 0.6.3 has to go out, I'm good with how this is for now. See you back for the next go around ... :)
-
repo owner - removed milestone
Removing milestone: 0.6.3 (automated comment)
This is an Oracle issue. We can't fix it, but there are workarounds.
Here is the Oracle bug:
As you can see, the type information is lost in the second query.
Two workarounds:
Turn off native decimal:
#!python
engine = create_engine('oracle://scott:tiger@localhost/xe', echo=True)
engine.connect()
engine.dialect.supports_native_decimal = False
Don't use joinedload, use subqueryload.
#!python
order_mapper = mapper(Order, orders_table,
    properties={'orderdetails': relation(OrderDetail,
        cascade='all,delete-orphan',
        single_parent=True,
        lazy='subquery',
        backref=backref('parentorder',
            cascade='refresh-expire,expunge'))
    })