Converting float to Decimal by default is harmful

Issue #1625 resolved
Former user created an issue

(original reporter: ged) This is probably not new but I just got bitten by it... I think that having "asdecimal" defaults to True in Numeric is both (usually) useless and harmful because if the DBAPI is returning floats, you already lost precision, and converting them to Decimal won't help with the precision, so it does not do any good. Furthermore, it is harmful in two ways: - it will slow down the application quite a bit (decimal.Decimal is pure Python for now so it's at least 10x slower than floats). It's usually worth it for the sake of accuracy, but when you have already lost the accuracy, it only makes sense in very few cases. - more importantly, it hides the lost precision to the user: the user defines a NUMERIC on the DB side and works with Decimal in his Python code, so he thinks he's on the safe side for accuracy? Wrong! He gets the rounding errors anyway behind his back.

Now, there are a few ways to fix this problem: - change the default value to False (with its current meaning) or None (see ticket #1624). - put a BIG FAT warning in the documentation. - add a runtime warning if the DBAPI is not returning Decimals. Would need a way to turn off the warning.

Comments (10)

  1. Mike Bayer repo owner
    • changed milestone to 0.6.0

    psycopg2/pg8000 return Decimal directly - so converting to float would be a performance hit for those, and I'm pretty sure PG's protocol allows that Decimal, which is explicit about every digit, can be used with no loss in precision (whereas using native floating point will cause precision loss). In reality, if your DBAPI is returning Python floats, you're almost definitely information - I've worked with a lot of banking applications with Oracle, MySQL, MSSQL and you never rely on floating points for that stuff, in the Java world BigDecimal is the standard.

    The docs are very clear about Numeric / Float types defaulting to Decimal/float on the python side, with options to change it as needed. It is definitely an end-user job to decide what numeric types on the SQL and Python side are most appropriate for their application - if you don't know much about it and just use defaults, you'll get what you get, both precision and performance-wise, but at least the behavior of the application will be predictable.

    After that is all decided, if there is slowness, the user can then address optimization (which should never be premature), which would include using numeric types that are most optimized for the DBAPI in use.

    Also how is this not a dupe of #1624?

  2. Mike Bayer repo owner

    and by "BigDecimal" in the java world, I mean we define custom Hibernate types that store the values as integers on the DB side - no native FP mechanisms are used.

  3. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    psycopg2/pg8000 return Decimal directly - so converting to float would be a performance hit for those, and I'm pretty sure PG's protocol allows that Decimal, which is explicit about every digit, can be used with no loss in precision (whereas using native floating point will cause precision loss). In reality, if your DBAPI is returning Python floats, you're almost definitely information - I've worked with a lot of banking applications with Oracle, MySQL, MSSQL and you never rely on floating points for that stuff, in the Java world BigDecimal is the standard.

    Yes, this is all true, and I agree with all that.

    The docs are very clear about Numeric / Float types defaulting to Decimal/float on the python side, with options to change it as needed.

    Indeed.

    It is definitely an end-user job to decide what numeric types on the SQL and Python side are most appropriate for their application

    Of course.

    • if you don't know much about it and just use defaults

    This is where I think there is a grave problem.

    , you'll get what you get, both precision and performance-wise, but at least the behavior of the application will be predictable.

    Well, predictability is a good point, that doesn't mean we shouldn't try to better warn our users about the flaws of the DBAPI that we are hiding from him (this is the important point here).

    After that is all decided, if there is slowness, the user can then address optimization (which should never be premature), which would include using numeric types that are most optimized for the DBAPI in use.

    Also how is this not a dupe of #1624?

    1624 addresses the documentation issue and discrepancy between dialects. This one addresses the harmfulness of hiding the loss of precision without even a warning.

  4. Mike Bayer repo owner

    so....is this ticket strictly a documentation one ? more docs are always a good thing. I just wouldn't want to phrase it such that the end user is doing something "wrong" by using floats/decimals, more like, here are the pros/cons of each please look before you leap.

  5. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    so....is this ticket strictly a documentation one ? more docs are always a good thing. I just wouldn't want to phrase it such that the end user is doing something "wrong" by using floats/decimals, more like, here are the pros/cons of each please look before you leap.

    Documentation or runtime warning for the "Numeric through float" case or both documentation and a warning. Personally I think the best option is both (with a way to disable the warning), because this is a sneaky problem which you might not see immediately, put your program in production and it will blow in your face after a few weeks or even months. As usual, this is your decision and I'll respect it.

  6. Mike Bayer repo owner

    Why would it blow in your face ? just slightly less than optimal performance ? I don't see the Decimal issue as a huge performance hog. A loss in floating point precision ? As I've said, being serious about FP precision means taking proactive steps. A casual user doesn't who doesn't want to read the docs only needs "good enough" FP behavior and performance - a warning should only be for real "no really, your app is definitely going to suck like this" situations, and I don't see this as one of those.

  7. Mike Bayer repo owner

    Its entirely reasonable for there to be a new documentation section / FAQ area dedicated to performance issues. When a user observes performance being less than what they'd like, every one of these hints can be compiled there - check what kind of Decimals/floats you're dealing with, check your sqlite regexes, check if your C extensions are compiled, check the unicode behavior of your DBAPI, check about using PG server side cursors (more options coming for that), check your joins, etc etc. It can all go there, every gotcha there is. SQLA 0.6 runs pretty well out of the box and for moderate datasizes there's no issue. Scaling up anything requires tuning.

  8. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    Why would it blow in your face ? just slightly less than optimal performance ?

    Of course not. The speed is only a minor issue, which is only relevant because the conversion is not useful in those cases. And "will blow up" because in most case, when you use Numeric fields instead of Float, that's because you want (and need) the extra precision (monetary amount, ...) and even a minor difference can be unacceptable and be considered like a blow-up.

    A loss in floating point precision ? As I've said, being serious about FP precision means taking proactive steps.

    Yes. And using Numeric and Decimal are the proactive steps.

  9. Mike Bayer repo owner

    After rereading the original description, I don't buy point #1 because the point of returning Decimal is for consistency across backends, and I don't buy point #2 because the fact that the DBAPI returns floats and not decimals is a database/DBAPI issue - I don't see users "assuming" that just because its Decimal, its "accurate" based on the DB value. I guess you did though. It is the case that because they have a Decimal, further math operations will be accurate relative to the starting value - so even then, if you're using a DBAPI/database that has rounding errors, returning the Decimal is still better than the float except for performance.

    if we add a recipe or such for an IntegerDecimal type which returns Decimals, but stores the value as long integers, developers will see that and realize its a solution to producing accurate FP values on their target database.

  10. Log in to comment