float-induced rounding errors when parsing a datetime string with microsecond precision

Issue #24 resolved
Gary Donovan created an issue

Example, from v5.1.0

assert aniso8601.parse_datetime("2019-06-05T01:03:11.858714") == datetime(2019, 6, 5, 1, 3, 11, 858714)

Analysis

You seem to have propagated float-based rounding errors by adding together the result of multiple float casts, and then truncated the result rather than rounding (for reasons that you discussed in your blog, but which seem irrelevant for this use case).

For this example, the problem comes from this code in PythonTimeBuilder.build_time:

seconds, floatseconds = cls._split_and_cast(ss, 'Invalid second string.') # seconds is an int
...
totalseconds = float(seconds) + floatseconds

Proposal

You seem to always deal with the fractional component of a floating point number, which you derive by something like float(number_as_string.split(".")[1]). What if you always represent that fractional part as an integer?

Alternatively, have you considered using decimal's instead?

Comments (9)

  1. Brandon Nielsen repo owner

    You are of course, absolutely correct, but as noted in the blog post, the current implementation was chosen because it works in a variety of cases (and fixes some bad degenerate cases), not because it works in all cases. Also as mentioned in the blog post, if you need actual microsecond precision that’s correct in all cases, there’s the AttoTimeBuilder, from there you could cast back to a Python datetime, rounding or truncating as desired.

    I had considered representing the fractional part as an integer, but it’s a harder nut to crack than it appears on the surface, and I had people waiting for a solution sooner rather than later. Seconds and microseconds are easy, but what about years, months, weeks, days, hours, and minutes? What does the fractional component of ‘1.4Y' become? Keep in mind the resulting days, hours, minutes, seconds, and microseconds, need to be ranged for the datetime constructor. Almost certainly doable, I just haven’t done it. I would welcome an implementation, or pseudocode, with open arms.

    Solving it using decimal would be "easy", but I have more users that have worked with me that are using this library as a fully featured parser with (relatively) high performance than I do users that have issues with the rounding / truncation behavior. A person could certainly construct a DecimalTimeBuilder that used decimal internally, I would likely even welcome it into this project, in fact I have considered writing one myself, I just haven't had the time or the need. Alternatively, someone could convince me that using a decimalinternally in the PythonTimeBuilder wouldn’t reduce performance, but my own experiments have not shown that to be true.

  2. Gary Donovan reporter

    Thanks Brandon, my apologies - I had thought this might be an example you were unaware of. I feel my use case is a lot more simple than aniso8601 is designed for, perhaps.

    Cheers,

    gary

  3. Brandon Nielsen repo owner

    I was unaware of this specific example, but I’m aware that fixing the really bad degenerate cases introduced some regressions in the handling of simpler cases like yours. At the time, +/- a microsecond didn’t seem so bad. I’m incredibly thankful for all of these failed parses people send me, they help prevent issues in the future!

    This is absolutely a bug, microsecond precision should be lossless, and I think I’ve got a quick fix that will work, but only for fractional seconds with microsecond or lower resolution. Fractional everything else will still be subject to issues with floating point precision for the time being. Hopefully I’ll have a new version pushed by Monday.

  4. Brandon Nielsen repo owner

    Ensure times and durations with fractional seconds and microsecond or lower resolution are not subject to rounding.

    With the release of 6.0.0, the fractional components of seconds are cast to floats before being truncated. These fractional components can be cast directly to microseconds by left justifying with 0s, and truncating to 6 characters before casting to an int. This prevents roundoff issues, such as those mentioned in #24.

    This does not fix rounding issues with fractional years, months, weeks, days, or minutes.

    This fixes #24.

    → <<cset b4aca8ffc255>>

  5. Brandon Nielsen repo owner

    The issue-24 branch contains a workaround, preventing casts of fractional seconds to floats. I’m going to look at some other possibly better fixes, but something fixing this problem should be released as 6.1.0 on Monday.

  6. Brandon Nielsen repo owner

    The all-integer branch handles all fractional components (year, month, weeks, days, minutes, seconds) as integer, no floating point operations are performed. Since all internal representations are exact, all operations with microsecond or lower resolution should effectively be lossless, any times with greater than microsecond resolution will have the additional resolution discarded.

    It will likely launch as 7.0.0 in the next couple days.

  7. Gary Donovan reporter

    Great, thanks so much @Brandon Nielsen for your hard work (and for the shout out on your blog). I’m glad the solution came out easily, and that it was fun to code!

  8. Log in to comment