float-induced rounding errors when parsing a datetime string with microsecond precision
Example, from v5.1.0
assert aniso8601.parse_datetime("2019-06-05T01:03:11.858714") == datetime(2019, 6, 5, 1, 3, 11, 858714)
Analysis
You seem to have propagated float-based rounding errors by adding together the result of multiple float
casts, and then truncated the result rather than rounding (for reasons that you discussed in your blog, but which seem irrelevant for this use case).
For this example, the problem comes from this code in PythonTimeBuilder.build_time
:
seconds, floatseconds = cls._split_and_cast(ss, 'Invalid second string.') # seconds is an int
...
totalseconds = float(seconds) + floatseconds
Proposal
You seem to always deal with the fractional component of a floating point number, which you derive by something like float(number_as_string.split(".")[1])
. What if you always represent that fractional part as an integer?
Alternatively, have you considered using decimal
's instead?
Comments (9)
-
repo owner -
reporter Thanks Brandon, my apologies - I had thought this might be an example you were unaware of. I feel my use case is a lot more simple than
aniso8601
is designed for, perhaps.Cheers,
gary
-
repo owner I was unaware of this specific example, but I’m aware that fixing the really bad degenerate cases introduced some regressions in the handling of simpler cases like yours. At the time, +/- a microsecond didn’t seem so bad. I’m incredibly thankful for all of these failed parses people send me, they help prevent issues in the future!
This is absolutely a bug, microsecond precision should be lossless, and I think I’ve got a quick fix that will work, but only for fractional seconds with microsecond or lower resolution. Fractional everything else will still be subject to issues with floating point precision for the time being. Hopefully I’ll have a new version pushed by Monday.
-
repo owner - changed status to resolved
Ensure times and durations with fractional seconds and microsecond or lower resolution are not subject to rounding.
With the release of 6.0.0, the fractional components of seconds are cast to floats before being truncated. These fractional components can be cast directly to microseconds by left justifying with 0s, and truncating to 6 characters before casting to an int. This prevents roundoff issues, such as those mentioned in
#24.This does not fix rounding issues with fractional years, months, weeks, days, or minutes.
This fixes
#24.→ <<cset b4aca8ffc255>>
-
repo owner The issue-24 branch contains a workaround, preventing casts of fractional seconds to floats. I’m going to look at some other possibly better fixes, but something fixing this problem should be released as 6.1.0 on Monday.
-
repo owner The all-integer branch handles all fractional components (year, month, weeks, days, minutes, seconds) as integer, no floating point operations are performed. Since all internal representations are exact, all operations with microsecond or lower resolution should effectively be lossless, any times with greater than microsecond resolution will have the additional resolution discarded.
It will likely launch as 7.0.0 in the next couple days.
-
repo owner 7.0.0 has been released, RelativeTimeBuilder has been updated as well.
-
repo owner -
reporter Great, thanks so much @Brandon Nielsen for your hard work (and for the shout out on your blog). I’m glad the solution came out easily, and that it was fun to code!
- Log in to comment
You are of course, absolutely correct, but as noted in the blog post, the current implementation was chosen because it works in a variety of cases (and fixes some bad degenerate cases), not because it works in all cases. Also as mentioned in the blog post, if you need actual microsecond precision that’s correct in all cases, there’s the AttoTimeBuilder, from there you could cast back to a Python
datetime
, rounding or truncating as desired.
I had considered representing the fractional part as an integer, but it’s a harder nut to crack than it appears on the surface, and I had people waiting for a solution sooner rather than later. Seconds and microseconds are easy, but what about years, months, weeks, days, hours, and minutes? What does the fractional component of ‘1.4Y' become? Keep in mind the resulting days, hours, minutes, seconds, and microseconds, need to be ranged for the datetime constructor. Almost certainly doable, I just haven’t done it. I would welcome an implementation, or pseudocode, with open arms.
Solving it using
decimal
would be "easy", but I have more users that have worked with me that are using this library as a fully featured parser with (relatively) high performance than I do users that have issues with the rounding / truncation behavior. A person could certainly construct aDecimalTimeBuilder
that useddecimal
internally, I would likely even welcome it into this project, in fact I have considered writing one myself, I just haven't had the time or the need. Alternatively, someone could convince me that using adecimal
internally in thePythonTimeBuilder
wouldn’t reduce performance, but my own experiments have not shown that to be true.