Sub-microsecond precision in durations is lost

Issue #10 resolved
Matti Niemenmaa created an issue

datetime.timedelta has only microsecond precision, so anything more precise is lost:

>>> aniso8601.parse_duration('PT0.0000001S')
datetime.timedelta(0)

>>> aniso8601.parse_duration('PT2.0000048S')
datetime.timedelta(0, 2, 5)

This may be a reasonable default, but sometimes it'd be nice to be able to catch these cases. I'm not sure what the ideal interface would be. Perhaps an optional parameter by which the caller gets back both the most precise timedelta possible and a duration string that specifies the remainder? This would have to be extended to intervals as well, so I can imagine it getting a bit nasty.

This is with aniso8601 1.1.0.

Comments (12)

  1. Brandon Nielsen repo owner

    This gets really ugly really fast. Python time objects don't support sub-microsecond precision either, and this whole library revolves around the Python date, time, datetime, and timedelta objects. That being said, you're right: blindly throwing precision away isn't the right answer either.

    Do you know of a 'high precision' implementation of the datetime family that supports arbitrary precision? I'd be open to adding some kind of 'high precision' option returning a higher-precision object, and to changing the default behavior to raise an exception when precision is being thrown away.

    At the very least, throwing away precision should probably be an exception.

  2. Matti Niemenmaa reporter

    The closest thing I can think of is numpy's datetime64 and timedelta64, which go up to attosecond precision, but as they are 64-bit integers underneath, going above nanosecond precision drastically reduces the representable range: https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html#datetime-units . So while it could improve precision, it wouldn't be sufficient. And regardless, it doesn't seem right for aniso8601 to depend on numpy just for that.
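
    For illustration, a quick sketch of that trade-off (assuming NumPy is installed; the ranges follow from the 64-bit integer representation):

    import numpy as np

    # timedelta64 stores a 64-bit integer count of the chosen unit, so finer
    # units shrink the representable span.
    ns = np.timedelta64(100, 'ns')     # nanoseconds: range of roughly +/- 292 years
    attos = np.timedelta64(100, 'as')  # attoseconds: range of only about +/- 9.2 seconds

    print(ns, attos)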

    Apparently the rejected PEP 410 suggested a high-precision timestamp type. I disagree with the resolution but oh well. Further Googling turned up https://github.com/flyingfrog81/accuratetime but it's quite dead and uses a plain float internally anyway.

    Using an additional decimal.Decimal for any sub-microsecond data seems reasonable to me (better than the string I suggested originally), but unless you feel like implementing your own wrappers around the datetime family, this'd make the API rather messy.
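
    For instance, a minimal sketch of that shape (the helper name and return convention here are hypothetical, not aniso8601 API):

    import datetime
    from decimal import Decimal, ROUND_DOWN

    def split_seconds(seconds_str):
        # Hypothetical helper: return the most precise timedelta possible plus
        # the sub-microsecond remainder as an exact Decimal.
        total = Decimal(seconds_str)
        micro = total.quantize(Decimal('0.000001'), rounding=ROUND_DOWN)
        delta = datetime.timedelta(microseconds=int(micro * 1000000))
        return delta, total - micro

    # split_seconds('2.0000048') -> (datetime.timedelta(seconds=2, microseconds=4), Decimal('8E-7'))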

    Throwing an exception is fine, but without support for the full precision it should at least be possible to get the rounded result somehow; otherwise a caller that wants to ignore sub-microsecond precision has to manually mangle the string to remove the excess digits.

  3. Brandon Nielsen repo owner

    Thanks for finding those.

    I think you're right, the best solution would be wrappers around time and timedelta to handle the additional sub-microsecond precision. I'm not sure yet; those may be implemented in a separate library.

    I wouldn't change to throwing an exception until we have a way to handle the additional resolution, but I do think raising an exception when a parse would lose resolution is the correct course of action. The standard calls out "The interchange parties, dependent upon the application, shall agree the number of digits in the decimal fraction.", so I'd say our half of the interchange isn't agreeing on the number of digits if we're effectively truncating, and silently losing precision is a bad thing.

  4. Ryan Senkbeil

    If there turns out to be a nice third-party library that handles this, the behavior could be enabled using the setuptools "extras" feature.
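
    Something along these lines in setup.py, for example (the extra name and the dependency are purely hypothetical):

    from setuptools import setup

    setup(
        name='aniso8601',
        # ... rest of the packaging metadata ...
        extras_require={
            # Hypothetical extra: installs a high-precision backend only when
            # requested, e.g. `pip install aniso8601[highprecision]`.
            'highprecision': ['some-high-precision-lib'],
        },
    )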

    I do think though that it should throw an exception instead of blindly dropping the extra precision.

  5. Brandon Nielsen repo owner

    From an e-mail discussion:

    I just wanted to let you know that the issue is more severe than it looks. It's not just losing precision; it's outright failing to parse some timestamps that are too close to 60:

    >>> from aniso8601 import parse_time
    >>> parse_time("14:43:59.9999997")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/adys/.local/share/virtualenvs/hs/lib/python3.6/site-packages/aniso8601/time.py",
    line 107, in parse_time
        return _parse_time_naive(timestr)
      File "/home/adys/.local/share/virtualenvs/hs/lib/python3.6/site-packages/aniso8601/time.py",
    line 136, in _parse_time_naive
        return _RESOLUTION_MAP[get_time_resolution(timestr)](timestr)
      File "/home/adys/.local/share/virtualenvs/hs/lib/python3.6/site-packages/aniso8601/time.py",
    line 204, in _parse_second_time
        raise ValueError('Seconds must be less than 60.')
    ValueError: Seconds must be less than 60.
    

    Additionally, I don't think that exception is correct. Positive leap seconds are inserted as :60 (https://en.wikipedia.org/wiki/Leap_second).

  6. Brandon Nielsen repo owner

    From an e-mail discussion:

    Ok, so I've been thinking about this. Here is what I suggest:

    1. The precision is lost, and this just has to be a known limitation. But then the value needs to be truncated rather than rounded; that fixes the error I am having.
    2. Leap seconds will have to be a known limitation as well. If the seconds value is exactly 60 and the rest of the time is 23:59, raise a LeapSecondError, a subclass of NotImplementedError.
    3. In all other cases, keep doing what you do now.

    WDYT? It doesn't fix everything, but at least it's a net improvement over what we have now.

  7. Brandon Nielsen repo owner

    I agree; 14:43:59.9999997 gives a good example of how wrong the current behavior is. It should be truncated to microsecond precision before being parsed into a time object; see the following test case:

    time = _parse_second_time('14:43:59.9999997')
    self.assertEqual(time, datetime.time(hour=14, minute=43, second=59, microsecond=999999))
    

    Additionally, it should be made more explicit that leap seconds are not supported. A subclass of NotImplementedError is fine.
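
    As a rough sketch of what that could look like (the class and helper here are illustrative only, not the actual implementation):

    class LeapSecondError(NotImplementedError):
        # Raised when a parsed time contains a leap second (seconds value of 60).
        pass

    def parse_seconds(hour, minute, second_str):
        # Leap seconds (23:59:60) are explicitly unsupported.
        if (hour, minute, int(second_str.split('.')[0])) == (23, 59, 60):
            raise LeapSecondError('Leap seconds are not supported.')

        # Truncate, rather than round, anything finer than microseconds.
        whole, _, fraction = second_str.partition('.')
        microseconds = int(fraction[:6].ljust(6, '0')) if fraction else 0
        return int(whole), microseconds

    # parse_seconds(14, 43, '59.9999997') -> (59, 999999)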

    Longer term, the goal is still to decouple from the base classes so that both this and #13 can be handled better.

  8. Brandon Nielsen repo owner

    As of 3.0.0, released today, times are truncated to microsecond precision instead of being rounded to the nearest microsecond. More specific exceptions are used throughout, including LeapSecondError when parsing a time with a seconds value of 60 (values larger than 60 raise a SecondsOutOfBoundsError).
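
    For example, the intended behavior now looks roughly like this (a sketch, not captured output; exact reprs and exception messages may differ):

    >>> import aniso8601
    >>> aniso8601.parse_time('14:43:59.9999997')  # sub-microsecond digits truncated
    datetime.time(14, 43, 59, 999999)
    >>> aniso8601.parse_time('23:59:60')          # leap second
    Traceback (most recent call last):
      ...
    LeapSecondError: Leap seconds are not supported.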

  9. Brandon Nielsen repo owner

    Over 3 years later, the 'parser' has been decoupled from the base Python date, datetime, and timedelta classes. Instead, the parser calls a builder, which is responsible for taking the parse results and building a correct output. The default builder is the PythonTimeBuilder, which uses the built in date, datetime, and timedelta classes.
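
    Conceptually, the split looks something like this (a heavily simplified sketch; the real builder interface has more methods and different signatures):

    import datetime
    from decimal import Decimal

    class PythonTimeBuilderSketch:
        # Stand-in for the real PythonTimeBuilder: the parser hands over the
        # extracted components, and the builder decides how to represent them.
        @staticmethod
        def build_duration(seconds=0, microseconds=0):
            return datetime.timedelta(seconds=seconds, microseconds=microseconds)

    def parse_duration_sketch(durationstr, builder=PythonTimeBuilderSketch):
        # Crude parse that only handles 'PT<n>S'; the point is that the builder,
        # not the parser, chooses the output type, so a different builder could
        # return a higher-precision object instead.
        seconds = Decimal(durationstr[2:-1])
        return builder.build_duration(seconds=int(seconds),
                                      microseconds=int((seconds % 1) * 1000000))

    # parse_duration_sketch('PT2.000004S') -> datetime.timedelta(seconds=2, microseconds=4)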

    When the relative keyword is used, the RelativeTimeBuilder is used instead. The relative keyword will be deprecated with 4.0.0, and removed in 5.0.0, with the RelativeTimeBuilder moving to a separate project.

    I may implement a builder for the NumPy datetime64 family. Alternatively, I may implement the wrapper discussed above, along with an appropriate builder. These additional builders would be separate packages.

    This design has advantages and disadvantages. The big advantages are making it possible to support things like sub-microsecond precision and leap seconds, as well as enabling more specific exceptions. The big downside is that things like bounds checking are now implemented in the builder and will be largely duplicated across all implemented builders.

    For now, these changes live in the isobuilder branch; once the documentation is updated, they will be rolled out as 4.0.0.

  10. Brandon Nielsen repo owner

    I've written a decently functional builder for NumPy datetime64: numpytimebuilder.
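
    If the parse functions end up taking a builder argument, usage would look roughly like this (the import path and keyword are assumptions, not a confirmed API; check the numpytimebuilder docs):

    import aniso8601
    from numpytimebuilder import NumPyTimeBuilder  # assumed import path and class name

    # Sub-microsecond digits that a plain timedelta would drop can survive in a
    # numpy.timedelta64-based result.
    duration = aniso8601.parse_duration('PT0.0000001S', builder=NumPyTimeBuilder)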

    Given the limitations of the datetime64 and timedelta64 implementations, it's probably still worth the effort to write a datetime implementation that uses Decimal internally for additional resolution.
