Saving a workbook with datetime.datetime(1, 1, 1) will raise ValueError

Issue #558 wontfix
Tiago Guimarães created an issue

Today I tried to save a worksheet with the date 01/01/0001, represented by the object datetime.datetime(1, 1, 1), and openpyxl raised the following error:

from openpyxl import Workbook
import datetime

wb = Workbook()
wb.active['A1'] = datetime.datetime(1,1,1)
wb.save('test.xlsx')
>>> ValueError: year is out of range

This error is raised inside the from_excel function at openpyxl.utils.datetime, by the datetime.datetime constructor.

It seems that this constructor uses a slice of the parts variable, representing the year, month and date of a given value.

The problem is (or at least I think it is) that when processing the date 01/01/0001, that triplet has the values [0, 12, 31], thus, raising the error for a year out of range.

Comments (15)

  1. CharlieC

    Actually, Excel cannot work with any date before 1899-12-31 so anything before this date is undefined.

  2. Tiago Guimarães reporter

    That's interesting.
    So, a default value for those cases should be considered? Like using None or some other representation for "undefined" ?

  3. CharlieC

    I don't see that as within the scope of the library. The fact is that if you use date times before the epoch that you cannot be sure what any application reading them will make of them.

  4. CharlieC

    Within the scope of the library, the behaviour in Excel has to be considered. As this persists an old bug in Lotus 1-2-3 (1900 was a leap year) it means that behaviour before the epoch is undefined and thus unreliable. Excel is quite happy with 1900-Jan-00 for the epoch. Python's own datetime library is limited to 0001-01-01 and this will conflict with the correction for dates before 1900-03-01. Exposing the exception in the underlying library for 0001-01-01 is reasonable.

  5. Václav Krpec

    Hi, I happen to get this exception a lot; it seems that one can't safely assume that Excel keeps only valid date values in its Date-type cells.
    When I process large XLSX files (unfortunately still used as a kind of DB within the company), it's simply impossible to correct these in the XLSX file; I ended up just patching openpyxl.utils.datetime.from_excel function.

    Wouldn't you consider re-opening the issue?

  6. CharlieC

    Sorry, can't do anything about this. According to the specification, the lower boundary for dates is

    In the 1900 date system, the lower limit is January 1st, 0001 00:00:00, which has a serial date-time of -693593.
    

    But A2 in your sheet has a value of -693594 which is below this and thus outside the specification. More importantly, the offset that we need to apply to dates before 1900-03-01 must still be applied here, so that date is also outside Python's range. I can't think of a sane way of resolving this because it is effectively undetermined.

    In any case: Excel has real problems with dates before the epoch. If you switch to using the 1904 epoch, dates will at least render but with totally weird dates. I've submitted a bug to the OOXML committee and Microsoft about this.

  7. Václav Krpec

    Yup, I know the data is just faulty; I don't expect you to magically fix that. :-)
    What I was thinking about was e.g. an (optional) switch (like strict=True) for e.g. openpyxl.load_workbook.
    This way, the functionality would be fully backwards-compatible (we'd get the exception by default).
    But with the strict mode suppressed, the from_excel function would return e.g. None for invalid dates (or maybe some fault sentinel or just datetime.datetime() or something).

    As noted above, if you're forced to process a big file with such a mess inside, you simply end up doing exactly that.

    I mean, it's good that you've submitted the bug-report, but frankly, how soon could we expect Microsoft to fix that? In my own experience, trying to force Microsoft to correct bugs in its products takes for ages (if they bother at all).

  8. CharlieC

    Ugh! I really don't see it as the library's responsibility to provide this kind of workaround which could mean all kinds of different things all over the place and also add additional checks to a bottleneck.

    You're better of monkey patching the conversion function.

    What might work is using Numpy's datetime implmentation, which doesn't have the same lower bound AFAIK and could be used transparently, if installed.

  9. Log in to comment