Wrong date (1000 years before) read by openpyxl

Issue #1184 closed
wangxa created an issue

Hi,

If an XLSX cell formatted as a Date and its value is '27/07/1016' (UK date style), the value read by openpyxl is '02/08/1016'. Why 6 days are added to the original value?

I have tried several dates, for example:

03/10/1582 will be read as 13/10/1582 by openpyxl and 06/10/1582 will be 06/10/1582.

But unfortunately, they are wrong. 03/10/1582 should still be 03/10/1582. There is no date from 05/10/1582 to 14/10/1582 in history, due to the Gregorian calendar replacing the Julian calendar in that year.

The dates after '15/10/1582' are correctly processed.

It seems the datetime module of Python has some facilities to calculate the missing dates between the different calendars, although I think the algorithm itself is problematic.

But in openpyxl, the original cell values should be kept instead of changing them.

Thank you.

Comments (4)

  1. CharlieC

    Excel simply cannot be relied upon to handle dates before 1st January 1900 reliably.

    As for the calendar: there is a difference between the calendar used (Gregorian or Julian) and when, if ever, a switch between them was made in a particular country or locale.

    According to the specification: the lower limit is January 1st, 0001 00:00:00, which has a serial date-time of -693593.

    Internally openpyxl relies on jdcal for manipulating dates based on Excel's epochs of either 1899-12-30 (standard) or 1904-01-01 (MacOS legacy).

    from openpyxl.utils.datetime import from_excel
    from_excel(-693593)
    datetime.datetime(1, 1, 1, 0, 0)
    

    Which is correct according to the OOXML specification.

  2. wangxa reporter

    I found where the problem is:

    in openpyxl.utils.datetime.from_excel(), the line

    parts = list(jd2gcal(MJD_0, value + offset - MJD_0))
    

    is used to convert the original value in Excel (an integer) to the Date elements like year, month, day and seconds. However, if value argument is less than -115858, then the result of value + offset is the Julian day for a date in Julian calendar, so the corresponding conversion function is jd2jcal not jd2gcal. The value for those are equal to or greater than -115858 is the Julian day for a date in Gregorian calendar.

    -115858 is the value for 1582-10-15 (The first day of the Gregorian calendar), which is stored the Excel XML files.

    If you set value as -115858 and call jd2gcal, the value of parts is:

    >>> list (jd2gcal(MJD_0, -115858 + CALENDAR_WINDOWS_1900 - MJD_0))
    [1582, 10, 15, 0.0]
    

    If value is set as -115859 and jd2jcal is called, the value of parts is:

    >>> list (jd2jcal(MJD_0, -115859 + CALENDAR_WINDOWS_1900 - MJD_0))
    [1582, 10, 4, 0.0]
    

    This is correct, because there is no date from 1582-10-05 to 1582-10-14 in the Gregorian calendar.

  3. CharlieC

    You'll have to write your own logic for this, because the specification assumes Gregorian for everything. Basically, as I said above, the file format isn't at all suitable for historical dates.

  4. Log in to comment