Error on cells incorrectly formatted as date

Issue #897 wontfix
Jan Simbera created an issue

The from_excel routine fails with ValueError: year XY is out of range if the cell contains a date outside acceptable boundaries.

This makes the whole parsing of the sheet fail on cells that contain large numerical values incorrectly formatted as dates, while Excel opens the file normally and just displays an error in the given cells.

It would be good to reflect this behaviour by returning a numerical value instead, or a special value signalling an error.

Comments (9)

  1. CharlieC

    Can you provide an example? Dates and times are a problem in Excel but I don't really think it's openpyxl's job to resolve ambiguities and rejecting the file as invalid is a reasonable response.

  2. Jan Simbera reporter

    See the attachment. There is a large numeric value in A1 formatted as date which makes the traversal of sheet's rows fail, giving the above error.

  3. CharlieC

    Thanks for the file but as it effectively contains nonsense I think it's the responsibility of whatever created it to fix it. The specification clearly sets limits on dates: The earliest date permitted is 0001-01-01, 00:00. The latest date permitted is 9999-12-31, 23:59:59.999. and the supplied value is way outside which makes the file invalid.

    Adding workarounds for this to library means that, while you'd personally be happy for the conversion to the float or an error value, others might consider it a silent error, leading to other problems further down the line.

  4. Jan Simbera reporter

    Thanks for the clarification. If this is the wording of the Excel file specification, I more than understand the silent error avoidance. If it is the Python specification, I would still appreciate an option to avoid the exception (and go silent or something like this), as this "nonsense" is commonly produced by numerous accounting software reporting outputs that rely on Excel not failing in this case.

  5. CharlieC

    Were this to be handled in openpyxl converting the cell to an error would be the only sensible thing to do. But then you wouldn't find out the problem until you read the individual cell, which would make tracking down the problem a lot harder.

    Excel's handling of dates is just poor: it doesn't even implement the specification correctly so anything before 1900 is not representable and it also believes in the date 0th January 1900. Elsewhere it will remove stuff it doesn't like and provide fairly cryptic explanations. I prefer clear warnings and exceptions when something doesn't work.

  6. Log in to comment