year is out of range - Error

Issue #1152 resolved
Muhammad Asif Javed
created an issue

While loading an excel file it is throwing year out of range error. I don't have idea from which cell it is throwing. As my excel file is very large and have alot of data.

excelWorkbook = openpyxl.load_workbook(xlsxUri) File "/usr/local/lib/python2.7/dist-packages/openpyxl/reader/excel.py", line 249, in load_workbook ws_parser.parse() File "/usr/local/lib/python2.7/dist-packages/openpyxl/reader/worksheet.py", line 130, in parse dispatcher[tag_name](element) File "/usr/local/lib/python2.7/dist-packages/openpyxl/reader/worksheet.py", line 292, in parse_row self.parse_cell(cell) File "/usr/local/lib/python2.7/dist-packages/openpyxl/reader/worksheet.py", line 226, in parse_cell value = from_excel(value, self.epoch) File "/usr/local/lib/python2.7/dist-packages/openpyxl/utils/datetime.py", line 97, in from_excel return datetime.datetime(*parts[:3]) + diff ValueError: year is out of range

Comments (9)

  1. CharlieC

    You might have more luck in read-only mode. But you should be able to write a script that opens the file for you and allows you to use the debugger to find out the value. That's what I do with WingIDE but basically you can wrap load_workbook() in the debugger. You'll want to read up on how to do this.

  2. Jeff “Señor Geek” Raber

    I ran into the same issue. Opening the file as read-only helps, but only until you try to access the value of the/an offending cell. Accessing the value gives the exception 'ValueError: year XXXXXXXX is out of range'. I was able to catch this exception and return the internal_value instead.

    Excel will allow you to store an integer in a cell formatted as Date. If that value is negative or too large, Excel will be display as a string of octothorpes (ex: '#########'). Hovering the mouse over a cell in that state gives a tool-tip message that says "Dates and times that are negative or too large show as ######." From a quick test in Excel 2016, it appears that the largest integer value that will not cause this behavior is 2958465 (December 31, 9999). The also happens to also be the largest date that the python module datetime will accept, anything larger returns the ValueError exception.

    I think that openpyxl should be improved to allow us to open files with this particular defect.

    One solution could be to handle the ValueError exception in the parse_cell method. If the value cannot be converted to a date, then the data_type can stay as 'n' and the value will remain unconverted. I'm not sure if this would have any side-effects (what happens when the file is saved? Would the cell format be changed from Date?)

    Another option would be to check that the value does not exceed 2958465. If it does, don't bother trying to convert to a date.

  3. CharlieC

    Excel uses octothorpes for all kinds of things such as values too wide for a column. When it comes to dates it will use them for anything before 1900. This is not good practice to follow.

    However, when it comes to datetimes, these are covered by the specification:
    The earliest date permitted is 0001-01-01, 00:00. The latest date permitted is 9999-12-31, 23:59:59.999.
    So a ValueError is a perfectly reasonable exception. This could perhaps be wrapped but it is up to the user to provide correct values and not up to the library to provide workarounds.

  4. Jeff “Señor Geek” Raber

    I like that this allows us to open a file in Read+Write mode despite the errant values. The warning is good, though I think that it would be better to include the worksheet name in the warning, to make finding the errant values easier when there are many worksheets.

    This solution, however, will result in data-loss when a file (with errant date values) is opened and then saved. The original cell values will be lost, replaced with "#VALUE!". This seems like it could be a big 'Gotcha'. The warning should probably mention the potential of data-loss.

    I propose that the value just remain as numeric, without any conversion to datetime. The values in the source XLSX file have a numeric data-type (t="n") and the values are valid numeric values. The cell's number_format should only affect how the value is displayed/output, not how it is saved.

    I'll try to submit a PR later tonight or tomorrow

  5. Jeff “Señor Geek” Raber

    I did submit a Pull Request (PR 312), but it was not accepted. It is CharlieC's opinion that the value in those cells are junk and should not be preserved. (I disagree, but my argument did not convince CharlieC.)

    I will maintain my repo here for a little while (for as long as it is useful for me). You can see the changes that I applied here:
    I applied the change to 2.6 and then backported it to 2.5

  6. Log in to comment