Unable to handle Excel file with maximum rows

Issue #668 invalid
Ry iTunes created an issue

Opening a workbook using openpyxl.load_workbook() does not work as expected when the Excel workbook as a worksheet with the maximum number of rows allowed in Excel 2013: 1,048,576

When the worksheet has the maximum number of rows, the workbook instance reports .max_row == 1 and .max_column == 1 and it is not possible to iterate over the rows/columns and get cell values.

If one deletes one line on the Excel worksheet (by some means other than openpyxl), and worksheet is down to 1,048,575 lines, openpyxl.load_workbook() works as expected, and max_row and max_column report the correct number of rows and columns.

Comments (11)

  1. Ry iTunes reporter

    I am using read-only mode:

    import openpyxl

    src_wkbook = openpyxl.load_workbook(srcfile, read_only=True)
    
    for src_wksheet in src_wkbook:
        rown = src_wksheet.max_row
        coln = src_wksheet.max_column
        print("Sheet rows x columns " + str(rown) + " x " + str(coln))
    
  2. CharlieC

    Then there's not a lot we can do. Worksheet files contain optional values about the size of the worksheet and the start of the file. Looks like these are incorrect in your file. You can force openpyxl to recalculate these from the worksheet but that requires openpyxl to parse the whole worksheet, which is against the principle of the read-only mode.

    So, the problem is really with whichever program or library created the file in the first place.

  3. Ry iTunes reporter

    After all, the problem is not caused by an Excel worksheet with maximum rows but possibly with some strange Excel character or format. Just opening the Excel workbook that is causing the problem, and saving it (without changing anything), makes the problem go away. The Excel workbook that is causing the problem has 100MB (71MB if compressed) so it is not practical to upload the file here.

  4. CharlieC

    So you can confirm that the file wasn't created by Excel? There are lots of places such as Google Drive where you can temporarily upload the file.

    If you look at the source of any of the sheet.xml files in the package, you'll see that the problem is with the <dimension> tag at the start.

    You could also just extract one of these locally and upload a zipped copy of it.

  5. Ry iTunes reporter

    I do not know the origin of the Excel file (I do not know if it was generated by Excel or if it was originally created using a different program and then converted to Excel). However, I do know that it is an Excel workbook that is supposed to be compatible both with Microsoft Windows and Mac OS. I'm using Microsoft Windows Office 2013 and the workbook opens without any problem. The problem is only detected using openpyxl. But once I save the file with Excel 2013 (without any change on the workbook other than the saving process) the problem does not occur again using openpyxl. The workbook does not have any XML data so I am unable to export any XML information.

  6. CharlieC

    Excel loads the whole file into memory, as does the standard mode in openpyxl. But without a sample file or at least the worksheets in it we cannot investigate this further.

  7. CharlieC

    The bug is probably in whatever application created the Excel file for including erroneous worksheet dimension information.

  8. Log in to comment