File opens slowly, uses 1.5GB of memory

Issue #860 resolved
James Knight
created an issue

I'm trying to open (for reading and writing) a small 90 KB Excel file. Excel opens it immediately, but openpyxl takes 1-2 minutes and uses 1.5 GB of memory to open it. It looks like the file contains row and/or column formatting, and that is triggering openpyxl to replicate an enormous number of cell formats (even though none of the six sheets in the file has more than 100 rows or columns).

It is a submission file for a public database, so I'm not able to redo the structure. Is there a way to get to a file that doesn't trigger this issue (I've already tried an Excel format cleaner), and/or could you see if openpyxl could read it quickly?

This is happening with version 2.4.8.

Comments (5)

  1. CharlieC

    Thanks for the report. Looks like the file does contain far too many CellStyles. We've had problems with these in the past, though I thought I'd removed the main bottleneck. Will have to investigate a bit more, it seems.

  2. CharlieC

    Actually, a look at the profiling suggests that the problem could be related to the data validations. Excel uses row 1048576 (the last possible row) internally to indicate that a range covers all cells in a particular column, and openpyxl expands the ranges into individual cell coordinates. Wouldn't expect this to use that much memory, but it does add up to a lot of strings.
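
To give a rough sense of how the strings add up, here is a minimal sketch in plain Python. It is not openpyxl's code: `column_letter` and `expand_range` are hypothetical helpers written for this illustration, and `A1:A1048576` is an assumed example of a whole-column validation range. It compares the size of the compact range string with the size of the same range expanded into one coordinate string per cell.

```python
import sys

MAX_ROW = 1048576  # last row of an .xlsx worksheet


def column_letter(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def expand_range(min_col, min_row, max_col, max_row):
    """Yield one coordinate string per cell in the range (hypothetical helper)."""
    for col in range(min_col, max_col + 1):
        letter = column_letter(col)
        for row in range(min_row, max_row + 1):
            yield f"{letter}{row}"


# Assumed example: a validation applied to the whole of column A.
compact = "A1:A1048576"
expanded = list(expand_range(1, 1, 1, MAX_ROW))

compact_bytes = sys.getsizeof(compact)
# Size of the list object plus every string it holds.
expanded_bytes = sys.getsizeof(expanded) + sum(sys.getsizeof(c) for c in expanded)

print(len(expanded))  # 1048576
print(f"compact: {compact_bytes} bytes, expanded: {expanded_bytes / 1e6:.1f} MB")
```

The exact byte counts depend on the Python build, but a single whole-column range already turns into more than a million small string objects, and a file with several such validations across six sheets multiplies that again.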

