Save of huge workbook/worksheet yields MemoryError
I have a write-only workbook with two sheets. The first one is huge, with 2201 columns and 6630 rows. The second sheet is small enough not to be an issue.
Filling the workbook works fine; it all fits. Then I want to save it. I traced it down to the statement "out = src.read()" in method write() on line 254 in write_only.py. Apparently it wants to slurp each worksheet's whole xml file in one go -- over 400Mb in my case -- and it crashes out with a BEX in the vc++ DLL. My suggestion would be for _write to deliver the file handle instead of the entire, read-out string to the caller. I know that at some point the ZipFile is going to make the xml data a _LOT smaller, so that so much memory will not be needed at once. However, I don't know enough about the library to recognize all the implications of just passing the file handle back up the call stack.
Comments (10)
-
-
Hi Charlie,
Thanks for the tip. I'll try that tomorrow and get back to you.
I concur with
#667that I would have already tried it and not be bugging you if it were mentioned.Regards, Freundliche Grüße, Jim Walker, in Eschborn
-
We do accept PRs for documentation… Basically I didn't know at the time we switched that our own xmlfile method didn't stream properly.
-
I found that we already had lxml on our installation and it was being used. There was no problem converting huge sheets to XML, as they were being written to temp files.
I traced a bit deeper into the execution flow and found that ZipFile.writestr() was being called, where it seems that ZipFile.write() could save a lot of memory in the case where you've already got a temp file containing the (sometimes huge) XML generated earlier.
I realize that what I'm suggesting would take more investigative effort that I've had time for -- I've only been using openpyxl since Friday.
-
Never come across that before but always welcome to suggestions especially if they come in the form of PRs! I think writestr() makes it easier to set the filename of the object.
-
It looks like I can get some errors like this on Windows with the file from
#494. Can you try reading and writing this file and provide me with the full traceback? Alternatively, can you provide a sample file? I still don't think the file size is too much of a problem compared to say holding relevant worksheet in memory. -
Sorry, I've put some time into this but cannot replicate a memory issue for something like this. The following code never goes above about 11 MB in my Windows VM.
from openpyxl import Workbook row = [1]*2201 wb = Workbook(write_only=True) ws = wb.create_sheet() for r in range(6630): ws.append(row) wb.save("big.xlsx")
-
- changed status to on hold
Additional information required.
-
- removed component
Removing component: writer (automated comment)
-
- removed version
Removing version: 2.4.x (automated comment)
- Log in to comment
Install lxml if you haven't already.