Save of huge workbook/worksheet yields MemoryError

Issue #745 on hold
walkerjim
created an issue

I have a write-only workbook with two sheets. The first one is huge, with 2201 columns and 6630 rows. The second sheet is small enough not to be an issue.

Filling the workbook works fine; it all fits. Then I want to save it. I traced it down to the statement "out = src.read()" in method write() on line 254 in write_only.py. Apparently it wants to slurp each worksheet's whole xml file in one go -- over 400Mb in my case -- and it crashes out with a BEX in the vc++ DLL. My suggestion would be for _write to deliver the file handle instead of the entire, read-out string to the caller. I know that at some point the ZipFile is going to make the xml data a _LOT smaller, so that so much memory will not be needed at once. However, I don't know enough about the library to recognize all the implications of just passing the file handle back up the call stack.

Comments (10)

  1. walkerjim reporter

    Hi Charlie,

    Thanks for the tip. I'll try that tomorrow and get back to you.

    I concur with #667 that I would have already tried it and not be bugging you if it were mentioned.

    Regards, Freundliche Grüße, Jim Walker, in Eschborn


  2. walkerjim reporter

    I found that we already had lxml on our installation and it was being used. There was no problem converting huge sheets to XML, as they were being written to temp files.

    I traced a bit deeper into the execution flow and found that ZipFile.writestr() was being called, where it seems that ZipFile.write() could save a lot of memory in the case where you've already got a temp file containing the (sometimes huge) XML generated earlier.

    I realize that what I'm suggesting would take more investigative effort that I've had time for -- I've only been using openpyxl since Friday.

  3. CharlieC

    Never come across that before but always welcome to suggestions especially if they come in the form of PRs! I think writestr() makes it easier to set the filename of the object.

  4. CharlieC

    It looks like I can get some errors like this on Windows with the file from #494. Can you try reading and writing this file and provide me with the full traceback? Alternatively, can you provide a sample file? I still don't think the file size is too much of a problem compared to say holding relevant worksheet in memory.

  5. CharlieC

    Sorry, I've put some time into this but cannot replicate a memory issue for something like this. The following code never goes above about 11 MB in my Windows VM.

    from openpyxl import Workbook
    
    row = [1]*2201
    
    wb = Workbook(write_only=True)
    ws = wb.create_sheet()
    for r in range(6630):
        ws.append(row)
    wb.save("big.xlsx")
    
  6. Log in to comment