We are experiencing high memory usage when creating an Excel file with a high number of rows (anywhere from 10,000 to 1 million+ rows).
We have attached a test script which reproduces the problem on our machines. It performs the following steps:
- Generates data
- Creates an openpyxl workbook
- Inserts the data into a worksheet
- Saves the workbook
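The steps above can be sketched roughly as follows, assuming the reported setup (openpyxl's write-only mode). The row/column counts and the random-string generator are illustrative stand-ins for the real test script, not its exact code:

```python
# Sketch of the reproduction: generate rows of random strings, append
# them to a write-only workbook, then save (where the spike occurs).
import random
import string
import tempfile

from openpyxl import Workbook

def generate_rows(n_rows, n_cols, width=10):
    """Yield rows of random alphabetic strings, one row at a time."""
    for _ in range(n_rows):
        yield ["".join(random.choice(string.ascii_letters) for _ in range(width))
               for _ in range(n_cols)]

wb = Workbook(write_only=True)   # streaming writer: rows are flushed, not cached
ws = wb.create_sheet()
for row in generate_rows(1000, 10):
    ws.append(row)

out = tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False).name
wb.save(out)                     # the memory increase we observe happens here
print(out)
```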
Running `python test.py --write_only 10000 10` will generate 10,000 rows with 10 columns each; use `--help` for more details. If running without `mprof`, comment out the profiling decorators.
Python version: 2.7
Openpyxl version: 2.4.9
OS: Ubuntu 17.10
```
$ dpkg --list | awk '/ii/ && /libxml/'
ii  libxml-parser-perl       2.44-2build3            amd64  Perl module for parsing XML files
ii  libxml-twig-perl         1:3.50-1                all    Perl module for processing huge XML documents in tree mode
ii  libxml-xpathengine-perl  0.13-1                  all    re-usable XPath engine for DOM-like trees
ii  libxml2:amd64            2.9.4+dfsg1-4ubuntu1.2  amd64  GNOME XML library
ii  libxml2-dev:amd64        2.9.4+dfsg1-4ubuntu1.2  amd64  Development files for the GNOME XML library
ii  libxml2-utils            2.9.4+dfsg1-4ubuntu1.2  amd64  XML utilities
ii  libxmlsec1:amd64         1.2.24-4                amd64  XML security library
ii  libxmlsec1-nss:amd64     1.2.24-4                amd64  Nss engine for the XML security library
ii  python-lxml              4.0.0-1                 amd64  pythonic binding for the libxml2 and libxslt libraries
```
```
lxml.etree.LXML_VERSION = (3, 4, 0, 1)
lxml.etree.LIBXML_VERSION = (2, 9, 4)
lxml.etree.LIBXML_COMPILED_VERSION = (2, 9, 4)
openpyxl.LXML = True
```
We can then use memory-profiler to gather memory usage (`mprof run test.py --write_only 10000 10`) and generate graphs (`mprof plot`), some of which are also attached.
`Workbook(write_only=True)` does seem to help with memory usage, but when we call `wb.save()` the memory increases dramatically.
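A quick way to cross-check the spike at save time without `mprof` is to compare the process's peak RSS before and after a call. This is a sketch using the stdlib `resource` module (Linux-only, where `ru_maxrss` is reported in kB); a large allocation stands in for the `wb.save()` call:

```python
# Measure how much a single call grows this process's peak RSS (Linux).
import resource

def peak_rss_kb():
    """Peak resident set size of this process, in kB (Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def rss_growth(func, *args, **kwargs):
    """Run func and return (result, growth of peak RSS in kB)."""
    before = peak_rss_kb()
    result = func(*args, **kwargs)
    return result, peak_rss_kb() - before

# In the test script the callable would be lambda: wb.save("test.xlsx");
# here a large list allocation stands in for it.
_, growth_kb = rss_growth(lambda: sum(list(range(5000000))))
print(growth_kb)
```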
It is possible that we are using openpyxl incorrectly, or that there is some other dependency we are missing. In production we use lxml-cffi, but it exhibits the same behaviour as under CPython.
The test script generates random strings. In reality, we create Excel files with a mixture of strings, dates and numbers.