import of numpy can cause huge memory usage increase on Windows

Issue #1272 wontfix
schlenk created an issue

The PyPI version of numpy is bundled with the OpenBLAS library, which allocates some pretty huge buffers (around 30 MB per CPU core) on import of the library.

So any code that imports openpyxl in a multiprocessing environment where numpy is available can cause a massive increase in committed memory. (e.g. on a 28-core Xeon CPU you gain around 750 MB of committed memory per process; multiplied by the usual multiprocessing pattern of spawning one subprocess per CPU, this eats about 22 GB of RAM.)
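A common mitigation sketch (an assumption on my side, not something openpyxl does): cap the BLAS thread pools via environment variables before numpy is imported, so OpenBLAS allocates buffers for one thread instead of one per core. `OPENBLAS_NUM_THREADS` and `OMP_NUM_THREADS` are read at library load time, so this only helps if it runs before the first `import numpy`:

```python
import os

# Must run BEFORE numpy (and therefore OpenBLAS) is first imported.
# OpenBLAS sizes its per-thread buffers from these variables at load time.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

# import numpy  # imported afterwards, now allocating far fewer buffers
```

In a multiprocessing setup this has to happen in each child process as well, e.g. at the top of the worker module, before any library pulls in numpy transitively.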

I reported the issue for numpy at: https://github.com/numpy/numpy/issues/13432

But it would be nice if openpyxl were a bit more careful about importing numpy, given its very limited use of it.

Comments (3)

  1. CharlieC

    I don’t really think this has much to do with Numpy or openpyxl but with the downstream library.

  2. schlenk reporter

    As the root cause, yes, it is OpenBLAS's fault.

    But the assumption openpyxl makes (just import numpy; it is cheap, and we need the types numpy.bool_, numpy.floating, numpy.integer, and numpy.datetime64) is probably wrong in many installations.

    And paying a price for something that is not actually used is never nice, even if it is just the normal import cost of numpy (around 40 MB, mostly shared DLLs).

    So if there were some explicit openpyxl.compat.init_numpy() call for people who need it, fine. But importing numpy just because it happens to be available in the environment is a bit too helpful.
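    The cost could also be avoided without a new public call by deferring the import until a type check actually happens. A minimal sketch (hypothetical helper names, not openpyxl's actual code; it only assumes the four numpy types named above):

    ```python
    _numpy_types = None

    def _get_numpy_types():
        """Import numpy lazily, the first time a type check is needed.

        Returns a tuple of numpy scalar types, or an empty tuple when
        numpy is not installed, so isinstance() checks stay valid.
        """
        global _numpy_types
        if _numpy_types is None:
            try:
                import numpy
                _numpy_types = (numpy.bool_, numpy.floating,
                                numpy.integer, numpy.datetime64)
            except ImportError:
                _numpy_types = ()
        return _numpy_types

    def is_numpy_scalar(value):
        # No numpy import happens until a value is actually checked.
        return isinstance(value, _get_numpy_types())
    ```

    With this pattern, processes that never feed numpy values into openpyxl never trigger the import, and hence never pay the OpenBLAS buffer cost.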
