Inefficiency in MultiCellRange

Issue #1087 on hold
Jiahai Feng created an issue

I was working with a large Excel file with many rows of merged cells when I noticed that performance was really bad. After running a profiler, I identified a possible issue with the implementation of MultiCellRange.contains in worksheet/cell_range.py

Each query of contains is O(n) in the number of ranges, which is really slow. I propose using a more efficient data structure, such as a quadtree or a simpler set-based solution.
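
To make the proposal concrete, here is a minimal sketch of the set-based idea next to a linear scan. It is not the library's actual code: the `Bounds` tuple layout, `contains_linear`, and `CellSetIndex` are hypothetical names made up for illustration, and the sketch assumes ranges can be described by inclusive row/column bounds.

```python
from typing import Iterable, Set, Tuple

# Hypothetical layout: (min_row, min_col, max_row, max_col), all inclusive.
Bounds = Tuple[int, int, int, int]


def contains_linear(ranges: Iterable[Bounds], row: int, col: int) -> bool:
    """Check every range in turn: O(n) in the number of ranges per query."""
    return any(
        min_row <= row <= max_row and min_col <= col <= max_col
        for min_row, min_col, max_row, max_col in ranges
    )


class CellSetIndex:
    """Hypothetical helper that expands all ranges into a set of cells once.

    Building the index costs time and memory proportional to the total
    number of covered cells; each later lookup is an average O(1) set test.
    """

    def __init__(self, ranges: Iterable[Bounds]) -> None:
        self._cells: Set[Tuple[int, int]] = set()
        for min_row, min_col, max_row, max_col in ranges:
            for r in range(min_row, max_row + 1):
                for c in range(min_col, max_col + 1):
                    self._cells.add((r, c))

    def __contains__(self, cell: Tuple[int, int]) -> bool:
        return cell in self._cells


# Two merged ranges: A1:C1 and A2:A4, written as row/column bounds.
merged = [(1, 1, 1, 3), (2, 1, 4, 1)]
index = CellSetIndex(merged)
```

With this sketch, `(1, 2) in index` is a single set lookup, while `contains_linear(merged, 1, 2)` walks the list of ranges. The trade-off is memory: the set stores one entry per covered cell, and it has to be rebuilt or updated whenever ranges are added or removed.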

I would love to contribute code but I'm new to open source so I don't know how the process works. Let me know if I can help.

Comments (2)

  1. CharlieC

    Thanks for the report. The implementation isn't optimal, but it's also not something I'd spend a lot of time trying to improve. Checking membership isn't a common operation, and it already uses a set for the underlying cell ranges, so each check is O(n) in time in exchange for constant memory.

    Contributions, however, are always welcome. There is a chapter in the documentation dedicated to this.
