Inefficiency in MultiCellRange

Issue #1087 on hold
Jiahai Feng created an issue

I was working with a large Excel file with many rows of merged cells when I noticed that the performance was really bad. After using a profiler, I identified a possible issue with the implementation of MultiCellRange.contains in

Each contains query is O(n) in the number of ranges, which is really slow. I propose using a more efficient data structure, such as a quadtree or a simpler set-based solution.
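
To illustrate the set-based idea, here is a minimal sketch in plain Python. It does not use openpyxl's classes: the `(min_col, min_row, max_col, max_row)` tuples, `contains_linear`, and `CellIndex` are hypothetical names made up for this example. It contrasts a linear scan over ranges with an index that stores every covered cell in a set.

```python
from itertools import product


def contains_linear(ranges, col, row):
    """Linear scan: test the cell against every range, O(n) per query."""
    return any(
        min_col <= col <= max_col and min_row <= row <= max_row
        for min_col, min_row, max_col, max_row in ranges
    )


class CellIndex:
    """Hypothetical set-based index over (col, row) coordinates.

    Lookups are O(1) on average, but memory grows with the number of
    covered cells rather than with the number of ranges.
    """

    def __init__(self, ranges=()):
        self._cells = set()
        for bounds in ranges:
            self.add(bounds)

    def add(self, bounds):
        # bounds is an inclusive (min_col, min_row, max_col, max_row) tuple.
        min_col, min_row, max_col, max_row = bounds
        self._cells.update(
            product(range(min_col, max_col + 1), range(min_row, max_row + 1))
        )

    def __contains__(self, cell):
        return cell in self._cells


# 1000 merged rows, each spanning columns 1 to 3 (A1:C1, A2:C2, ...).
merged = [(1, row, 3, row) for row in range(1, 1001)]
index = CellIndex(merged)

print(contains_linear(merged, 2, 500), (2, 500) in index)
print(contains_linear(merged, 4, 500), (4, 500) in index)
```

The trade-off is memory: the set holds one entry per covered cell, so very large ranges would make it expensive, which is where a spatial structure such as a quadtree might fit better.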

I would love to contribute code but I'm new to open source so I don't know how the process works. Let me know if I can help.

Comments (2)

  1. CharlieC

    Thanks for the report. The implementation isn't optimal, but it's also not something I'd spend a lot of time trying to improve: checking membership isn't a common occurrence, and it already uses a set approach for the underlying cell ranges, so it's O(n) in time in exchange for constant memory.

    Contributions, however, are always welcome. There is a chapter in the documentation dedicated to this.

  2. CharlieC

    Feel free to submit a PR on this if you think it's a problem but I suspect the current implementation is fast enough.
