33% time spent in IndexedList.add()

Issue #1208 resolved
Ian Ward created an issue

Hello and thank you for all your work on this library! I've recently upgraded to 2.5.14 and I'm running some code that generates a 10k-line excel template for data entry and I've found that 33% of my run time is spent in IndexedList.add. I wonder if there is a problem with IndexedList or the way I'm using openpyxl.

Profiling I see most of my calls that trigger IndexedList.add come from an inner loop that styles each cell used to enter data:


        xl_format = datastore_type[field['datastore_type']].xl_format
        alignment = openpyxl.styles.Alignment(wrap_text=True)
        protection = openpyxl.styles.Protection(locked=False)
        for (c,) in sheet[validation_range]:
            c.number_format = xl_format
            c.alignment = alignment
            c.protection = protection

where validation_range is a whole column of 10000 cells for entering data.

Is there a better way to accomplish the same thing, or is there something we can do to reduce the amount of work required by the IndexedList.add operation?

Comments (4)

  1. CharlieC

    Styles are implemented using pointers (to reduce memory use, presumably) so every time you assign a format there is a check to see whether it is already in the workbook. As you're noting this can get quite slow as you're seeing.

    The way around this is to use NamedStyles which use a much simpler assignment.

  2. Ian Ward reporter

    I switched to NamedSyle and was surprised when it was even slower. This change made my time jump from 21s in add to 114s in the __set__ method (add no longer in the profile)


            xl_format = datastore_type[field['datastore_type']].xl_format
            alignment = openpyxl.styles.Alignment(wrap_text=True)
            col_style = NamedStyle(
                name='reco_{0}{1}'.format(resource_num, col_letter),
            for (c,) in sheet[validation_range]:
                c.style = col_style

    But once I changed c.style = col_style to c.style = col_style.name the time dropped down to just 5s in __set__. This is down to 10% of my overall run time so that's a pretty big improvement.

    Thank you for your help.

  3. Log in to comment