Redefining __slots__ in subclasses

Issue #475 resolved
Felipe
created an issue

In openpyxl.cell.cell, the Cell class is defined as having

__slots__ = StyleableObject.__slots__ + ( # more stuff

Since Cell is a subclass of StyleableObject this results in undefined behavior, and may be invalid code in the future. Per the Python docs:

If a class defines a slot also defined in a base class, the instance variable defined by the base class slot is inaccessible (except by retrieving its descriptor directly from the base class). This renders the meaning of the program undefined. In the future, a check may be added to prevent this.

Additionally, according to this SO answer, the redundant slots declaration may be wasting memory.

I think the fix is as easy as removing StyleableObject.__slots__ + from the slot declaration

Comments (5)

  1. CharlieC

    Thanks for the note on this but I don't really see it as a bug. Certainly, this bit of code is a bit hairy and can no doubt be improved. It certainly doesn't use more memory as this has been tested. The StyleableObject is more of a mixin than a base class.

  2. Felipe reporter

    How have you tested the memory consumption? I just did some basic pympler computations and this is what I got:

    import openpyxl
    from pympler import asizeof
    wb = openpyxl.load_workbook('test.xlsx')
    print(asizeof.asizeof(wb))
    # prints 42536 when using the current cell.py
    # prints 42056 when using the patched cell.py
    

    The workbook in question creates 12 cell objects (also verified with muppy), which suggests to me an excess of 40 bytes per Cell object. I'm not super-fussed about 40 bytes per cell, but I disagree that this is not a bug. The documentation clearly describes this behavior (declaring slots in both a parent class and a child class) as creating an "undefined" program. Further, as you can see from the below patch, it's an easy fix (all tests still pass after applying it). Fixing the issue will make openpyxl more robust to future changes that could break this behavior. I'm happy to submit a pull request if you like, but this seemed easy enough to just fix directly.


    Here's the patch that fixes the issue

    --- a/openpyxl/cell/cell.py Mon Jun 15 12:07:33 2015 -0500
    +++ b/openpyxl/cell/cell.py Mon Jun 15 12:38:33 2015 -0500
    @@ -77,7 +77,7 @@
         Properties of interest include style, type, value, and address.
    
         """
    -    __slots__ =  StyleableObject.__slots__ + (
    +    __slots__ =  (
             'row',
             'col_idx',
             '_value',
    
  3. CharlieC

    I've given the change a spin and I'm not averse to it. Not 100% convinced that we have the necessary tests for this but the doctests, which occasionally pick up outliers also pass.

    I'd love to be able to reduce memory use significantly so any tips along those lines are welcome. This particular monster came into quash another (overriding getattr) but I hope this is a kind of reverse "swallow a fly". Maybe a bat has just swallowed the cat? ;-)

    We certainly don't have enough tools profiling memory.

  4. Log in to comment