xfId mapping broken in cellXfs

Issue #647 resolved
Charlie Clark
created an issue

From #642 cellXfs[48] seems to index a non-extant base style. This is carried over from the original file but openpyxl normalises and rebases the pointers.

<x:xf numFmtId="165" fontId="13" fillId="3" borderId="30" xfId="19" quotePrefix="0" pivotButton="0" applyAlignment="1" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:alignment horizontal="center" vertical="center" wrapText="1" />

Relevant record from source (from LibreOffice)

 <cellStyles count="6">
  <cellStyle name="Normal" xfId="0" builtinId="0" customBuiltin="false"/>
  <cellStyle name="Comma" xfId="15" builtinId="3" customBuiltin="false"/>
  <cellStyle name="Comma [0]" xfId="16" builtinId="6" customBuiltin="false"/>
  <cellStyle name="Currency" xfId="17" builtinId="4" customBuiltin="false"/>
  <cellStyle name="Currency [0]" xfId="18" builtinId="7" customBuiltin="false"/>
  <cellStyle name="Percent" xfId="19" builtinId="5" customBuiltin="false"/>

And what openpyxl makes of it

 <cellStyles count="6">
  <cellStyle name="Normal" xfId="0" builtinId="0"/>
  <cellStyle name="Comma" xfId="1" builtinId="3"/>
  <cellStyle name="Comma [0]" xfId="2" builtinId="6"/>
  <cellStyle name="Currency" xfId="3" builtinId="4"/>
  <cellStyle name="Currency [0]" xfId="4" builtinId="7"/>
  <cellStyle name="Percent" xfId="5" builtinId="5"/>

Comments (11)

  1. Matthew Lemon

    I am getting sporadic errors when trying to load an existing workbook (my script needs to be able to open an xlsx file, manipulate it then resave it). I've had some success doing this but for some reason, I am now getting in IndexError whenever I try to load the file.

    My code:

    from openpyxl import load_workbook
    wb = load_workbook('/home/lemon/Documents/test_excel.xlsx')

    The error:

    /home/lemon/.virtualenvs/bicc_excel/bin/python /home/lemon/code/python/bicc_excel/openpyxl_test.py
    Traceback (most recent call last):
      File "/home/lemon/code/python/bicc_excel/openpyxl_test.py", line 3, in <module>
        wb = load_workbook('/home/lemon/Documents/test_excel.xlsx')
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 201, in load_workbook
        apply_stylesheet(archive, wb) # bind styles to workbook
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 164, in apply_stylesheet
        stylesheet = Stylesheet.from_tree(node)
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 99, in from_tree
        return  super(Stylesheet, cls).from_tree(node)
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 89, in from_tree
        return cls(**attrib)
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 90, in __init__
        self.named_styles =  self._merge_named_styles()
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 110, in _merge_named_styles
        xf = self.cellStyleXfs[style.xfId]
      File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/cell_style.py", line 182, in __getitem__
        return self.xf[idx]
    IndexError: list index out of range

    I'm running this on a Debian 8 machine. I have been using Libreoffice to create the original file as I don't have access to MS Excel. Obviously, the original file is saved as xlsx with Libreoffice. As I said, I have been able to load the file, add data to the sheet and resave it, but more often that not, I am getting this error. Python 3.5.2.

    Postscript: I understand in conversation with you that this is probably a bug in Libreoffice which you have already raised. Workaround will be to use openpyxl with Excel files untouched by Libreoffice.

  2. Koert van der Veer

    I would propose to reopen this bug, as I don't think it is a bug in LibreOffice. IMHO, Openpyxl should never write a workbook that it can't open itself.

    LibreOffice writes a XLSX that follows the spec: my trivial example workbook contains 20 cellStyleXfs entries. The cellStyles references xfIds #0, #15, #16, #17, #18 and #19. Openpyxl loads this workbook just fine. When Openpyxl writes this workbook, however, the cellStyleXfs list is trimmed: there are just 6 entries left. However, it still writes cellStyles referencing the same xfIds. This causes the crash in the initial report.

    I presently believe the root cause lies in _split_named_styles, which builds a new list of xfs, but does not rewrite the xfId in Style. I'm still investigating, and write a patch with at least a regression test (not sure if I'm capable of writing a patch, though)

  3. Charlie Clark reporter

    Feel free to dig in but you'll find the original files won't open in Excel either. I discussed the way xfId are supposed to work with the OOXML WG and they are supposed to be index pointers of a list. It's a shitty spec because this cannot be enforced in the schema but the LO team really ought to spend some more time reading it.

  4. Koert van der Veer

    Feel free to dig in but you'll find the original files won't open in Excel either. ... The LO team really ought to spend some more time reading it.

    We may not be looking at the same issue here. My issue is that a file generated by LO can be opened by Excel and Openpyxl just fine. It is after openpyxl wrote out the file, that the file became un-openable (by LO, excel and openpyxl). Ergo: Openpyxl writes a corrupt file. This is the behaviour described by most of the duplicates of this bug, but possibly not the original bug.

    I've found the root cause, it is indeed that the cellStyleXfs list is rebuilt, but the cellStyle's xfID are not reindexed. Pullrequest !146 fixes this, and adds a regression test.

  5. Charlie Clark reporter

    @Koert van der Veer you might want to look at the source of some of the duplicates of this which are not resolved by either fix because the lookup error occurs when opening the file but it's basically the same issue. Looks like LO is referring to the index of cellXfs and not cellStyleXfs

  6. Log in to comment