Google Sheets pivot table causes openpyxl crash

Issue #1175 resolved
Ben Webb
created an issue

Here's a Google Sheet with a pivot table https://docs.google.com/spreadsheets/d/1RZXcv7XdwL1Z2DTDoduIl6uBJE6OjqMSNXXou6wbGc8/edit#gid=0

Here's the XLSX download link: https://docs.google.com/a/opendataservices.coop/spreadsheets/d/1RZXcv7XdwL1Z2DTDoduIl6uBJE6OjqMSNXXou6wbGc8/export?format=xlsx This file is also attached to the issue

Here's what happens when I load the file with openpyxl:

Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from openpyxl import load_workbook
>>> wb = load_workbook('Reproduce openpyxl pivot table issue.xlsx')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/reader/excel.py", line 312, in load_workbook
    reader.read()
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/reader/excel.py", line 274, in read
    self.read_worksheets()
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/reader/excel.py", line 261, in read_worksheets
    pivot.cache = self.parser.pivot_caches[pivot.cacheId]
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/reader/workbook.py", line 122, in pivot_caches
    cache = get_rel(self.archive, self.rels, id=c.id, cls=CacheDefinition)
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/packaging/relationship.py", line 165, in get_rel
    obj.deps = get_dependents(archive, rels_path)
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/packaging/relationship.py", line 130, in get_dependents
    rels = RelationshipList.from_tree(node)
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/descriptors/serialisable.py", line 88, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/descriptors/serialisable.py", line 104, in from_tree
    return cls(**attrib)
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/packaging/relationship.py", line 50, in __init__
    self.Target = Target
  File "/home/bjwebb/opendataservices/openpyxl/openpyxl/descriptors/base.py", line 44, in __set__
    raise TypeError('expected ' + str(self.expected_type))
TypeError: expected <class 'str'>

Comments (2)

  1. CharlieC

    Hi Ben and thanks for the report. As far as I can tell the contains an invalid relationship in the snappily named pivotCacheDefinition1.xml.rels. <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLinkPath" TargetMode="External"/> The schema for this is defined as

    <xsd:complexType name="CT_Relationship">
        <xsd:simpleContent>
          <xsd:extension base="xsd:string">
            <xsd:attribute name="TargetMode" type="ST_TargetMode" use="optional"/>
            <xsd:attribute name="Target" type="xsd:anyURI" use="required"/>
            <xsd:attribute name="Type" type="xsd:anyURI" use="required"/>
            <xsd:attribute name="Id" type="xsd:ID" use="required"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    

    So there required Target attribute is missing. You get confirmation for this if you try and read the file with the OOXML Productivity Tool.

    I can probably add a workaround for this because the cacheRecords themselves are optional but it really ought to be reported to Google as a bug.

  2. Log in to comment