Trying to load a non-Excel file should raise InvalidFileException, not KeyError

Issue #1061 (status: new)
Nick Pellegrino created an issue

Please see the reproduction steps below. I would expect them to raise an InvalidFileException, as in 2.2.5, but in 2.5.3 they raise a KeyError instead.

>>> import openpyxl
>>> filename = 'not_excel_file.xlsx'  # a ZIP archive with an .xlsx extension that is not a workbook
>>> openpyxl.load_workbook(filename)


KeyError Traceback (most recent call last)
<ipython-input-3-ee72c5387958> in <module>()
----> 1 openpyxl.load_workbook(filename)

/Users/Nick/.virtualenvs/feb_2018/lib/python2.7/site-packages/openpyxl/reader/excel.pyc in load_workbook(filename, read_only, keep_vba, data_only, guess_types, keep_links)
174 archive = _validate_archive(filename)
175
--> 176 src = archive.read(ARC_CONTENT_TYPES)
177 root = fromstring(src)
178 package = Manifest.from_tree(root)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.pyc in read(self, name, pwd)
933 def read(self, name, pwd=None):
934 """Return file bytes (as a string) for name."""
--> 935 return self.open(name, "r", pwd).read()
936
937 def open(self, name, mode="r", pwd=None):

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.pyc in open(self, name, mode, pwd)
959 else:
960 # Get info object for name
--> 961 zinfo = self.getinfo(name)
962
963 zef_file.seek(zinfo.header_offset, 0)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.pyc in getinfo(self, name)
907 if info is None:
908 raise KeyError(
--> 909 'There is no item named %r in the archive' % name)
910
911 return info

KeyError: "There is no item named '[Content_Types].xml' in the archive"
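The traceback shows that the file got past `_validate_archive` and failed only when `[Content_Types].xml` was read, so `not_excel_file.xlsx` must be a readable ZIP archive. Below is a minimal sketch that builds such an archive in memory with the standard library and triggers the same `zipfile` KeyError without openpyxl. The member name `data.txt` and the helper names are arbitrary assumptions for illustration.

```python
import io
import zipfile

# Same name as the constant in the traceback, defined locally for this sketch.
ARC_CONTENT_TYPES = "[Content_Types].xml"


def make_fake_xlsx() -> bytes:
    """Build an in-memory ZIP archive that is not an Excel workbook."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Hypothetical content: any member other than [Content_Types].xml will do.
        zf.writestr("data.txt", "this is not a spreadsheet")
    return buf.getvalue()


def read_content_types(data: bytes) -> bytes:
    """Mimic the failing step: read [Content_Types].xml from the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # ZipFile.read raises KeyError when the member does not exist.
        return archive.read(ARC_CONTENT_TYPES)


if __name__ == "__main__":
    data = make_fake_xlsx()
    print(zipfile.is_zipfile(io.BytesIO(data)))  # True: it is a valid ZIP archive
    try:
        read_content_types(data)
    except KeyError as exc:
        print(exc)
```

Saving the bytes from `make_fake_xlsx()` to `not_excel_file.xlsx` should give a file that reproduces the report with `openpyxl.load_workbook`.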

Comments

  1. CharlieC

    I'm not sure the expectation is entirely reasonable. In 2.2, the exception would cascade up the stack whenever the code had any kind of problem with the file format. The code has since been refactored to be more granular, but the basic heuristics are the same: we only check the file extension and that the file is a ZIP archive. Files such as this one will therefore always fool openpyxl.

    But I'm not sure it's really up to the library to perform a full check that the file is indeed a valid OPC package. It might make sense to raise an exception in this case, but I have yet to come across a real file like this in the wild.
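Below is a minimal sketch contrasting the two heuristics described in the comment (extension plus "is it a ZIP?") with a slightly stricter pre-check that a caller could run before `openpyxl.load_workbook`. The function names, the `NotAnOPCPackage` exception and the extension list are hypothetical. This is not openpyxl's `_validate_archive`, and checking that `[Content_Types].xml` exists and has a `Types` root element falls far short of full OPC validation.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET

# Assumed list of accepted extensions, for illustration only.
ACCEPTED_EXTENSIONS = (".xlsx", ".xlsm", ".xltx", ".xltm")
CONTENT_TYPES_PART = "[Content_Types].xml"
# Namespace of the root <Types> element in an OPC content-types part.
CT_NAMESPACE = "http://schemas.openxmlformats.org/package/2006/content-types"


class NotAnOPCPackage(ValueError):
    """Hypothetical exception for files that are clearly not OPC packages."""


def passes_basic_checks(path: str) -> bool:
    """Approximate the two heuristics: file extension and ZIP structure."""
    ext = os.path.splitext(path)[1].lower()
    # is_zipfile only inspects the ZIP structure, not which members exist.
    return ext in ACCEPTED_EXTENSIONS and zipfile.is_zipfile(path)


def check_opc_package(path: str) -> None:
    """Stricter pre-check: raise NotAnOPCPackage instead of leaking KeyError."""
    if not passes_basic_checks(path):
        raise NotAnOPCPackage("wrong extension or not a ZIP archive")
    with zipfile.ZipFile(path) as archive:
        if CONTENT_TYPES_PART not in archive.namelist():
            raise NotAnOPCPackage(f"missing {CONTENT_TYPES_PART}")
        try:
            root = ET.fromstring(archive.read(CONTENT_TYPES_PART))
        except ET.ParseError as exc:
            raise NotAnOPCPackage(f"{CONTENT_TYPES_PART} is not well-formed XML") from exc
    if root.tag != f"{{{CT_NAMESPACE}}}Types":
        raise NotAnOPCPackage(f"unexpected root element {root.tag!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake = os.path.join(tmp, "not_excel_file.xlsx")
        with zipfile.ZipFile(fake, "w") as zf:
            zf.writestr("data.txt", "this is not a spreadsheet")
        print(passes_basic_checks(fake))  # True: the basic heuristics are fooled
        try:
            check_opc_package(fake)
        except NotAnOPCPackage as exc:
            print(f"rejected: {exc}")  # rejected: missing [Content_Types].xml
```

Subclassing `ValueError` here is a design choice for the sketch: callers can treat "not a workbook" with a single `except` clause rather than catching a `KeyError` that leaks out of `zipfile`.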
