Cannot read xlsx files with non-standard internal sheet.xml names

Issue #270 resolved
Max Böhm created an issue

openpyxl cannot read worksheets when the internal xml data files do not follow the naming convention sheet1.xml, sheet2.xml, ...

The [Content_Types].xml file lists the XML part names that are actually used for the worksheets. The sheet names should be taken from there.
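A minimal sketch of that lookup, using only the standard library and not openpyxl's own reader code. The function name `worksheet_parts` is hypothetical; the namespace and worksheet content type are the ones defined by the Open Packaging Conventions and SpreadsheetML.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of [Content_Types].xml and the content type that marks a worksheet part.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
WORKSHEET_TYPE = (
    "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"
)


def worksheet_parts(xlsx):
    """Return the part names of all worksheets declared in [Content_Types].xml.

    `xlsx` may be a path or a file-like object; this helper is illustrative only.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
    # Each <Override> maps one part name to its content type.
    return [
        override.get("PartName")
        for override in root.iter(CT_NS + "Override")
        if override.get("ContentType") == WORKSHEET_TYPE
    ]
```

Called on a workbook, this returns whatever part names the file declares, so no assumption about sheet1.xml, sheet2.xml, ... is needed.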

A patch of reader/ is attached (based on the 1.8.3 version).

Comments (9)

  1. CharlieC

    @max22 Can you try the 1.9 branch? It has much better code for identifying sheet names which can be backported if really necessary.

  2. Max Böhm reporter

    I have attached the sample input "Request.xlsx" file for which this Issue #270 occurs.

    The file contains two worksheets. The first one is named sheet.xml while the second one is named sheet2.xml.

    This file is also a sample input for Issue #269 (the problem in read_dimension()).

  3. Max Böhm reporter

    these are the modifications of the proposed patch for

    $ diff
    <     worksheet_names = [worksheet for worksheet, sheet_type in zip(sheet_names, sheet_types) if sheet_type[1] == VALID_WORKSHEET]
    <     for i, sheet_name in enumerate(worksheet_names):
    >     worksheet_names = [(worksheet, sheet_type[0]) for worksheet, sheet_type in zip(sheet_names, sheet_types) if sheet_type[1] == VALID_WORKSHEET]
    >     for i, (sheet_name, sheet) in enumerate(worksheet_names):
    <         sheet_codename = 'sheet%d.xml' % (i + 1)
    >         # sheet_codename = 'sheet%d.xml' % (i + 1)
    >         sheet_codename = sheet.split('/')[-1]
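
The effect of the change can be sketched in isolation. The part paths below are assumptions for illustration: only the file names sheet.xml and sheet2.xml come from this thread, and the directory prefix is guessed.

```python
# Hypothetical part paths; only the two file names are taken from the report.
parts = ["/xl/worksheets/sheet.xml", "/xl/worksheets/sheet2.xml"]

# Old behaviour: guess the file name from the sheet's position.
guessed = ["sheet%d.xml" % (i + 1) for i in range(len(parts))]

# Patched behaviour: take the file name from the declared part path.
actual = [part.split("/")[-1] for part in parts]

# Names the positional guess gets wrong for this workbook.
mismatches = [(g, a) for g, a in zip(guessed, actual) if g != a]
```

For a workbook like the one described, the positional guess looks for sheet1.xml, which does not exist, while the patched lookup finds sheet.xml.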

  4. CharlieC

    1.9 can handle the worksheets properly but is tripped up by the styles part, which has a bogus id. I'll add a workaround for that, but the problem should be reported to whatever tool is generating this.

  5. CharlieC

    There are a lot of problems with the files you supplied. While I think we can handle the sheetnames and the weird id for the styles (this looks like it's irrelevant, as there can only be one styles object), the cells will not be correctly parsed as they are all set as formulae. That the worksheet is unsized is the least of the problems!

    From the ECMA 376 spec p.2433 "str (String): Cell containing a formula string."

    This should be reported upstream to the tool you're using.
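
One way to check a worksheet for this is to tally the `t` attribute of its cells. This is a rough sketch, not part of openpyxl: the helper name `cell_type_counts` is hypothetical, and it assumes the worksheet XML has already been read from the archive. Cells without a `t` attribute are counted as "n" (number), which is the default in the schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Main SpreadsheetML namespace used by worksheet parts.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cell_type_counts(sheet_xml):
    """Count cells in a worksheet part by their declared type (the `t` attribute)."""
    root = ET.fromstring(sheet_xml)
    # A missing `t` attribute means the default cell type, "n".
    return Counter(cell.get("t", "n") for cell in root.iter(MAIN_NS + "c"))
```

A sheet whose plain text cells are all counted under "str" would match the problem described here.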

  6. Max Böhm reporter

    Thanks for sharing your findings. I'll report that upstream (Aries/Aldea is the product which generated these reports). Excel seems to be quite tolerant (it reads those files without any complaints)...
