deleting sheet results in unreadable workbook

Issue #811 resolved
Ryan Hutchison
created an issue

Hello,

Firstly, thank you for your work on this module, great utility! My issue seems strikingly similar to #446. Currently I am using the following loop to pull sheets from the template that have not been populated:

for sheet in self.wb.worksheets:
    if sheet['B6'].value is None:
        del self.wb[sheet.title]

When attempting to open the workbook, I get a "content not readable" error. Reviewing the log reveals that a named range was removed to repair the file:

<removedFeature>Removed Feature: Named range from /xl/workbook.xml (Workbook)</removedFeature></removedFeatures></recoveryLog>

Looking at the referenced issue, it looks as though named ranges may have been the cause. In inspecting the archive, it looks as though some named ranges were indeed left behind after the delete:

<workbook 
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <workbookPr codeName="ThisWorkbook"/>
    <bookViews>
        <workbookView/>
    </bookViews>
    <sheets>
        <sheet 
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" name="IAD" sheetId="1" state="visible" r:id="rId1"/>
    </sheets>
    <definedNames>
        <definedName localSheetId="0" name="DISCOUNT">DFW!$G$39:$G$45</definedName>
        <definedName localSheetId="1" name="DISCOUNT">IAD!$G$39:$G$45</definedName>
        <definedName localSheetId="2" name="DISCOUNT">ORD!$G$39:$G$45</definedName>
        <definedName localSheetId="3" name="DISCOUNT">SYD!$G$39:$G$45</definedName>
        <definedName name="DISCOUNT">HKG!$G$39:$G$45</definedName>
    </definedNames>
    <calcPr calcId="124519" fullCalcOnLoad="1"/>
</workbook>

Please let me know if there is any other information you need!

Comments (9)

  1. CharlieC

    Excel could handle this a bit more gracefully. I think fixing this could be pretty tricky because DefinedNames are basically just buckets for definitions and checking for removed or changed sheetnames could involve a lot of parsing.

    At the moment I think this is really outside the scope of the library but would be open to a PR (based on 2.5) that solves it.

  2. CharlieC

    Can you tell me which sheets were deleted? If they relate to the localSheetId then we can probably add a check for this, but if they're in the body of the constant then that could be a lot trickier as this is pretty arbitrary and could, for example, contain a deleted sheet name within a function.

    It would be great if you could supply a test workbook with some Python code removing the worksheets.

  3. Ryan Hutchison reporter

    Hey Charlie,

    I have attached a broken workbook. Please note that I did not fully populate the workbook to protect customer data, though I've confirmed the issue persists within this workbook. Below is the code used to delete the sheets:

        def remove_unused_sheets(self):
            """ remove unused sheets from final report """
            for sheet in self.wb.worksheets:
                if sheet['B6'].value is None:  # sheet is unpopulated
                    del self.wb[sheet.title]
    

    Currently I am hiding unused sheets as a work around.

    Thanks! Let me know if there is anything else you need!

  4. CharlieC

    The file looks like the one produced by openpyxl. Can you supply the original with the additional sheets? (I don't really care about their contents).

  5. CharlieC

    Looking at this in detail and the problem seems restricted to the sheet called SYD. All of the other sheets can be removed without Excel having a fit. This is odd because 'DFW', 'IAD', 'ORD', 'SYD' all have a constant called 'DISCOUNT' defined in local scope. Removing 'DFW', 'IAD' or 'ORD' will lead to a prompt to update the file. Deleting 'HKG', which is the target of a global definition has no side effects. If the sheets are reordered then 'SYD' can be removed which indicates that this is sometime kind of runtime problem with Excel.

    I can't see a quick and easy fix for this which doesn't appear to be a specification. A workaround would loop over the worksheets to get the index and remove the defined names. A bit clumsy but should work:

    for idx, title in enumerate(wb.sheetnames):
        ws = wb[title]
        if ws['B6'].value is None:
            wb.defined_names.delete('DISCOUNT', idx)
            del wb[title]
    
  6. Log in to comment