open book->delete sheet->save book results in unreadable workbook

Issue #446 resolved
ccurvey
created an issue

If I use the following program with the attached spreadsheet, the resulting "bar.xlsx" spreadsheet that is generated is unreadable by either LibreOffice Calc or MS Excel. Given the error messages from MS Excel, I think it has something do to with a named range disappearing.

import openpyxl

book = openpyxl.load_workbook("foo.xlsx")
del book.worksheets[0]
book.active = 0
book.save("bar.xlsx")

Comments (7)

  1. Charlie Clark

    del actually takes a sheet name as a parameter but would raise a KeyError if it couldn't work. The generated file is valid according productivity tool. I can open the file fine in OpenOffice, in Excel with an error message but LibreOffice crashes (I tend to find LibreOffice is less stable than OpenOffice). Excel produces an error message saying that it has removed part of workbook data but the file seems generally unaffected. However, the Excel error messages are usually not very helpful and the file is fine according to the official MS tool.

    This will take some investigating.

  2. Charlie Clark

    I think the issue is related to the use of localSheetId in the named ranges. In the original file, which roundtrips fine with openpyxl, the names are bound to "Sheet 1" as localSheetId="1" (0-indexed) in the same file the same worksheet is sheetId="2"(1-indexed). Consistency, we've heard of it but where's the fun in that?

    When the first worksheet is removed the index for the named ranges should be updated, but it isn't. Manually correcting the references to localSheetId="0" in the file and creating a new archive at least allows LibreOffice to open the file without a problem but Excel still isn't happy.

    I suspect this is going to take a bit of work to fix as it will involve binding the named ranges to the relevant sheet so that these are serialised with the correct index. This currently only happens on creation, I think. The relevant package definitely needs refactoring so I don't see this happening before 2.3 and only then it will probably need a PR.

  3. Charlie Clark

    @Hamza Khchine of course, as wb.__delitem__ calls wb.remove_sheet.

    The problem here is that the names need binding to the worksheet so that the index can be assigned when writing. This basically means refactoring all the defined name code using the new style classes.

  4. Charlie Clark

    This is a simple test case to illustrate the problem.

    from openpyxl import Workbook
    wb = Workbook()
    wb.create_sheet()
    wb.create_named_range("Simple_Range", wb["Sheet1"], "A1", '1')
    wb.save("range.xlsx")
    del wb["Sheet"]
    wb.save("range-broken.xlsx")
    

    Basically every part of the API needs changing: the only required parameters are name and the value of the range (which can be a constant not just a reference); scope is implicit and should refer to names defined for particular worksheets only, ie. we need global named ranges and ones bound to particular worksheets. The scope gets used when reading the names to assign named ranges to particular worksheets and when saving to do the reverse. In fact the best thing would be to have such defined names as child elements of various worksheets but the specification is what it is. Note externalLinks have their own slightly modified form of definedNames.

    The API should probably be change to deprecate all existing methods because the signatures cannot be sensibly maintained and replaced by workbook and worksheet methods that make explicit use of the concept defined name, which more accurately reflects the objects.

  5. Charlie Clark

    Unfortunately, you're wrong. Check the specification and you'll see it has to be an integer. It is the index of the worksheet (in the workbook's sheets node). The important thing is that it is of no relevance to client code as is currently the case.

  6. Log in to comment