Invalid Excel *.xlsm file when editing a workbook containing an external sheet with comments & formatting

Issue #554 resolved
Philipp Freyer
created an issue

As already discussed in openpyxl-users:

Scenario: I need to update information in a lot of Excel worksheets. They are created with Excel 2007. Editing the data works fine but I have problems if someone using these sheets inserts another worksheet into them by copying them from another workbook with Excel by right-clicking on their tab in Excel and selecting the "move or copy" option for copying the worksheet.

I am using the latest openpyxl library from PyPI (2.3.0) and Python 2.7.8 32 bit on Windows 7 (64 bit).

Here a short example script that already shows the problem:

# coding=utf-8
import openpyxl

update_sheet = u"TESTBOOK_original.xlsm"
wb = openpyxl.load_workbook(update_sheet, keep_vba=True)
wb._external_links = []
wb.worksheets[0]["A1"].comment = Comment("ab", "cd")
wb.save(update_sheet)

The script just opens and saves the file, removing the external links (reason: see previous thread) and including one comment. When I open the resulting file afterwards, I get the following error: Excel found unreadable content in 'TESTBOOK_broken.xlsm'. Do you want to recover the contents of this workbook? If you trust the source of this workbook, click Yes. When I click "Yes", Excel repairs the file, giving me the following error log:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>error127960_01.xml</logFileName><summary>Errors were detected in file 'C:\Users\pfreyer\Documents\TESTBOOK_bad.xlsm'</summary><removedFeatures summary="Following is a list of removed features:"><removedFeature>Removed Feature: Data validation from /xl/worksheets/sheet5.xml part</removedFeature><removedFeature>Removed Feature: Data validation from /xl/worksheets/sheet6.xml part</removedFeature></removedFeatures><removedRecords summary="Following is a list of removed records:"><removedRecord>Removed Records: Comments from /xl/comments1.xml part (Comments)</removedRecord></removedRecords></recoveryLog>

I will happily provide all the information I can, if you need more information. If you classify this as a bug, I will also create an issue in Bitbucket. What I did not notice before: Not only the comments, but also the data validation that I set are a problem for Excel. I saw in the documentation that it is not possible to read existing data validation, but should openpyxl at least be able to keep it? If not, is there a way to remove existing validation, so that I can just replace it?

Kind regards and thanks so far, Philipp

Comments (15)

  1. CharlieC

    Okay. The problem now seems to be entirely related to the new comment. Can you confirm that the file now roundtrips as long as you don't set a comment?

    BTW wb[0] won't work in the example.

  2. Philipp Freyer reporter

    I can confirm that. If I uncomment the line setting the comment, the files will be opened and saved by openpyxl and opens without errors in Excel 2007 afterwards.

  3. CharlieC

    Okay. It looks like the problem is when VBA and comments are on the same sheet. We presumably need to merge the relevant VML files and this isn't going to happen because this is an undocumented format outside the scope of the specification.

    If you add a comment to any of the other worksheets then the file is usable.

    The following snippet (using your original file) can be used to illustrate this. As soon as comments get added to "Sheet1", "Mapping", or "Configuration" it all goes to hell.

    import os.path
    
    from openpyxl import __version__
    from openpyxl import load_workbook
    from openpyxl.comments import Comment
    
    fname = "Issues/bug554.xlsm"
    stem, ext = os.path.splitext(fname)
    
    wb = load_workbook(fname, keep_vba=True)
    
    fname = "{0}-{1}-nc.xlsm".format(stem, __version__)
    wb.save(fname)
    for title in ['Sheet4', 'Sheet3', 'Sheet2',]:
        ws = wb[title]
        cell = ws['A8']
        cell.value = "comment"
        cell.comment = Comment("my comment", "charlie")
    fname = "{0}-{1}-wc.xlsm".format(stem, __version__)
    wb.save(fname)
    

    @John Bovey you might want to look at this but I think you'll agree that it's not doable.

  4. John Bovey

    We are definitely doing something badly wrong here. None of the sheets contain any controls so they do not need to have a link for vbaControlId. Presumably the vmlDrawing?.vml files are there to format the original comments. These are then conflicting with the added commentsDrawing?.vml files for new comments. I'm happy to have a look at the code but what would be the best branch to work with?

  5. John Bovey

    I have had a better look at what is going on (using branch 2.3) and it should be perfectly doable - at least for the situation we have here where the sheet has existing comments, more comments are added, there are no controls on the sheet, and keep_vba is True. I'll try and do a pull-request in the next couple of days, @CharlieC could you let me know which branch to work in, though.

  6. Log in to comment