If a worksheet has a comment, saving the workbook twice corrupts the saved spreadsheet

Issue #1330 resolved
Bryce Poole created an issue

I am working with a spreadsheet that has comments. I save my workbook multiple times during the same run. The first save is fine, but the second save creates a corrupted worksheet that when opened in Excel, complains that it needs to be recovered. It says there is a problem with the comments.

When I renamed the spreadsheet to a zip file and unzipped, I found the comment file in question. It had duplicated comments. I tracked down the code and found that worksheets have an internal list/variable called “_comments.” When a workbook is saved, the worksheet makes a list of all the comments and stores it in “_comments.” This list survives a save call. When save is called a second time, it finds all the comments and saves them to the list a second time but without first clearing out the list, in effect duplicating the comment list.

As a temporary workaround, I found that if I cleared this list between saves, the issue goes away. Here is the sample code:

import openpyxl
from openpyxl.comments import Comment

def main():
    create_corrupt_spreadsheet()

def create_corrupt_spreadsheet():
    wb = openpyxl.Workbook()
    ws = wb.active
    cell = ws.cell(row=1, column=1)
    cell.comment = Comment('comment', 'author')

    wb.save('1-No_Corruption.xlsx')
    wb.save('2-Corrupt.xlsx')

    # Workaround - Clear the list manually
    for sheets in wb.worksheets:
        sheets._comments.clear()

    wb.save('3-No_Corruption.xlsx')

    if __name__ == '__main__':
        main()

I imagine if you were to clear out the “_comments” list in openpyxl/worksheet/_writer.py in the class WorksheetWriter, in the init function, it should fix the problem.

Comments (4)

  1. Log in to comment