I am working with a spreadsheet that has comments. I save my workbook multiple times during the same run. The first save is fine, but the second save creates a corrupted worksheet that when opened in Excel, complains that it needs to be recovered. It says there is a problem with the comments.
When I renamed the spreadsheet to a zip file and unzipped, I found the comment file in question. It had duplicated comments. I tracked down the code and found that worksheets have an internal list/variable called “_comments.” When a workbook is saved, the worksheet makes a list of all the comments and stores it in “_comments.” This list survives a save call. When save is called a second time, it finds all the comments and saves them to the list a second time but without first clearing out the list, in effect duplicating the comment list.
As a temporary workaround, I found that if I cleared this list between saves, the issue goes away. Here is the sample code:
import openpyxl from openpyxl.comments import Comment def main(): create_corrupt_spreadsheet() def create_corrupt_spreadsheet(): wb = openpyxl.Workbook() ws = wb.active cell = ws.cell(row=1, column=1) cell.comment = Comment('comment', 'author') wb.save('1-No_Corruption.xlsx') wb.save('2-Corrupt.xlsx') # Workaround - Clear the list manually for sheets in wb.worksheets: sheets._comments.clear() wb.save('3-No_Corruption.xlsx') if __name__ == '__main__': main()
I imagine if you were to clear out the “_comments” list in openpyxl/worksheet/_writer.py in the class WorksheetWriter, in the init function, it should fix the problem.