Can't save and reopen xlsx file with comments with version 2.3.0

Issue #536 resolved
SleepProgger
created an issue

I tried to load, change and save an xlsx file i downloaded from google spreadsheet and got this error on reloading the file:

Traceback (most recent call last):
  File "D:\workspace\python\SantasImp\test.py", line 15, in <module>
    book = load_workbook(filename, keep_vba=True, data_only=True, guess_types=True)
  File "C:\Program Files (x86)\python\lib\site-packages\openpyxl\reader\excel.py", line 241, in load_workbook
    read_comments(new_ws, archive.read(comments_file))
  File "C:\Program Files (x86)\python\lib\site-packages\openpyxl\reader\comments.py", line 31, in read_comments
    runtext = ''.join([t.text for t in run.findall('{%s}t' % SHEET_MAIN_NS)])
TypeError: sequence item 0: expected string, NoneType found

After some digging around i realized this happens when using comments. Removing the comments before saving solves this issue.

Test case:

    filename = 'test.xlsx'
    book = load_workbook(filename, keep_vba=True)
    # clean comments
    sheet = book.active
    for row in sheet.rows:
        for cell in row: cell.comment = None
    book.save("test2.xlsx")
    filename = 'test2.xlsx'
    book = load_workbook(filename, keep_vba=True)

I added my test xlsx file generated by google spreadsheets.

Thanks for this great piece of software, SleepProgger

Comments (7)

  1. SleepProgger reporter

    Just a little update: When trying to load the corrupted (?) file there is some fd leakage. A dirty little fix for this problem:

        fd = open(tmp_name, 'rb')
        try:
            book = load_workbook(filename=fd, keep_vba=True)
        except TypeError as e:
            log_error('Failed reloading file. Try --cleancomments')
        fd.close()
    
  2. John Bovey

    I cannot reproduce this problem when running openpyxl on linux - the test.xlsx workbook loads without errors. It does not look to me as though this is vba related, though. Does the error go away when you leave out the keep_vba=True? Incidentally, why are you using keep_vba=True? If you are loading an xlsx file then it should not contain any vba.

  3. CharlieC

    @John Bovey no, it's nothing to do with VBA. The error is related to the way the comment text is stored (openpyxl is pretty inflexible). I've nearly fixed that. But I thought you might also want to see how Google Docs deals with the specification: different name for the VML and an apparently empty WorksheetDrawing file, which I think is how strict part of the specification does it. Some other bits where it seems like Google is playing fast and loose with the specification but I think we all do to a certain extent.

  4. Log in to comment