IndexError while load_workbook() a just-save()ed xlsx

Issue #762 resolved
Xiaoguang Zhu
created an issue

using Python_3.5.2 with openpyxl_2.4.2 (without PIL/Pillow, only deal with texts) on Ubuntu_16.04

I get an IndexError while tempting to open a just-created xlsx which WAS 6.7KB. I open it (with WPS Spreadsheets), ctrl-s and exit (yes, I did nothing but ctrl-s), the file is now 8.9KB, and now I can load_workbook() it without getting an error.

(cells contain Chinese but I don't think that's where the problem lies...)

lines I use to generate the xlsx:

def newFile(title="测试测试", depart="其他", *, date=str(datetime.now())):
    ''' derive a new .xlsx from ./score-sheets/template.xlsx '''
    filename = title + ' - ' + date + '.xlsx'
    dst = os.path.join(FOLDER, filename)
    try:
        wb = load_workbook(os.path.join(FOLDER, 'template.xlsx'))
    except IOError:
        return 0
    else:
        ws = wb.get_sheet_by_name(wb.get_sheet_names()[0])
        ws.title = title
        ws['B1'].value, ws['F1'].value, ws['J1'].value = title, depart, date
        with sqlite3.connect(DATABASE) as database:
            cursor = database.execute("select name from test where depart = '%s'" % depart)
            names = cursor.fetchall()
        for i in range(len(names)):
            ws['A'+str(i+3)].value = names[i][0]
        wb.save(dst)
        return 1

debug page created by Flask indicates something related to the stylesheet which really confuses me (might be helpful): Screenshot from 2017-02-03 16-24-46.png

contents in the xlsx file: Screenshot from 2017-02-03 16-28-36.png

sending you the 6.7KB xlsx...

Comments (13)

  1. CharlieC

    Thanks for the report. Can you provide some information about the file? Was this created in openpyxl and subsequently can't be opened by it? If so, can you please provide the template file?

  2. Xiaoguang Zhu reporter

    The file (should be 6.7KB in size) is at the last line of the issue description.

    It is created by openpyxl (as shown in the description, with the newFile() function) I can't load_workbook() it right after it is created, but after using some Excel software to open and save it, everything is back on track

  3. CharlieC

    The template file probably is the source of the problem even if it doesn't show up directly. Do you know anything about the application that created it?

    BTW. the best tool for comparing xlsx files is the OOXML SDK Productivity Tool but you need Windows to run it.

  4. Xiaoguang Zhu reporter

    The application that created the template.xlsx? I only know it's called the "WPS Spreadsheet" (check www.wps.com), didn't want to switch to LibreOffice and chose this one because at least it is well-known in China... (so, xlsx could be exactly the same in literal contents, without having the same binary contents? That's something new for me...)

  5. CharlieC

    Thanks for the information. Yes, the files can be essentially the same but be different binaries. It's sufficient just if the the modification timestamp changes. There are other minor differences but nothing substantial. I've identified where the problem is but not quite what's causing it.

  6. CharlieC

    Okay. I've found what's happening here: the template file has a different definition for the "normal" style including the font. This led to openpyxl having 22 font definitions after parsing the named styles (hence the 21). Reordering the code resolves this, which is good because I'm having trouble writing a unit test.

  7. Log in to comment