Cannot read all worksheet. One is missing.

Issue #304 resolved
spy__
created an issue

I'm trying to read list of worksheets from file (see in attachment) by this code:

book = load_workbook("input.xlsx", use_iterators = True)
print(book.get_sheet_names())

Output is: ['Sheet2', 'Sheet3']

Sheet1 is missing. If I open file, I'll see three sheets. But if I change (whatever how) data at Sheet1 and save file, example code will work correct and output will be ['Sheet1', 'Sheet2', 'Sheet3']

branch: 1.8 tag: 1.8.5

Comments (20)

  1. CharlieC

    Thanks for the report and the file. What tool did you use to create this file? Although Excel has no problem with the file, we do because it is structured slightly differently. For example, sheet1 is actually stored as sheet.xml. I thought I'd fixed most of the problems around this but obviously not.

  2. CharlieC

    They have some software based on the .NET stuff? Maybe they can update to 2.5?

    I think someone else has submitted a similar bug a while back, so it's not unique. I'll try and have a look at it soon.

  3. spy__ reporter

    Yes, they have been using SharePoint. I'm afraid of they cannot update theirs software. But I'll ask them.

    Have this issue already fixed?

  4. CharlieC

    @spy__ No, the issue has not been fixed and I have no timescale for it. I've already tried your file with 1.9 which does contain improvements for reading worksheets and unfortunately it can't read any! :-(

    There are still a lot of open bugs and not enough people working on them.

  5. spy__ reporter

    I've tried to open input.xlsx from attach as zip-archive. The first page had name "sheet.xml". After resaving file by Excel the first page had had name "sheet.xml" I've checked other excel-files created by Excel. Pages have names "sheetX.xml" where X is sequence number.

    But in this case OpenpyXL could only find 'Sheet1', not data. I've been looking for a difference between excel-files formats before using Excel and after.

  6. CharlieC

    @spy__ Yes, I know what the archive looks like and that's one of the reasons why you're having trouble with the file. 1.9 has better handling of files but when I tried your file with it I couldn't open any worksheets. Unfortunately, I am not sure when I can find the time to investigate this further. It may not be for a couple of weeks.

  7. spy__ reporter

    Okay, I'll wait. For now I got hack: I'm using VBS for resaving excel-file.

    Set oBook = oExcel.Workbooks.Open(Wscript.Arguments.Item(0))
    oBook.SaveAs WScript.Arguments.Item(1)
    

    Next my python-script with openpyxl works fine.

  8. CharlieC

    Ah, good you've got a workaround. It would be great to know if the 2.5 SDK is different in this respect. It probably isn't but it would be nice to know.

  9. spy__ reporter

    I'm afraid of my co-workers aren't able to update SDK. :( Got an e-mail today about it. SharePoint 2010 is using .NET 3.5. Open XML Format SDK 2.5 is using .NET 4. Got some problems with web stuff compilations. Anyway here's complete example of workaround script:

    Dim oExcel
    Set oExcel = CreateObject("Excel.Application")
    Dim oBook
    Set oBook = oExcel.Workbooks.Open(Wscript.Arguments.Item(0))
    oBook.SaveAs WScript.Arguments.Item(1)
    oBook.Close False
    oExcel.Quit
    

    Usage:

    script.vbs source.xlsx dest.xlsx
    

    I hope it helps to someone. :)

  10. CharlieC

    What does the script do? Get the Office extensions to "fix" the Workbook by loading and saving it? If so, this does sound suspiciously like a bug in the library that creates it.

  11. spy__ reporter

    Yep, this script just open and save file via Excel using OLE component. Hm, you're right, it might be totally bug in Open XML Format SDK 2.0.

  12. CharlieC

    @spy__ 2.0 but the best thing is actually to checkout the repository and switch (hg up 2.0) to using the 2.0 branch.

    BTW. I'm not sure if the sheet names will match your expectations as AFAIK the references from the Workbook to the sheets and the global references are different (Sheets 1 and 3 are reversed). If this is the case then your colleagues have generated files incorrectly. But that's understandable considering how ham-fisted the specification is.

  13. Log in to comment