Unexpected ERROR when loading a file - "OSError: File contains no valid workbook part"

Issue #789 resolved
Nitin Chowdhary
created an issue

Hello,

I get the following error - "OSError: File contains no valid workbook part" when the attached Python script is run. The script is simply trying to load the workbook through a load_workbook() call.

The attached excel file i.e. muthoot.xlsx loads normally when I open it using Excel (on mac, version 15.24) but surprisingly generates this issues when run through the above file.

I am using Python 3.6.1 and openpyxl version 2.4.5.

Regards

Nitin

Comments (24)

  1. CharlieC

    According to the the specification this file is invalid because the content-types file contains no workbook part.

    The specification §12.2 clearly says First, the content types for relationship parts, the Workbook part, and at least one Sheet part must be defined (physically located at /[Content_Types].xml in the package)

    Please report the bug to the developers of whichever package created it.

  2. Nitin Chowdhary reporter

    Thanks for the note back, Charlie!

    I will try and follow up with the creator of the Excel file (they're a large company, so ...).

    One question - when I load this excel file in excel, it opens fine. Why would Excel handle it correctly? Also, since I am new to openpyxl, can you point me to the specification referred to above? It didn't show up in Google Search.

    Thanks again!

    Nitin

  3. CharlieC

    I can't comment on why other applications behave differently. What's the point of a specification if you don't follow it? There is a link to the specification in the documentation.

  4. Ben Brian

    Note that versions of openpyxl <= 2.4.4 were able to read these non-compliant files. So that may be an acceptable workaround for some people.

    I don't know what tool generates that format, but I have encountered it numerous times online, besides processing such files in my own projects. It might be worth considering a "compatibility" mode to support them, since they worked in older versions of openpyxl and Excel opens them as well. Just my 2c :)

  5. Anonymous

    Okay Charlie, I don't normally visit this community, but this is a crap way to respond to what turns out to be a real issue. It's the kind of thing I see way too often in the open source community that prevents traction even when people are trying to come forward with good intentions.

    If you examine the source .xslx files you'll notice that the problem here is that the entire content-type loading system including the manifest find function fails to handle default content types properly. This .xml file provided contains:

    <?xml version="1.0" encoding="utf-8"?>
    <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" />
    <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml" />
    <Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" />
    <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml" />
    <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" />
    <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" />
    </Types>

    which is perfectly valid. The workbook is handled by the default definition and all sheets are handled by the override. This still resolves to only 1 workbook part in the system and everything else parses fine. The file isn't the issue here, it's the failure to handle default content types in the XML schema.

    Since this is a common output from Microsoft's own visual studio excel manipulation tools. It's critical that this get fixed. To do so ensure that the manifest.py module finds default objects, and that the unzipped package is appropriately searched for a *.xml file that does not have an override. As a minimal patch, you could just default to /workbook.xml when a default content type for the workbook is detected (since this is the only output I've ever seen from an MS product), but the strict implementation requires a file tree search.

    Please re-open this issue and fix.

  6. Anonymous

    Right, problem being barrier to entry, which is where most open source improvements end up dying. Happy to share, but I don't know the ins and outs of this project and have no familiarity with the bitbucket community. I've done the investigation for you and I have already provided a logical design for the solution; but point taken, hacky fix attached. Again, not knowing the community I don't know your workflow for pulls, pushes, etc. But here's the raw source as modified from my local copy of 2.4.7. Should be easy enough to diff and apply changes after evaluating for safety. Long-term the right move is probably to fully implement defaults.

  7. Derek Schmidt

    +1

    Am experiencing the exact same problem. Not sure what other program is outputting the excel file, but it seems to be widely used by my employer, making openpyxl basically useless for reading any corporate spreadsheet.

    I'll try the earlier mentioned "hacky fix" sometime soon to see if it resolves the issue

  8. Ben Brian

    Charlie, it seems openpyxl already doesn't "support" these Excel files. The least the rest of us can do is find a solution that works for the rest of us, and openpyxl can remain broken if that's what you prefer.

  9. Reuben Mees

    I am also experiencing this error. A script I wrote using openpyxl has worked consistently over several months on my Linux machine. I am now trying to transfer my scripts to a new Windows machine at my office where this program will be utilized and I get this new error. I'm not sure which version of openpyxl I have installed on my Linux machine (I knowit is prior to 2.4.4 whereas this Windows machine has 2.4.7 installed). The .xlsx files I am converting are received from an outside source, generated from a database program, but have not changed. Could this be a problem with the latest release of openpyxl? Has anyone reported this to the developers yet?

  10. jj

    I want to thank everyone for raising this issue and addressing it. I'm new to openpyxl, but have found the library immensely useful in my work. Many of the files I have to work with come from an automated system over which I have no control, and this fix has worked beautifully.

  11. nick wong

    I know this issue has been closed. But I came cross a very similar one and the work around do not work. The XLSX file open correctly in MS but the mime type is {'application/vnd.openxmlformats-package.relationships+xml', 'application/xml'}. I tried to set it manually to XLSX but it do not work. @Charlie Gunyon Gunyon can you help with this?

  12. nick wong

    That's strange. The file I provided is just an reproducible example. My real file is a very complicated xls file. To make it openpyxl processable, unoconv (a libreoffice utility) is used to convert it into xlsx. The converted file dose not lost any data/format and can be viewed with Microsoft office. I am just wondering why it is not valid Excel any more. Does openpyxl really need to depends on the meta data the file contained? Can we give it some default values if that meta information is missing?

  13. CharlieC

    This has nothing to do with the metadata: the OOXML specification mandates a workbook part. The file provided is nowhere near being a valid XLSX file and contains no data. Please don't waste time providing incomplete files.

  14. Log in to comment