Problem With Content

Issue #1309 resolved
Dan Kanefsky created an issue

I have an excel file exported from File Maker Pro and even if I load the workbook and save without reading or writing anything:

wb = openpyxl.load_workbook('Original.xlsx')
wb.save('Original.xlsx')

it gives me this error when I open it up after running it through openpyxl.

If I click Yes, it give me an option to view the error xml file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>Repair Result to new1.xml</logFileName>
<summary>Errors were detected in file '/Volumes/post/new.xlsx'</summary>
<removedParts summary="Following is a list of removed parts:">
<removedPart>Removed Part: /xl/styles.xml part with XML error.  (Styles) HRESULT 0x8000ffff</removedPart>
</removedParts><repairedRecords summary="Following is a list of repairs:">
<repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>

Once I repair the file, everything is fine except there is no styling throughout the workbook (bold headers, fonts, etc).

HOWEVER, if I simply open and then save the excel file before running it through openpyxl, it does not throw that error and everything is fine.

I would really like to avoid this alert without having to manually open up each excel file.

It might have to do with the excel file out of File Maker missing a line in the XML and a “themes” file. I’ve compared all three XML’s by opening up with excel file with textwrangler.

For reference:

“Original” – the XML of the original excel file straight out of File Maker

“Saved” – is the XML of the original file simply opened up in excel and saved, no changes were made.

“Original RAN” and “Saved RAN” – is the xml after I loaded and saved with openpyxl

“Original RAN Recovered” – is the xml after the Original was run through openpyxl then opened up in excel, repaired and then saved.

<!--Original-->

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default ContentType="application/vnd.openxmlformats-package.relationships+xml" Extension="rels"/>
<Default ContentType="application/xml" Extension="xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" PartName="/xl/workbook.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" PartName="/docProps/app.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/sheet1.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml" PartName="/xl/sharedStrings.xml"/>
<Override ContentType="application/vnd.openxmlformats-package.core-properties+xml" PartName="/docProps/core.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml"/></Types>

<!-- Saved -->

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
<Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/> 
<Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
<Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
<Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
<Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/></Types>

<!-- Original RAN -->

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default ContentType="application/vnd.openxmlformats-package.relationships+xml" Extension="rels" />
<Default ContentType="application/xml" Extension="xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.theme+xml" PartName="/xl/theme/theme1.xml" />
<Override ContentType="application/vnd.openxmlformats-package.core-properties+xml" PartName="/docProps/core.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" PartName="/docProps/app.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/sheet1.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" PartName="/xl/workbook.xml" />
</Types>


<!-- Saved RAN -->

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default ContentType="application/vnd.openxmlformats-package.relationships+xml" Extension="rels" />
<Default ContentType="application/xml" Extension="xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.theme+xml" PartName="/xl/theme/theme1.xml" />
<Override ContentType="application/vnd.openxmlformats-package.core-properties+xml" PartName="/docProps/core.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" PartName="/docProps/app.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/sheet1.xml" />
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" PartName="/xl/workbook.xml" />
</Types>

<!-- Original Ran Revoered (SAVED) -->

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
<Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
<Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
<Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
<Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
<Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/></Types>

Line 22 is the one discrepancy between the Original excel XML and the Saved XML. However, I’m not sure what this actually means.

Unfortunately I cannot provide the original.xlsx file because it has private information and as soon as I open up and save the file, there are no issues.

Appreciate any insight! I hope I provided enough info.

Comments (14)

  1. CharlieC

    Thanks for the detailed report, which is unfortunately a bit hard to read because of the escaping which isn’t necessary for BitBucket.

    openpyxl will automatically add a default theme to the package, so this is unlikely to be the problem.

    Excel error messages are, however, often not very helpful but the report suggests that the problem is related to the worksheet.

    Would it be possible to provide a sample file? If not you should diff the sheet1.xml of the packages or simply provide them both: the original won’t contain anything confidential and you can search and replace any strings in the openpyxl version.

  2. Dan Kanefsky reporter

    Hey Charlie, thanks for looking into this.

    Sure thing. Here are the two sheet1.xml's. I renamed them to be Original.xml and new.xml.

    If you have a quick way for me to fix that escaping issue I can fix it real quick.

  3. CharlieC

    Thanks for the files. This is an odd one but it looks like it's related to style information: if you remove all style pointers s="..." from the worksheet and insert in an existing workbook then there are no errors. This suggests that there is a problem with the styles in the file processed by openpyxl. Can you provide the styles.xml from the two workbooks?

  4. Dan Kanefsky reporter

    Both have been uploaded.

    I should also note that the Saved.xlsx has a "theme" folder with a file called theme1.xml in it. Original does not have a theme folder. So looks like excel creates this itself after it's saved inside the program.

  5. CharlieC

    Thanks for the files. As noted above, this has nothing to do with the theme, which is an entirely optional thing. The problem is with the styles.xml from the original file. The individual xf elements have an xfId atttribute that points to the xf element in the cellXfs, which is missing in the original. When this element and the cellStyles element are missing then Excel doesn't seem to mind but the file is invalid. Like I said, the Excel error messages are often not that helpful.

    You can check this in the OOXML Productivity tool by inserting the Original_style.xml into the package. I'll see if I can get openpyxl to always insert a default or otherwise not add the cellXfs or cellStyles elements if these are empty.

  6. CharlieC

    So, while I investigate a more robust solution, here’s a workaround that should help.

    First, please check that wb._named_styles == []

    The simply set the style for any cell to the default.

    c = ws(row=ws.max_row+1, column=1)
    c.style = "Normal"

    This should reset the default styles for the workbook.

    But, basically you should inform the maker of whatever application is producing the file that they should include a default style.

  7. Dan Kanefsky reporter

    That worked!

    I checked if wb._named_styles == [] and it came back as true.

    I should note that I had to add a “.cell” after ws so I used:

    c = ws.cell(row=ws.max_row+1, column=1)
    c.style = "Normal"

    Thank you and I really appreciate your help with all this!

  8. Log in to comment