Loading file, editing in openpyxl and resaving results in loss of internal files

Issue #721 closed
Matthew Lemon
created an issue

Hi,

Hopefully this can be answered fairly swiftly, but I'm working on an application that opens an existing xlsx file (it's a 'blank' form, in Excel), writes some new data into cells, the saves it out to another xlsx file.

The program is running on a Linux box and the resulting file opens in Libreoffice Calc, but the styling/coloring is wrong and there are graphics are missing.

Excel on a Windows box refuses to open the same file at all.

An ls of both the original template file unzipped and the file written-to by openpyxl unzipped below - the original one first:

Template file (from Excel) (unzipped)

drwxr-xr-x 10 lemon lemon   4096 Nov 15 14:20 . 
drwxr-xr-x  5 lemon lemon   4096 Nov 15 14:20 .. 
-rw-r--r--  1 lemon lemon   3093 Jan  1  1980 calcChain.xml 
drwxr-xr-x  2 lemon lemon   4096 Nov 15 14:20 ctrlProps 
drwxr-xr-x  3 lemon lemon   4096 Nov 15 14:20 drawings 
drwxr-xr-x  3 lemon lemon   4096 Nov 15 14:20 externalLinks 
drwxr-xr-x  2 lemon lemon   4096 Nov 15 14:20 media 
drwxr-xr-x  2 lemon lemon   4096 Nov 15 14:20 printerSettings 
drwxr-xr-x  2 lemon lemon   4096 Nov 15 14:20 _rels 
-rw-r--r--  1 lemon lemon  25758 Jan  1  1980 sharedStrings.xml 
-rw-r--r--  1 lemon lemon 147037 Jan  1  1980 styles.xml 
drwxr-xr-x  2 lemon lemon   4096 Nov 15 14:20 theme 
-rw-r--r--  1 lemon lemon   2213 Jan  1  1980 workbook.xml 
drwxr-xr-x  3 lemon lemon   4096 Nov 15 14:20 worksheets 

File written by openpyxl after editing in openpyxl (unzipped)

drwxr-xr-x 4 lemon lemon   4096 Nov 15 14:20 .
drwxr-xr-x 5 lemon lemon   4096 Nov 15 14:20 ..
drwxr-xr-x 2 lemon lemon   4096 Nov 15 14:20 _rels
-rw-r--r-- 1 lemon lemon  39121 Nov 15 13:59 sharedStrings.xml
-rw-r--r-- 1 lemon lemon 155699 Nov 15 13:59 styles.xml
-rw-r--r-- 1 lemon lemon   2352 Nov 15 13:59 workbook.xml
drwxr-xr-x 3 lemon lemon   4096 Nov 15 14:20 worksheets

There is a lot of data missing from the resulting file, which is why Excel is choking and I'm seeing styling/colour problems.

Unfortunately, I'm unable to share the original files for sensitivity reasons, but I was wondering whether there is anything obvious I am doing when saving/writing this file.

Thanks for checking this out.

Matt

Comments (9)

  1. CharlieC

    Try using a checkout of the 2.4 branch. Recently fixed some stuff on the styles. Note that currently LO can produce invalid files which openpyxl will refuse to process. This is a bug in LO. The files missing in the archive are not relevant to styles.

    We currently do not preserve any images or charts.

    If you still have problems then I will need to a file to work with.

  2. Matthew Lemon reporter

    Thanks Charlie. Have run with openpyxl (2.4.0b1) and no discernible difference.

    tree for original xlsx file:

    .
    ├── [Content_Types].xml
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    ├── _rels
    └── xl
        ├── calcChain.xml
        ├── ctrlProps
        │   ├── ctrlProp1.xml
        │   ├── ctrlProp2.xml
        │   ├── ctrlProp3.xml
        │   └── ctrlProp4.xml
        ├── drawings
        │   ├── drawing1.xml
        │   ├── drawing2.xml
        │   ├── drawing3.xml
        │   ├── drawing4.xml
        │   ├── drawing5.xml
        │   ├── _rels
        │   │   ├── drawing1.xml.rels
        │   │   ├── drawing2.xml.rels
        │   │   ├── drawing3.xml.rels
        │   │   ├── drawing4.xml.rels
        │   │   └── drawing5.xml.rels
        │   └── vmlDrawing1.vml
        ├── externalLinks
        │   ├── externalLink1.xml
        │   └── _rels
        │       └── externalLink1.xml.rels
        ├── media
        │   ├── image1.png
        │   ├── image2.emf
        │   ├── image3.png
        │   ├── image4.png
        │   └── image5.emf
        ├── printerSettings
        │   ├── printerSettings1.bin
        │   ├── printerSettings2.bin
        │   ├── printerSettings3.bin
        │   ├── printerSettings4.bin
        │   ├── printerSettings5.bin
        │   ├── printerSettings6.bin
        │   ├── printerSettings7.bin
        │   └── printerSettings8.bin
        ├── _rels
        │   └── workbook.xml.rels
        ├── sharedStrings.xml
        ├── styles.xml
        ├── theme
        │   └── theme1.xml
        ├── workbook.xml
        └── worksheets
            ├── _rels
            │   ├── sheet1.xml.rels
            │   ├── sheet2.xml.rels
            │   ├── sheet3.xml.rels
            │   ├── sheet4.xml.rels
            │   ├── sheet5.xml.rels
            │   └── sheet6.xml.rels
            ├── sheet1.xml
            ├── sheet2.xml
            ├── sheet3.xml
            ├── sheet4.xml
            ├── sheet5.xml
            └── sheet6.xml
    
    14 directories, 51 files
    

    tree for file written by openpyxl:

    .
    ├── [Content_Types].xml
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    ├── _rels
    └── xl
        ├── externalLinks
        │   ├── externalLink1.xml
        │   └── _rels
        │       └── externalLink1.xml.rels
        ├── _rels
        │   └── workbook.xml.rels
        ├── sharedStrings.xml
        ├── styles.xml
        ├── theme
        │   └── theme1.xml
        ├── workbook.xml
        └── worksheets
            ├── _rels
            │   ├── sheet2.xml.rels
            │   └── sheet6.xml.rels
            ├── sheet1.xml
            ├── sheet2.xml
            ├── sheet3.xml
            ├── sheet4.xml
            ├── sheet5.xml
            └── sheet6.xml
    
    9 directories, 18 files
    

    I'd be happy to send you the template file by email if you're willing. I won't be able to send the file written by openpyxl as it contains sensitive data after processing but if you can tell nothing from the template then I'll go through the resulting file and replace all data with dummy figures and email you that. Let me know what you think.

  3. Matthew Lemon reporter

    I thought that was what you meant when you said "Try using a checkout of the 2.4 branch". If this is not correct, please be more explicit. I presume I've used an old branch?

  4. Matthew Lemon reporter

    Output tree using 2.4.0 checkout from repo, if I've used hg correctly...

    .
    ├── [Content_Types].xml
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    ├── _rels
    └── xl
        ├── externalLinks
        │   ├── externalLink1.xml
        │   └── _rels
        │       └── externalLink1.xml.rels
        ├── _rels
        │   └── workbook.xml.rels
        ├── sharedStrings.xml
        ├── styles.xml
        ├── theme
        │   └── theme1.xml
        ├── workbook.xml
        └── worksheets
            ├── _rels
            │   ├── sheet2.xml.rels
            │   └── sheet6.xml.rels
            ├── sheet1.xml
            ├── sheet2.xml
            ├── sheet3.xml
            ├── sheet4.xml
            ├── sheet5.xml
            └── sheet6.xml
    
    9 directories, 18 files
    

    Still same issues with cell formatting. It's not a problem that images aren't preserved. Let me know if I can email the template. Thanks for your help.

  5. Matthew Lemon reporter

    Ok, thanks. Will give it a go. I don't have admin rights to a Window machine so I don't think I'll be able get MS OOXML Productivity Tool which is a shame because I keep seeing it referenced in discussions for handling these kind of problems. I'm doing all the XML stuff with Linux tools.

  6. Log in to comment