xlsx files produced in some environments are incompatible/invalid

Issue #252 resolved
netnichols
created an issue

I recently came across an issue where openpyxl on my dev machine was producing perfectly valid xlsx files, but the same code was producing incompatible xlsx files on my staging server.

I traced this down to my staging server having lxml installed while my dev machine did not. After installing lxml on my dev machine it started producing the same (seemingly) invalid xlsx files.

These xlsx files are completely rejected by Numbers. LibreOffice does open them, and according to a colleague Excel opens them as well but (sometimes) displays a warning. I don't think the testing code listed below produces warnings in Excel, but if necessary I could probably (in time) provide more thorough testing code which does.

I diffed the xlsx packages produced with and without lxml, and from a cursory exploration it seems that the only difference is the way namespaces are handled, but I could easily be wrong about that.
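For anyone wanting to repeat that comparison, here is a minimal sketch of one way to do it. An xlsx file is a zip archive, so the standard `zipfile` module can list and read its parts. The function name `diff_xlsx_members` is made up for this illustration and is not part of openpyxl.

```python
import zipfile


def diff_xlsx_members(path_a, path_b):
    """Compare two xlsx packages part by part.

    Returns three sorted lists: parts only in A, parts only in B,
    and parts present in both whose bytes differ.
    """
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        names_a = set(za.namelist())
        names_b = set(zb.namelist())
        # Byte-for-byte comparison of every part the two packages share.
        changed = sorted(
            name for name in names_a & names_b
            if za.read(name) != zb.read(name)
        )
    return sorted(names_a - names_b), sorted(names_b - names_a), changed
```

A byte-level comparison is deliberately crude: parts that embed a creation timestamp will likely show up as changed even when nothing else differs, so the list of changed parts is only a starting point for reading the XML by hand.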

Dev Env:

OS X 10.9.1, Python 2.7.5, openpyxl 1.8.0, lxml 3.2.5

Staging Env:

Ubuntu 12.04.2, Python 2.7.3, openpyxl 1.8.0, lxml 2.3.2

Testing Code:

https://gist.github.com/netnichols/da3f60b1fb3496b44088

Comments (6)

  1. Charlie Clark

    Looks like a horribly outdated version of lxml on your staging server and you really ought to update it. Any reason you're not using a virtualenv on your staging server?

    You can prevent openpyxl from using lxml by setting LXML = False in `openpyxl/__init__.py`.

    But, in your case, it's probably best either to remove or update lxml. I guess it might be possible to add a check for lxml >= 3.x to the code, but really you should be managing dev and staging environments so that they are as close as possible.

  2. netnichols reporter

    > Looks like a horribly outdated version of lxml on your staging server and you really ought to update it.

    I tried to be clear that the issue is also reproducible on my dev machine with the latest version of lxml installed. There are reasons for the super old lxml on staging, but since the issue also exists with lxml 3.2.5 I don't think this is relevant.

    > You can prevent openpyxl from using lxml by setting LXML = False in `openpyxl/__init__.py`.

    I tried doing this dynamically and it didn't work:

    import openpyxl
    openpyxl.LXML = False
    

    I assume because openpyxl imports all (or most) of its submodules automatically, which then read LXML before I can change it.

    Are you suggesting that I update the source code of openpyxl? That's not a reasonable solution for me when it's much easier to just use xlsxwriter instead. It would be great to only use one library for xlsx processing, but using two is better than using an unsupported fork of one.

  3. Charlie Clark

    I have the same dev environment so I know the generated files are okay. The file test-with-lxml.xlsx is fine (Excel 2010 on Mac OS 10.9.1)

    On your staging server, if you use a virtualenv without system packages, lxml will not be available. Otherwise, assuming you can install from source, you can set the environment variable OPENPYXL_LXML to False to prevent it from being used.
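The reporter's guess about why `openpyxl.LXML = False` had no effect can be illustrated without openpyxl at all. The sketch below builds a toy package in memory; the names `toypkg`, `toypkg.writer` and `backend` are hypothetical, and the assumption (not confirmed in this thread) is that the real submodules pick up the flag with a from-import when they are first loaded.

```python
import sys
import types

# Toy stand-in for a package that defines a module-level flag.
pkg = types.ModuleType("toypkg")
pkg.LXML = True
sys.modules["toypkg"] = pkg

# Toy submodule that copies the flag into its own namespace at load time,
# the way "from toypkg import LXML" would in a real source file.
writer = types.ModuleType("toypkg.writer")
sys.modules["toypkg.writer"] = writer
exec(
    "from toypkg import LXML\n"
    "def backend():\n"
    "    return 'lxml' if LXML else 'stdlib'\n",
    writer.__dict__,
)

# Rebinding the package attribute afterwards does not change the copy
# that the submodule already holds.
pkg.LXML = False
print(writer.backend())
```

If that is indeed what happens, a switch has to be in place before the first `import openpyxl` runs. An environment variable such as the OPENPYXL_LXML mentioned above fits that requirement: it can be exported in the shell that launches the process, or assigned through `os.environ` at the very top of the entry script.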
