OSError when opening workbook with embedded images

Luke Shiner created an issue

When I open an .xlsx workbook containing an image with openpyxl.load_workbook, an exception is raised with the message "OSError: cannot identify image file <_io.BytesIO object at 0x7fbebe6dd258>".

This issue seems to have appeared in openpyxl==2.5.10. I can open the file without problems in 2.5.9, but not in 2.5.10 or later (up to the current 2.5.12 release).

All cases tested with Pillow==5.3.0.

  1. CharlieC

    2.5.10 introduced a more robust way of finding images. I suspect that previously at least one of the images was being lost.

    The problem appears to be the image used on the first sheet, which is a Windows Metafile (WMF). Even when read directly from the file system, PIL cannot open this file. The code already knows to reject WMF files, but it probably needs to be more robust when it comes to image files it can't read.
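    To see which embedded images PIL can't identify without going through openpyxl at all, a rough sketch like the one below may help. It assumes the images sit under `xl/media/` inside the .xlsx ZIP archive (the usual location) and uses `PIL.Image.open` as the default reader; `unreadable_media` and its `open_image` parameter are hypothetical names, not part of openpyxl, and the reader is injectable only so it can be swapped out for testing.

```python
import io
import zipfile


def unreadable_media(xlsx_path, open_image=None):
    """Return (member name, error message) for each file under xl/media/
    that the image reader cannot identify.

    open_image defaults to PIL.Image.open, which raises OSError (or its
    subclass UnidentifiedImageError in newer Pillow releases) when it
    cannot identify a format, as in the error reported in this issue.
    """
    if open_image is None:
        from PIL import Image  # requires Pillow
        open_image = Image.open

    failures = []
    # An .xlsx file is a ZIP archive; embedded images usually live in xl/media/.
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if not name.startswith("xl/media/"):
                continue
            data = archive.read(name)
            try:
                # Image.open only reads the header to identify the format.
                open_image(io.BytesIO(data))
            except OSError as exc:
                failures.append((name, str(exc)))
    return failures
```

    For example, `print(unreadable_media("book.xlsx"))` would list the offending media entries for a file named book.xlsx.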

  2. Luke Shiner reporter

    I hadn't noticed, but the image is indeed missing after the workbook has been processed by openpyxl 2.5.9.

    With the image removed, I can open the workbook in 2.5.12 without any issues, so my problem is resolved.

    I wonder, however, whether the previous behaviour of silently removing incompatible images is the better approach. I can simply edit my source file to resolve the issue, but I can imagine many situations in which that would not be possible. Also, when I first used openpyxl, it gave me confidence that I got no errors even when opening files that use advanced features of the xlsx format; I think this error might be off-putting for new users.

    Just my thoughts on the issue. I am otherwise very happy with openpyxl.

  3. CharlieC

    I've just added some code so that the image is removed and a warning is issued. Support for EMF/WMF in PIL is limited, but they are common formats on Windows, even though the one in your file looks a lot like EPS.
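    A minimal sketch of the "drop the image but warn" behaviour described above, assuming Pillow's `Image.open` as the reader. This is not the code that was actually added to openpyxl; `load_image_or_warn` and `load_images` are hypothetical helpers, and the `media` dict (member name mapped to raw bytes) is an assumed input format.

```python
import io
import warnings


def load_image_or_warn(data, name, open_image=None):
    """Return the opened image, or None (with a UserWarning) if it can't be read.

    Hypothetical helper illustrating "remove the image but warn";
    not openpyxl's internal code.
    """
    if open_image is None:
        from PIL import Image  # requires Pillow
        open_image = Image.open
    try:
        return open_image(io.BytesIO(data))
    except OSError as exc:
        # Unreadable formats (e.g. some WMF/EMF files) are dropped, not fatal.
        warnings.warn(f"Unable to read image {name!r}; it will be dropped ({exc})")
        return None


def load_images(media, open_image=None):
    """Keep only the images that can be read; media maps names to raw bytes."""
    images = {}
    for name, data in media.items():
        image = load_image_or_warn(data, name, open_image)
        if image is not None:
            images[name] = image
    return images
```

    Because the sketch reports through Python's standard `warnings` module, a caller who prefers the stricter behaviour can escalate the warning to an exception with `warnings.simplefilter("error")`, and one who wants the old silent behaviour can use `warnings.simplefilter("ignore")`.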

