version 1.5.8, 1.6.1 - wbPr is None when reading xlsx file

Issue #181 resolved
murphyke
created an issue

This may not be a problem with openpyxl, but since Excel (2008 for Mac) can read the file successfully, I thought I would report it.

Openpyxl blows up (trace below) when reading this file, but if I open and re-save it with MS Excel, openpyxl can, not surprisingly, read the rewritten file with no problems.

The metadata for this file suggests that it was created by a product using the Axolot Data XLSReadWriteII library (v 4.00.38).

Unfortunately I am not permitted to provide the file, and I have no way of creating a file that reproduces the problem. This may be a bug in the Axolot library; the current version of that is 4.00.66.

  File "/Users/johnbigboote/Documents/code/python/pcgc-env/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 115, in load_workbook
    _load_workbook(wb, archive, filename, use_iterators)
  File "/Users/johnbigboote/Documents/code/python/pcgc-env/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 140, in _load_workbook
    wb.properties.excel_base_date = read_excel_base_date(xml_source=archive.read(ARC_WORKBOOK))
  File "/Users/johnbigboote/Documents/code/python/pcgc-env/lib/python2.7/site-packages/openpyxl/reader/workbook.py", line 82, in read_excel_base_date
    if ('date1904' in wbPr.keys() and wbPr.attrib['date1904'] in ('1', 'true')):
AttributeError: 'NoneType' object has no attribute 'keys'

Random asides: your issues form needs updating, since not all versions appear in the Version dropdown. Also, I can't search by Version in the issues search form, and I didn't notice any release notes or change log for openpyxl anywhere.

Comments (21)

  1. Finn Årup Nielsen

    I get this error too, with a file generated by a web system, not Excel.

    As a bad workaround I went into line 82 of openpyxl/reader/workbook.py and did:

    if wbPr is not None and ('date1904' in wbPr.keys() and wbPr.attrib['date1904'] in ('1', 'true')):

    One date that I checked comes out correctly in Python, compared with how LibreOffice displays it, so perhaps CALENDAR_WINDOWS_1900 should be returned in these cases?
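
For readers following along, the guard described in the comment above can be tried without the original file. This is only a minimal sketch using the standard library, not openpyxl's actual reader code: the function name `uses_1904_dates` is made up for illustration, and it assumes that a missing `workbookPr` element should be treated as the 1900 date system (the element is optional in the SpreadsheetML schema).

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by xl/workbook.xml.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def uses_1904_dates(workbook_xml):
    """Hypothetical helper: True only if workbookPr sets date1904."""
    root = ET.fromstring(workbook_xml)
    wbPr = root.find("{%s}workbookPr" % NS)
    if wbPr is None:
        # Assumption: no workbookPr means the default 1900 date system.
        return False
    return wbPr.get("date1904") in ("1", "true")


# Synthetic inputs, so no real (confidential) workbook is needed.
without_props = '<workbook xmlns="%s"><sheets/></workbook>' % NS
with_1904 = '<workbook xmlns="%s"><workbookPr date1904="1"/></workbook>' % NS

print(uses_1904_dates(without_props))
print(uses_1904_dates(with_1904))
```

Hand-written XML snippets like these could also serve as the basis for regression tests when no real file can be shared.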

  2. Finn Årup Nielsen

    It apparently works OK for me now!

    >>> import openpyxl
    >>> openpyxl.reader.excel.load_workbook("E13_0_Resultater.xlsx")
    <openpyxl.workbook.Workbook object at 0x17d47d0>
    >>> openpyxl.__version__
    '1.7.0'
    
  3. murphyke reporter

    Charlie,

    FWIW, my test file produces an entirely different error with 1.7. I don't have time to look at it right now, unfortunately. If you want me to take a few minutes next week, let me know.

    >>> openpyxl.reader.excel.load_workbook("DataHub_02-05-2013.xlsx")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 136, in load_workbook
        _load_workbook(wb, archive, filename, use_iterators, keep_vba)
      File "/usr/local/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 158, in _load_workbook
        wb.read_workbook_settings(archive.read(ARC_WORKBOOK))
      File "/usr/local/lib/python2.7/site-packages/openpyxl/workbook.py", line 108, in read_workbook_settings
        if 'activeTab' in view.attrib:
    AttributeError: 'NoneType' object has no attribute 'attrib'
    >>> openpyxl.__version__
    '1.7.0'
    
  4. CharlieC

    Okay, there is a pull request for that error, but it has no tests. In the absence of real files, I am very reluctant to accept workaround code without at least some relevant tests.

  5. murphyke reporter

    Using https://bitbucket.org/ericgazoni/openpyxl/get/1.8.tar.bz2, install doesn't work:

    (junk-env)Kevin-Murphys-iMac:ericgazoni-openpyxl-085c2dd09f72 murphyke$ python setup.py install
    Traceback (most recent call last):
      File "setup.py", line 28, in <module>
        import openpyxl  # to fetch __version__ etc
      File "/Volumes/bigdisk/murphyke/Documents/code/python/junk-env/workdir/ericgazoni-openpyxl-085c2dd09f72/openpyxl/__init__.py", line 38, in <module>
        from openpyxl import workbook
      File "/Volumes/bigdisk/murphyke/Documents/code/python/junk-env/workdir/ericgazoni-openpyxl-085c2dd09f72/openpyxl/workbook.py", line 36, in <module>
        from openpyxl.worksheet import Worksheet
      File "/Volumes/bigdisk/murphyke/Documents/code/python/junk-env/workdir/ericgazoni-openpyxl-085c2dd09f72/openpyxl/worksheet.py", line 761
        if orientation not in (self.ORIENTATION_PORTRAIT, self.ORIENTATION_LANDSCAPE),:
                                                                                     ^
    SyntaxError: invalid syntax
    
  6. murphyke reporter

    Charlie, the good news is that there is no exception. The bad news is that there are no worksheets in the resulting workbook object. For what it's worth, Excel 2008 for Mac and OO 4.0.1 both open this .xlsx file.

    I have managed to create a test file by exploding my original, replacing the shared strings with random gibberish, and re-zipping. I believe the test file exhibits the same characteristics as the original as far as openpyxl 1.6.2 and 1.8.0 are concerned.
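
The explode, scramble, and re-zip procedure described in the comment above could be scripted roughly as follows. This is a sketch under stated assumptions, not the reporter's actual method: the function name `scramble_shared_strings` is invented here, the shared strings are assumed to live in the conventional `xl/sharedStrings.xml` part, and the text is replaced with a simple regular expression rather than a full XML parse so that the rest of the markup stays byte-for-byte unchanged.

```python
import random
import re
import string
import zipfile

SHARED_STRINGS = "xl/sharedStrings.xml"  # assumed conventional part name
# Matches the text content of <t>...</t> elements (with optional attributes).
TEXT_RE = re.compile(rb"(<t(?:\s[^>]*)?>)([^<]*)(</t>)")


def _gibberish(match, rng):
    """Replace the matched text with random lowercase letters of equal length."""
    length = len(match.group(2))
    junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return match.group(1) + junk.encode("ascii") + match.group(3)


def scramble_shared_strings(src_path, dst_path, seed=None):
    """Copy an .xlsx archive, scrambling only the shared string text."""
    rng = random.Random(seed)
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == SHARED_STRINGS:
                data = TEXT_RE.sub(lambda m: _gibberish(m, rng), data)
            # Every other part is copied through untouched.
            dst.writestr(info.filename, data)
```

Note that inline strings or values stored directly in the worksheet parts would not be touched by this sketch, so the result should be inspected before it is shared.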

  7. CharlieC

    @murphyke I've finally identified the problem with this file. It breaks the specification by not including a content type for the worksheet (screenshot: Bildschirmfoto 2014-06-08 um 15.01.05.png).

    From ECMA 376 4th Edition Part 2, section 10.1.2.3 "Setting the Content Type of a Part": "When adding a new part to a package, the package implementer shall ensure that a content type for that part is specified in the Content Types stream"

    I've added a workaround for this, which shouldn't cause problems, but the library you're using needs fixing.
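
Anyone who suspects a file has the same defect can check the Content Types stream directly. The following is a rough diagnostic sketch, not the workaround that was added to openpyxl: the function name is hypothetical, worksheets are assumed to sit under the conventional `xl/worksheets/` folder, and part names are compared case-sensitively for simplicity.

```python
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
WORKSHEET_TYPE = ("application/vnd.openxmlformats-officedocument."
                  "spreadsheetml.worksheet+xml")


def worksheets_missing_content_type(path):
    """Hypothetical check: list worksheet parts with no worksheet Override."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
        # Part names explicitly declared as worksheets.
        declared = {
            node.get("PartName")
            for node in root.findall("{%s}Override" % CT_NS)
            if node.get("ContentType") == WORKSHEET_TYPE
        }
        # Assumption: worksheet parts live under xl/worksheets/.
        sheets = {
            "/" + name
            for name in archive.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        }
    return sorted(sheets - declared)
```

An empty list means every worksheet part found has a matching `Override` entry; any names returned are parts the producing library left undeclared.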
