version 1.5.8, 1.6.1 - wbPr is None when reading xlsx file
This may not be a problem with openpyxl, but since Excel (2008 for Mac) can read the file successfully, I thought I would report it.
Openpyxl blows up (trace below) when reading this file, but if I open and re-save it with MS Excel, openpyxl can, not surprisingly, read the rewritten file with no problems.
The metadata for this file suggests that it was created by a product using the Axolot Data XLSReadWriteII library (v 4.00.38).
Unfortunately I am not permitted to provide the file, and I have no way of creating a file that reproduces the problem. This may be a bug in the Axolot library; the current version of that is 4.00.66.
  File "/Users/johnbigboote/Documents/code/python/pcgc-env/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 115, in load_workbook
    _load_workbook(wb, archive, filename, use_iterators)
  File "/Users/johnbigboote/Documents/code/python/pcgc-env/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 140, in _load_workbook
    wb.properties.excel_base_date = read_excel_base_date(xml_source=archive.read(ARC_WORKBOOK))
  File "/Users/johnbigboote/Documents/code/python/pcgc-env/lib/python2.7/site-packages/openpyxl/reader/workbook.py", line 82, in read_excel_base_date
    if ('date1904' in wbPr.keys() and wbPr.attrib['date1904'] in ('1', 'true')):
AttributeError: 'NoneType' object has no attribute 'keys'
Random asides: your issues form needs updating: not all versions appear in the Version dropdown. Also, I can't search by Version in the issues search form. Also, I didn't notice any release notes/change log information anywhere for openpyxl.
Comments (22)
-
-
I find that pandas, using pd.ExcelFile(filename) and xl.parse, reads the XLSX file OK.
-
@Finn Årup Nielsen Can you supply a sample file? I'm not really sure what we can do about this without something to work with.
-
@Kevin Murphy Can you let us know whether the problem still exists in 1.7, which included some fixes for date problems?
-
"Can you supply a sample file?" No, unfortunately as it includes personally identifiable information.
-
It apparently works OK for me now!
>>> import openpyxl
>>> openpyxl.reader.excel.load_workbook("E13_0_Resultater.xlsx")
<openpyxl.workbook.Workbook object at 0x17d47d0>
>>> openpyxl.__version__
'1.7.0'
-
Charlie,
FWIW, my test file produces an entirely different error with 1.7. I don't have time to look at it right now, unfortunately. If you want me to take a few minutes next week, let me know.
>>> openpyxl.reader.excel.load_workbook("DataHub_02-05-2013.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 136, in load_workbook
    _load_workbook(wb, archive, filename, use_iterators, keep_vba)
  File "/usr/local/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 158, in _load_workbook
    wb.read_workbook_settings(archive.read(ARC_WORKBOOK))
  File "/usr/local/lib/python2.7/site-packages/openpyxl/workbook.py", line 108, in read_workbook_settings
    if 'activeTab' in view.attrib:
AttributeError: 'NoneType' object has no attribute 'attrib'
>>> openpyxl.__version__
'1.7.0'
-
- edited description
-
Okay, there is a pull request for that error but it has no tests. In the absence of real files, I am very reluctant to accept workaround code without at least some relevant tests.
-
@Kevin Murphy We're getting ready for a 1.8 release. I'd be very grateful if you could check what happens with trunk and let me know.
-
Using https://bitbucket.org/ericgazoni/openpyxl/get/1.8.tar.bz2, install doesn't work:
(junk-env)Kevin-Murphys-iMac:ericgazoni-openpyxl-085c2dd09f72 murphyke$ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 28, in <module>
    import openpyxl # to fetch __version__ etc
  File "/Volumes/bigdisk/murphyke/Documents/code/python/junk-env/workdir/ericgazoni-openpyxl-085c2dd09f72/openpyxl/__init__.py", line 38, in <module>
    from openpyxl import workbook
  File "/Volumes/bigdisk/murphyke/Documents/code/python/junk-env/workdir/ericgazoni-openpyxl-085c2dd09f72/openpyxl/workbook.py", line 36, in <module>
    from openpyxl.worksheet import Worksheet
  File "/Volumes/bigdisk/murphyke/Documents/code/python/junk-env/workdir/ericgazoni-openpyxl-085c2dd09f72/openpyxl/worksheet.py", line 761
    if orientation not in (self.ORIENTATION_PORTRAIT, self.ORIENTATION_LANDSCAPE),:
                                                                                 ^
SyntaxError: invalid syntax
-
@Kevin Murphy Sorry about that typo. Don't quite know how I pushed it without running the tests. Could you please try again?
-
Charlie, the good news is that there is no exception. The bad news is that there are no worksheets in the resulting workbook object. For what it's worth, Excel 2008 for Mac and OO 4.0.1 both open this .xlsx file.
I have managed to create a test file by exploding my original, replacing the shared strings with random gibberish, and re-zipping. I believe the test file exhibits the same characteristics as the original as far as openpyxl 1.6.2 and 1.8.0 are concerned.
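For anyone else who needs to share a problem file without its contents, the explode/replace/re-zip steps can be scripted. This is only a rough sketch, not the exact procedure used for the attached file: the function name scramble_shared_strings is made up for illustration, and it assumes the strings live in the conventional xl/sharedStrings.xml part with un-prefixed <t> elements. Every other part is copied byte for byte so that whatever trips up the reader is preserved.

```python
import random
import re
import string
import zipfile

# Matches <t>...</t> (optionally with attributes) in the shared strings part.
# Assumption: the part uses the default namespace, i.e. no "x:t" style prefixes.
T_ELEMENT = re.compile(rb"(<t(?:\s[^>]*)?>)([^<]*)(</t>)")


def _gibberish(match):
    """Replace the text of one <t> element with random letters of equal length."""
    length = len(match.group(2))
    junk = "".join(random.choice(string.ascii_letters) for _ in range(length))
    return match.group(1) + junk.encode("ascii") + match.group(3)


def scramble_shared_strings(src, dst):
    """Copy the xlsx at src to dst, scrambling only xl/sharedStrings.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "xl/sharedStrings.xml":
                data = T_ELEMENT.sub(_gibberish, data)
            # Everything else is written back unchanged.
            zout.writestr(info, data)
```

Note that inline strings, numbers, sheet names and document properties are not touched by this sketch, so check the result by hand before attaching it anywhere public.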
-
Can you attach the file to this issue?
-
- attached openpyxl_181_test.xlsx
-
Thanks. No idea what the problem is but I guess not raising an exception can be considered progress! ;-)
-
@Kevin Murphy can you try the 1.9 branch with your file?
-
I can confirm no error in 2.x but also no worksheets. Will investigate.
-
- removed version
-
assigned issue to
-
@Kevin Murphy I've finally identified the problem with this file. It breaks the specification by not including a content-type for the worksheet:
From ECMA 376 4th Edition Part 2: 10.1.2.3 Setting the Content Type of a Part When adding a new part to a package, the package implementer shall ensure that a content type for that part is specified in the Content Types stream
I've added a workaround for this which shouldn't cause problems but the library you're using needs fixing.
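If you want to check whether a file has this defect before blaming the reader, something like the following might help. It is a minimal sketch and not the workaround that went into openpyxl: the helper name worksheets_without_content_type is hypothetical, and it assumes worksheets sit in the conventional xl/worksheets/ folder (the specification does not require that location). It lists worksheet parts that have no Override entry with the worksheet content type in [Content_Types].xml.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
WORKSHEET_CT = (
    "application/vnd.openxmlformats-officedocument."
    "spreadsheetml.worksheet+xml"
)


def worksheets_without_content_type(path_or_file):
    """Return worksheet part names that lack a worksheet Override entry."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
        # Part names in the content types stream start with a leading slash.
        declared = set(
            el.get("PartName")
            for el in root.findall("{%s}Override" % CT_NS)
            if el.get("ContentType") == WORKSHEET_CT
        )
        missing = []
        for name in archive.namelist():
            # Assumption: worksheets use the conventional folder layout.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                if "/" + name not in declared:
                    missing.append(name)
        return missing
```

An empty list means every worksheet found under xl/worksheets/ is declared; anything returned is a part that a reader relying on content types will not see as a worksheet.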
-
- changed status to resolved
Resolves #181 → <<cset 143f21dafcfa>>
-
- removed component
Removing component: reader (automated comment)
I get this error too, with a file generated by a web system rather than by Excel.
As a bad workaround I went into line 82 of openpyxl/reader/workbook.py and did:
if wbPr is not None and ('date1904' in wbPr.keys() and wbPr.attrib['date1904'] in ('1', 'true')):
One date that I checked is correct in Python, compared with how it is displayed in LibreOffice, so perhaps CALENDAR_WINDOWS_1900 should be returned in these cases?
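To make the suggestion concrete, here is a standalone sketch of the guarded check. It is not openpyxl's actual code: read_base_date is a made-up name, and the two constants are local stand-ins for openpyxl's calendar constants (their values here are only illustrative). Falling back to the 1900 calendar matches the spreadsheet schema, where date1904 defaults to false when it is not given.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Local stand-ins for openpyxl's calendar constants (illustrative values).
CALENDAR_WINDOWS_1900 = 1900
CALENDAR_MAC_1904 = 1904


def read_base_date(xml_source):
    """Return the base calendar for the given workbook.xml source."""
    root = ET.fromstring(xml_source)
    wbPr = root.find("{%s}workbookPr" % MAIN_NS)
    # workbookPr is optional, so find() may return None.
    if wbPr is not None and wbPr.get("date1904") in ("1", "true"):
        return CALENDAR_MAC_1904
    # Missing element or missing attribute: use the 1900 date system.
    return CALENDAR_WINDOWS_1900
```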