I can open a relatively large file using load_workbook(filename), but it is extremely slow: the load takes about 30 seconds. Adding the use_iterators=True parameter causes the load to fail with the following error:
Traceback (most recent call last):
  File "store_data.py", line 345, in <module>
    main()
  File "store_data.py", line 338, in main
    data_only=True, use_iterators=True)
  File "/home/matt/.local/lib/python3.4/site-packages/openpyxl/reader/excel.py", line 149, in load_workbook
    _load_workbook(wb, archive, filename, read_only, keep_vba)
  File "/home/matt/.local/lib/python3.4/site-packages/openpyxl/reader/excel.py", line 232, in _load_workbook
    worksheet_path=worksheet_path)
  File "/home/matt/.local/lib/python3.4/site-packages/openpyxl/reader/worksheet.py", line 324, in read_worksheet
    worksheet_path, xml_source, shared_strings, style_table)
  File "/home/matt/.local/lib/python3.4/site-packages/openpyxl/worksheet/iter_worksheet.py", line 73, in __init__
    dimensions = read_dimension(self.xml_source)
  File "/home/matt/.local/lib/python3.4/site-packages/openpyxl/worksheet/iter_worksheet.py", line 37, in read_dimension
    min_col, min_row, sep, max_col, max_row = m.groups()
AttributeError: 'NoneType' object has no attribute 'groups'
Peeking into the source, the failure seems to come from the regular expression utils.ABSOLUTE_RE not matching the worksheet's dimensions, which leaves m as None when read_dimension calls m.groups(). I can't post my file because it contains confidential information. However, I was able to unzip the file and take a look at the XML. I believe this is the tag that is causing the failure of the iterator parser:
Looking at the ABSOLUTE_RE expression, it seems that there is no match to be had.
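To illustrate the failure mode, here is a minimal sketch of what I think is happening. The pattern RANGE_RE and the function read_dimension_ref are hypothetical stand-ins that I wrote for this post; they are not copied from openpyxl, and the real ABSOLUTE_RE may differ. The point is only that re.match returns None on a reference the pattern does not cover, and unpacking m.groups() without checking for None then raises the AttributeError shown in the traceback.

```python
import re

# Hypothetical stand-in for an absolute-range pattern (NOT openpyxl's ABSOLUTE_RE).
# It accepts references such as "A1", "$A$1" or "A1:C10".
RANGE_RE = re.compile(r"^\$?([A-Z]+)\$?(\d+)(:\$?([A-Z]+)\$?(\d+))?$")


def read_dimension_ref(ref):
    """Parse a dimension reference; return None instead of crashing on no match."""
    m = RANGE_RE.match(ref)
    if m is None:
        # Without this guard, m.groups() raises:
        # AttributeError: 'NoneType' object has no attribute 'groups'
        return None
    min_col, min_row, sep, max_col, max_row = m.groups()
    return (
        min_col,
        int(min_row),
        max_col,
        int(max_row) if max_row is not None else None,
    )
```

With this sketch, a reference like "A1:C10" parses into its four parts, while a reference the pattern does not cover (for example a column-only "A:C", used here purely as an illustration) yields None rather than an exception.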
I wasn't able to get any further than this, so I still don't know why the file opens normally without use_iterators=True despite this bad formatting in my file.
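For anyone who wants to check their own file without unzipping it by hand, this is roughly how the dimension tags can be pulled out with the standard library. It is a sketch under a few assumptions: the helper name find_dimension_tags is my own, the worksheets are assumed to live under xl/worksheets/ in the archive, the dimension element is assumed to appear within the first 4096 bytes of each sheet, and the element is assumed to be written without a namespace prefix.

```python
import re
import zipfile

# Matches an unprefixed <dimension ...> element in raw worksheet XML.
DIMENSION_TAG = re.compile(rb"<dimension\b[^>]*>")


def find_dimension_tags(xlsx_path):
    """Return {worksheet part name: dimension tag text, or None if not found}."""
    tags = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Assumption: worksheet parts are stored as xl/worksheets/*.xml.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                with archive.open(name) as part:
                    # Assumption: the tag sits near the top of the sheet XML.
                    head = part.read(4096)
                m = DIMENSION_TAG.search(head)
                tags[name] = m.group(0).decode("utf-8", "replace") if m else None
    return tags
```

Calling find_dimension_tags("workbook.xlsx") (the file name is a placeholder) returns a dictionary with one entry per worksheet, which makes it easy to see which sheet carries an unusual ref attribute.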