Cannot read Excel file from World Bank

Issue #597 resolved
Jonáš Petrovský
created an issue

Hello, I need to read an Excel file exported from the World Bank database (http://databank.worldbank.org/data/reports.aspx?source=doing-business).

Unfortunately, opening the file with openpyxl.load_workbook fails with an exception:

AttributeError: 'NoneType' object has no attribute 'upper'

I found this SO question (http://stackoverflow.com/questions/34387995/openpyxl-nonetype-object-has-no-attribute-upper), but it has no answer, only a suggestion that this might be a bug.

I'm using openpyxl 2.3.3 and am attaching a small file that also triggers the problem. Please help, thanks :).

Traceback

Traceback (most recent call last):
  File "C:/X/test.py", line 8, in <module>
    ip = IndicatorsProcessor(file_paths)
  File "C:\X\IndicatorsProcessor.py", line 16, in __init__
    self.world_bank_workbook = openpyxl.load_workbook(filename=in_bank)
  File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 234, in load_workbook
    parser.parse()
  File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 106, in parse
    dispatcher[tag_name](element)
  File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 243, in parse_row_dimensions
    self.parse_cell(cell)
  File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 182, in parse_cell
    row, column = coordinate_to_tuple(coordinate)
  File "C:\Python27\lib\site-packages\openpyxl\utils\__init__.py", line 162, in coordinate_to_tuple
    col, row = coordinate_from_string(coordinate)
  File "C:\Python27\lib\site-packages\openpyxl\utils\__init__.py", line 39, in coordinate_from_string
    match = COORD_RE.match(coord_string.upper())
AttributeError: 'NoneType' object has no attribute 'upper'
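
The last frames suggest that a cell carried no reference attribute, so None reached the coordinate parser. Below is a minimal sketch that reproduces the same failure without openpyxl. COORD_RE and coordinate_from_string here are simplified stand-ins written for illustration, not the library's actual code, and cell_attributes is a hypothetical attribute mapping for a cell that lacks a reference.

```python
import re

# Simplified stand-in for the library's coordinate pattern (an assumption,
# not the regex openpyxl actually uses).
COORD_RE = re.compile(r"^\$?([A-Z]+)\$?(\d+)$")


def coordinate_from_string(coord_string):
    """Split a reference such as 'B12' into ('B', 12)."""
    # Calling .upper() on None raises the AttributeError seen in the traceback.
    match = COORD_RE.match(coord_string.upper())
    if match is None:
        raise ValueError("Invalid cell coordinates: %r" % (coord_string,))
    column, row = match.groups()
    return column, int(row)


# Hypothetical attributes of a cell element that has no "r" reference.
cell_attributes = {}

try:
    # dict.get returns None for the missing key, mirroring a missing attribute.
    coordinate_from_string(cell_attributes.get("r"))
except AttributeError as exc:
    error_message = str(exc)
```

With a well-formed reference the helper works as expected, so the failure is tied to the missing attribute rather than to the parsing itself.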

Comments (4)

  1. Charlie Clark

    Thanks for the report. I wondered if we'd ever see a file like this. The problem is that the file provides no location information about rows or cells. This is within the specification, but it is also perfectly okay to skip rows and cells (which allows significant memory optimisation), so it is difficult to support both options. All our code currently relies upon getting the absolute coordinates from each cell.

    Unfortunately, I can't think of a quick fix for this that won't impact performance significantly and will work for both implementations. In the meantime it looks like xlrd can handle this kind of file.
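
To illustrate the two options described above, here is a rough sketch that reads worksheet XML and falls back to counting when a row or cell has no reference. It is not openpyxl's code and not a proposed patch. The sample XML and the names column_index and iter_cells are made up for the example, and it assumes that a missing reference means "the position right after the previous row or cell".

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Made-up sample: the first two rows carry no references, the third row does.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row><c><v>1</v></c><c><v>2</v></c></row>
    <row><c><v>3</v></c></row>
    <row r="5"><c r="C5"><v>4</v></c><c><v>5</v></c></row>
  </sheetData>
</worksheet>"""


def column_index(reference):
    """Convert the letter part of a reference such as 'C5' to a 1-based index."""
    letters = "".join(ch for ch in reference if ch.isalpha())
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def iter_cells(sheet_xml):
    """Yield (row, column, value), inferring positions when references are absent."""
    root = ET.fromstring(sheet_xml)
    row_number = 0
    for row in root.iter(NS + "row"):
        # Use the explicit row number if given, otherwise previous row + 1.
        row_number = int(row.get("r", row_number + 1))
        column_number = 0
        for cell in row.findall(NS + "c"):
            reference = cell.get("r")
            if reference:
                column_number = column_index(reference)
            else:
                # Assumption: an unreferenced cell follows the previous one.
                column_number += 1
            yield row_number, column_number, cell.findtext(NS + "v")


cells = list(iter_cells(SHEET_XML))
```

The counters have to be kept for every row and cell even when references are present, which hints at why supporting both styles adds overhead to the parser.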
