Unicode error when loading workbook

Issue #663 resolved
Jeroen Beckers
created an issue

Openpyxl version: openpyxl-2.4.0b1-py2.7 on Windows 10.

When I use load_workbook on a specific xlsx file, it gives the following error output:

Traceback (most recent call last):
  File ".\generate.py", line 758, in <module>
    wb = openpyxl.reader.excel.load_workbook(sheetFileName, data_only="True")
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\reader\excel.py", line 152, in load_workbook
    parser.parse()
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\packaging\workbook.py", line 54, in parse
    rel.Target))
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\workbook\external_link\external.py", line 186, in read_external_link
    book = ExternalLink.from_tree(node)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\serialisable.py", line 76, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\serialisable.py", line 76, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\serialisable.py", line 76, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\serialisable.py", line 76, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\serialisable.py", line 76, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\serialisable.py", line 89, in from_tree
    return cls(**attrib)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\workbook\external_link\external.py", line 47, in __init__
    self.v = v
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\nested.py", line 36, in __set__
    super(Nested, self).__set__(instance, value)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\base.py", line 69, in __set__
    value = _convert(self.expected_type, value)
  File "C:\Python27\lib\site-packages\openpyxl-2.4.0b1-py2.7.egg\openpyxl\descriptors\base.py", line 59, in _convert
    raise TypeError('expected ' + str(expected_type) + ' -- ' + str(value.encode("utf-8")))
TypeError: expected <type 'str'> -- C├®dric

As you can see, I have already modified base.py so that the error message shows which value fails the type check.
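A note on the garbled name in the last line of the traceback: it is probably a display artifact rather than corruption in the file. The sketch below shows how the UTF-8 bytes produced by `value.encode("utf-8")` would look on a console that interprets them in a legacy code page. The choice of cp850 is an assumption (it is a common Western European console code page on Windows), and `as_seen_on_console` is a hypothetical helper written for illustration. It is written for Python 3, although the report concerns Python 2.7.

```python
def as_seen_on_console(text, console_encoding="cp850"):
    """Encode text as UTF-8, then decode the bytes the way a console
    using `console_encoding` would display them.

    The default code page is an assumption, not something stated in the report.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(console_encoding)


# "C\u00e9dric" is the name from the report, with e-acute written as an escape.
name = "C\u00e9dric"

# The e-acute becomes two bytes in UTF-8 (0xC3 0xA9); each byte is then
# displayed as a separate character by the single-byte console code page.
garbled = as_seen_on_console(name)
```

Under that assumption, `garbled` matches the text shown after `TypeError: expected <type 'str'> --` above, which suggests the value inside openpyxl was a correctly decoded Unicode string and only its printed form was mangled.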

The first thing I tried was a "replace all" on Cédric -> Cedric, but unfortunately, the same error is given (even though Cédric shouldn't be present in the xlsx anymore).

Unfortunately, I can't share the xlsx file, and I also haven't been able to reduce the problem to a presentable file.

This works fine on my OS X computer, but fails on Windows.

Are there any other places I should be looking for "Cédric"? Places that Excel's own search won't find? And shouldn't everything be loaded as Unicode anyway?
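One way to look in places that Excel's search does not cover is to treat the xlsx file as what it is, a zip archive of XML parts, and scan every part for the string. The following is a minimal sketch using only the standard library. The function name `find_text_in_xlsx` is hypothetical, and it assumes the parts are UTF-8 encoded and store the text literally (not as an XML character reference), which may not hold for every file.

```python
import zipfile


def find_text_in_xlsx(path_or_file, needle):
    """Return the names of all package parts whose raw bytes contain `needle`.

    Assumes UTF-8 encoded parts; text stored as an XML character
    reference such as &#233; would not be matched.
    """
    needle_bytes = needle.encode("utf-8")
    hits = []
    with zipfile.ZipFile(path_or_file) as archive:
        for part_name in archive.namelist():
            # Read each part as raw bytes so no XML parsing is needed.
            if needle_bytes in archive.read(part_name):
                hits.append(part_name)
    return hits


# Hypothetical usage:
# find_text_in_xlsx("workbook.xlsx", "C\u00e9dric")
```

Any hit in a part outside `xl/worksheets/` and `xl/sharedStrings.xml`, for example under `xl/externalLinks/`, points at data that is not visible in the sheets themselves.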

Comments (13)

  1. Jeroen Beckers reporter

    Unfortunately, I cannot, as it's a very sensitive xlsx file. I could strip it down, but the offending information is stored somewhere non-visible, so I would have no idea how much information I would still be sharing.

    I think I've narrowed it down a little further. The Excel file that I'm using references external xml files. However, I don't have these xml files, and the values appear to be cached somewhere. There's a cell containing Cédric, and right next to it is a VLOOKUP on an external file. If I change the cell to Cedric, the lookup shows N/A. If I change it back, it shows the original lookup value again. So I'm guessing that openpyxl tries to process that cached information?

  2. Jeroen Beckers reporter

    Alright, I was able to reproduce the situation, and indeed, this is where the problem is.

    I've attached a demo file that does a VLOOKUP into a different document, which I have since deleted.

  3. CharlieC

    Files with external links cache the data from the external workbooks. openpyxl 2.4 preserves this caching. The 2.4 branch has an option to disable external links when loading workbooks.

  4. Jeroen Beckers reporter

    Do you mean the 2.5 branch, maybe? I can't see anything about it in 2.4.

    Also, shouldn't the cached data be parsed the same way as the rest of the file: as Unicode?

  5. CharlieC

    Thanks for the file. The code that reads external links is different from the normal cell-parsing code, which is optimised for speed and memory. We need more experience with cases like this to decide whether we should preserve these cached values: doing so can slow things down a lot, but in read/write situations it can be useful.
