Cannot open files generated using tealeg/xlsx

Issue #649 wontfix
pwaller created an issue

Hi!

Further to https://bitbucket.org/openpyxl/openpyxl/issues/624/downloaded-xlsx-file-from-pdftablescom we followed up with the library author. They made a fix which improved things a bit, but something else is still broken.

https://github.com/tealeg/xlsx/issues/211

The author of that library said:

ECMA-376 standard for Office Open XML (and the XML stylesheets provided with it) define ST_BorderStyle with "none" as a valid value.

Here are example files that don't work and minimal code to reproduce:

https://github.com/tealeg/xlsx/files/260630/27f0f70.xlsx https://github.com/tealeg/xlsx/files/260631/7ffa59c.xlsx

$ python3 -c "from openpyxl import load_workbook; wb2 = load_workbook('27f0f70.xlsx')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 203, in load_workbook
    parsed_styles = read_style_table(archive)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 181, in read_style_table
    p.parse()
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 50, in parse
    self.border_list = IndexedList(self.parse_borders())
  File "/usr/local/lib/python3.5/site-packages/openpyxl/utils/indexed_list.py", line 17, in __init__
    for idx, val in enumerate(iterable):
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 99, in parse_borders
    yield Border.from_tree(border_node)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 65, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 78, in from_tree
    return cls(**attrib)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/styles/borders.py", line 46, in __init__
    self.style = style
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
    super(NoneSet, self).__set__(instance, value)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {'medium', 'dotted', 'mediumDashed', 'dashDotDot', 'hair', 'thin', 'double', 'dashed', 'mediumDashDot', 'slantDashDot', 'dashDot', 'thick', 'mediumDashDotDot'}

$ python3 -c "from openpyxl import load_workbook; wb2 = load_workbook('7ffa59c.xlsx')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 203, in load_workbook
    parsed_styles = read_style_table(archive)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 181, in read_style_table
    p.parse()
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 50, in parse
    self.border_list = IndexedList(self.parse_borders())
  File "/usr/local/lib/python3.5/site-packages/openpyxl/utils/indexed_list.py", line 17, in __init__
    for idx, val in enumerate(iterable):
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 99, in parse_borders
    yield Border.from_tree(border_node)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 65, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 78, in from_tree
    return cls(**attrib)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/styles/borders.py", line 46, in __init__
    self.style = style
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
    super(NoneSet, self).__set__(instance, value)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {'slantDashDot', 'thick', 'thin', 'dotted', 'double', 'mediumDashed', 'dashDotDot', 'hair', 'dashed', 'medium', 'dashDot', 'mediumDashDotDot', 'mediumDashDot'}
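
To check whether a given file carries the attribute in question, xl/styles.xml can be inspected directly, without going through openpyxl. The following is a minimal sketch. It assumes the offending value is style="none" on a border side element, which the traceback through borders.py and the library author's remark suggest but do not confirm. find_none_border_styles is a hypothetical helper name, not part of either library.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/styles.xml.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Child elements of <border> that can carry a style attribute.
SIDES = ("left", "right", "top", "bottom", "diagonal")


def find_none_border_styles(xlsx):
    """Return (border_index, side) pairs whose style attribute is "none".

    `xlsx` may be a path or a binary file-like object. Hypothetical helper:
    it only reads xl/styles.xml and never modifies the archive.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/styles.xml"))

    hits = []
    borders = root.find("{%s}borders" % NS)
    if borders is None:
        return hits
    for idx, border in enumerate(borders.findall("{%s}border" % NS)):
        for side in SIDES:
            node = border.find("{%s}%s" % (NS, side))
            if node is not None and node.get("style") == "none":
                hits.append((idx, side))
    return hits
```

Calling find_none_border_styles('27f0f70.xlsx') would list the border index and side for each match; an empty list would suggest the failure has a different cause.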

Comments (13)

  1. CharlieC

    There are one or two parts of the specification that equate x="none" with x missing or None in Python. Unfortunately, this is a logical ambiguity that requires special handling in openpyxl. It makes much more sense never to set this attribute in the first place.

  2. pwaller reporter

    @charlie_x could you point to those parts of the spec so that I can go back to the tealeg/xlsx author with some evidence and ask him to fix it? Otherwise we have a stalemate.

  3. CharlieC

    The spec does allow 'none' == None, but this is nonsense in Python. It means that we'd have to have code that checks for "none" and converts it to a Python None so that it is introspectable: if border.side.style:…

    This is the sort of thing that you come across with read support. There is no reason to serialise "none" in such cases (markers on charts may, I think, be an exception, because the default there is "true"), so it's best if such attributes are never written in the first place. I certainly have no intention of writing special-case code here.

  4. pwaller reporter

    @charlie_x searching the specification I find statements like this, which seem to imply that the logic is the other way around, that is: absent should be treated as though it's set to "none":

    For example from Ecma Office Open XML Part 1 - Fundamentals And Markup Language Reference.pdf page 1008:

    If this attribute is omitted, the consumer shall behave as though there are no editing restrictions applied to this document; equivalent to an attribute value of none.

  5. pwaller reporter

    @charlie_x it's an unfortunate choice to be driven by what's natural in the language rather than by the spec and interoperability.

  6. CharlieC

    This is an example of the specification being back-to-front. "none" is not a special value in XML; it's just another string that gets special-cased. This is unnecessarily complex. I have filed issues with the ECMA-376 WG on exactly this issue.

  7. CharlieC

    The specification should promote interoperability, but there are many areas where it does exactly the opposite. In such situations, do not blame the libraries; blame the committee that fast-tracked this specification, knowing that it was defective.

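
The special handling discussed in comment 3 (checking for "none" and converting it to a Python None) could look roughly like the following. This is only a sketch of the idea, not openpyxl's actual descriptor code: NoneAwareSet and SketchSide are hypothetical names, and the allowed values are a subset copied from the error message above.

```python
class NoneAwareSet:
    """Descriptor that restricts a value to a fixed set.

    The string "none" is normalised to Python's None, so that a check
    such as `if side.style:` behaves the same whether the attribute was
    absent or written out as "none".
    """

    def __init__(self, values):
        self.values = frozenset(values)

    def __set_name__(self, owner, name):
        # Store the real value under a private attribute on the instance.
        self.name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.name, None)

    def __set__(self, instance, value):
        if value == "none":
            value = None  # treat the literal string as "not set"
        if value is not None and value not in self.values:
            raise ValueError(
                "Value must be one of {}".format(sorted(self.values))
            )
        setattr(instance, self.name, value)


class SketchSide:
    """Hypothetical stand-in for a border side with a validated style."""

    style = NoneAwareSet(
        {"thin", "medium", "thick", "dashed", "dotted", "double", "hair"}
    )

    def __init__(self, style=None):
        self.style = style
```

With this in place, SketchSide(style="none").style and SketchSide().style are both None, while an unknown string still raises ValueError. Whether such a conversion belongs in a reader is exactly the point disputed in the thread.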