Wrong numberformat assignment

Issue #931 resolved
Andrew Thornton created an issue

Hi!

First of all thank you for your work.

There appears to be a bug in the way numberformats are applied in stylesheets. The two attached xlsx files are the same except in the broken case within xl/styles.xml :

  <numFmts count="3">
    <numFmt numFmtId="164" formatCode="General"/>
    <numFmt numFmtId="165" formatCode="General"/>
    <numFmt numFmtId="166" formatCode="DD\/MM"/>
  </numFmts>

whereas in the not broken case:

  <numFmts count="3">
    <numFmt numFmtId="164" formatCode="General"/>
    <numFmt numFmtId="165" formatCode="0"/>
    <numFmt numFmtId="166" formatCode="DD\/MM"/>
  </numFmts>

And with a little more testing, it appears that any duplicated formatCode will lead to errors in the interpretation of numFmts greater than that duplication.

Comments (15)

  1. CharlieC

    It looks like these files were created by LibreOffice. Please explain the connection with openpyxl?

  2. Andrew Thornton reporter

    It's an xlsx file created by loffice but readable by excel.

    In any case I've done some hunting and the problem is caused by line 43 of openpyxl/utils/indexed_list.py which should be shifted one indentation level to the left as as it stands IndexedList is acting like a set.

  3. Andrew Thornton reporter

    Actually, looking a bit more at indexedlist I think the index method might also be incorrect. Line 24 reads:

        if value in self:
    

    Which if you're trying to circumvent slow list lookup should probably be:

        if value in self._dict:
    

    Although I haven't checked which is actually faster.

  4. CharlieC

    I don't understand the problem as it relates to openpyxl. I can edit both these files with openpyxl.

  5. Andrew Thornton reporter

    Ah ! Sorry my bug report wasn't clear.

    The format of C1 is incorrect in the broken file because it's assigned the wrong data type.

    Andy

  6. CharlieC

    That doesn't make things much clearer. The two files were created by LibreOffice, correct? "broken.xlsx" is considered to be defective by Excel 2016 for Mac so this is a problem for LibreOffice.

    Please provide relevant code examples and information. C1 is empty so I'm not sure what I'm looking for.

  7. Andrew Thornton reporter

    Sorry I meant A3. That's what I get for typing a response on my phone and not double checking.

    Interesting that Excel for Mac doesn't like the file - I'll try with excel on Windows - Google's sheets opens both files.

    The problem lies in the indexedlist - its index method is incorrect if the same object is placed into the list twice.

  8. CharlieC

    Excel for Windows thinks the file is rubbish as well. LibreOffice has quite a few bugs in its handling of styles and this to me looks like another one and duplicate formats make little or no sense.

    If you think there is a problem with IndexedList then please supply a relevant test.

  9. Andrew Thornton reporter

    How to get the problem results:

    from openpyxl import load_workbook
    print([row[0].value for row in load_workbook('not-broken.xlsx').active])
    print([row[0].value for row in load_workbook('broken.xlsx').active])
    
  10. Andrew Thornton reporter

    The problems that excel are complaining about were probably because I reformatted the styles file to make creating a simple testcase easier and I do agree that the styles are probably bugs in LibreOffice's handling of styles but there's nothing I can see the standard that says that the numFmts have to be unique.

    Here's a simple example of how IndexedList can end up with the wrong answer for index.

    from openpyxl.utils.indexed_list import IndexedList
    l = IndexedList(['A','A','B'])
    print(l[l.index('B')])
    

    The constructor IndexedList(<list>) is what is used by the getter for number_formats property. (line 143 in openpyxl/openpyxl/styles/stylesheet.py)

  11. CharlieC

    These styles still look like junk to me with "General" being redefined as a user-defined format. The specification is vague here but it really does not make sense to have different versions of the same style. I've recently managed to have the specification tightened on this when it comes to named styles.

  12. Andrew Thornton reporter

    OK, I've fixed the IndexedList to give correct indices when there are duplicated values and this resolves the issue.

  13. CharlieC

    Okay. What's actually happening here is that the number formats are not being reindexed probably because they are builtins.

  14. Log in to comment