when reading a sheet with data_only, containing an hyperlink but no cache value, do not try to force it as a numerical value, just return None (fixes #328)

#28 Merged at 8dbd1b3
  1. Eric Gazoni
No description

Comments (10)

  1. CharlieC

    I think it might make more sense to fix this in the parser: 1. check for the tag 2. check the text attribute of the tag if it exists.

  2. Eric Gazoni author

    well, that's why I'm doing a PR, I was hesitant about providing a test, because when I save the file manually in Excel, it seems it's being fixed and I cannot reproduce the issue anymore... The file was generated by openpyxl 2.x and I guess the writer messed up something, so I'm even wondering whether it's a good idea or not to fix this issue as we caused it in the first place, maybe we should just fix the root cause, what do you think @CharlieC ?

  3. CharlieC

    Every change should always come with a test. The difficult thing sometimes is working out what to test.

    The file supplied with the original bug will still raise an exception so this does need fixing but I do think that the best solution is always to ensure that None is passed in as this can be caught explicitly and any attempt to convert to a number avoided.

    value = node.find("{%}v" % SHEET_MAIN_NS)
    if value is not None: # "v" is an optional node
        value = value.text # will be None for an empty node

    This may need two tests: one for the parsing and one for ReadOnlyCell. I recently hacked IterableWorksheet so that you can pass in XML as a string which makes testing a lot easier.

    It's probably worth noting that I changed TYPE_NULL to 'n' as this is true for empty cells with no type.

    The change should probably be applied to the standard reader. I keep dreaming of being able to share the code here but I guess it's not really possible because of the JIT nature of ReadOnlyCells.

  4. Eric Gazoni author

    @CharlieC test added, fix implemented, only had to do it in the IterableWorksheet. The funny thing is that when you use findText(VALUE_TAG), it will return '' even if the node has no text, while if you use find(VALUE_TAG).text then it returns None properly...

  5. CharlieC

    I'd prefer a unit test with with a sheet.xml and an empty value tag as the error isn't really specific to hyperlinks.

    findtext behaves like that. I guess that the rationale is always to return a string. It's not a problem for standard worksheets as we run a regex over the string to see whether it can be a number but I prefer always having None for no value so I'll probably make the same change for that.

  6. Eric Gazoni author

    It seems that as you closed the 2.1 branch, bitbucket is not happy with me merging the changes. I'll have to figure out how to fix it cleanly using the command line :-)