when reading a sheet with data_only, containing an hyperlink but no cache value, do not try to force it as a numerical value, just return None (fixes #328)

#28 Merged at 8dbd1b3
Repository
openpyxl-328
Branch
2.1
Repository
openpyxl
Branch
2.1
Author
  1. Eric Gazoni
Reviewers
Description
No description

Comments (10)

  1. Eric Gazoni author

    well, that's why I'm doing a PR, I was hesitant about providing a test, because when I save the file manually in Excel, it seems it's being fixed and I cannot reproduce the issue anymore... The file was generated by openpyxl 2.x and I guess the writer messed up something, so I'm even wondering whether it's a good idea or not to fix this issue as we caused it in the first place, maybe we should just fix the root cause, what do you think @Charlie Clark ?

  2. Charlie Clark

    Every change should always come with a test. The difficult thing sometimes is working out what to test.

    The file supplied with the original bug will still raise an exception so this does need fixing but I do think that the best solution is always to ensure that None is passed in as this can be caught explicitly and any attempt to convert to a number avoided.

    value = node.find("{%}v" % SHEET_MAIN_NS)
    if value is not None: # "v" is an optional node
        value = value.text # will be None for an empty node
    …
    

    This may need two tests: one for the parsing and one for ReadOnlyCell. I recently hacked IterableWorksheet so that you can pass in XML as a string which makes testing a lot easier.

    It's probably worth noting that I changed TYPE_NULL to 'n' as this is true for empty cells with no type.

    The change should probably be applied to the standard reader. I keep dreaming of being able to share the code here but I guess it's not really possible because of the JIT nature of ReadOnlyCells.

  3. Eric Gazoni author

    @Charlie Clark test added, fix implemented, only had to do it in the IterableWorksheet. The funny thing is that when you use findText(VALUE_TAG), it will return '' even if the node has no text, while if you use find(VALUE_TAG).text then it returns None properly...

  4. Charlie Clark

    I'd prefer a unit test with with a sheet.xml and an empty value tag as the error isn't really specific to hyperlinks.

    findtext behaves like that. I guess that the rationale is always to return a string. It's not a problem for standard worksheets as we run a regex over the string to see whether it can be a number but I prefer always having None for no value so I'll probably make the same change for that.

  5. Eric Gazoni author

    It seems that as you closed the 2.1 branch, bitbucket is not happy with me merging the changes. I'll have to figure out how to fix it cleanly using the command line :-)