Using iter_rows() with formulae and dates

Issue #264 resolved
Anonymous created an issue

When I have a cell formatted as a date containing the formula

=""

the optimized reader fails with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\openpyxl\reader\iter_worksheet.py", line 219, in get_squared_range
    cell = cell._replace(internal_value=self._shared_date.from_julian(float(cell.internal_value)))
ValueError: could not convert string to float: =""

The regular reader works without any problems and gives None for that cell as I would expect.

Comments (10)

  1. CharlieC

    Actually, I'm not sure if the exception isn't the right way to handle this. openpyxl tries to convert Excel values to Python ones and depends on the formatting for this. What are cells with the value '=' formatted as dates supposed to be interpreted as? Considering them to be an error is the right place to start.

  2. Rob Klooster

    Maybe I should elaborate a bit more, since I reduced the problem so it can be easily reproduced. The real excel sheet has formulae which remember the date/time of the first update of a particular cell, like this:

    =IF(ISBLANK(H11),"",IF(P11="",NOW(),P11))
    

    If H11 has never been updated, the date field P11 will be empty. As such I am not interested in the formula as such, but the resulting data. But even the following code does not work:

    workbook = openpyxl.load_workbook(filename, data_only=True, use_iterators=True)
    worksheet = workbook.get_active_sheet()
    [row for row in worksheet.iter_rows()]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\openpyxl\reader\iter_worksheet.py", line 219, in get_squared_range
        cell = cell._replace(internal_value=self._shared_date.from_julian(float(cell.internal_value)))
    ValueError: could not convert string to float: 
    

    Of course, python doesn't know how to convert the empty string to a float. However, when I do not set use_iterators, it works and gives None for empty date cells.

  3. CharlieC

    Thanks for the additional information. The conflict is due to the order of resolution and resultant conflict between formatting and typing - in Python we don't differentiate. I think a quick fix might be possible. What is the actual value in P11?

    Does the normal reader work with data_only=True ? If it does, it is stripping the formatting which might not be desirable.

  4. Rob Klooster

    Yes the normal reader works. With data_only=False, it gives the formula as a string. With data_only=True it gives None.

    I think the main problem is that in Python there is no such thing as an empty datetime, so the closest equivalent would be None, I guess?

  5. Log in to comment