Add a value cache to `cell.Cell`

Issue #251 wontfix
Felipe created an issue

Proposed enhancement

It would be great to have a way to populate the "value cache" (<v> tag in the XML) for cells with formulae. Given that openpyxl won't have the math engine to do so automatically (cf issue #11), it would be nice to have a way to interface with the openpyxl OOXML writer from an outside engine.

Such an interface would be helpful even in the absence of a math engine for many simple cases that the programmer could compute directly (E.g., a row of dates implemented as '=RC[-1] + 1', a simple total sum, etc.). The interface could also enable keeping the value cache from a workbook that is read (like in @Eric Gazoni's original proposal from issue #11).

Filling in the value cache for cells with formulae would be helpful if the workbook is going to be opened with applications that don't support computing the result. (Many mobile applications, openpyxl itself pre 1.6, etc.)

Cache-invalidation, propagation, etc. would fall on the programmer, except for perhaps a convenience function to clear the cache globally.

Proposed implementation

The key changes would have to come in lines 303-312 of writer.worksheet, which handle writing cells with formulae.

My idea is to add a new attribute to cell.Cell (maybe called value_cache). Then, when writing the cell contents to the file, if the cell is of type TYPE_FORMULA the writer checks this value_cache, and if not None writes a <v> tag with it.

Optionally, there could also be a workbook.clear_cache_globally method to set all cell.Cells' value_cache to None.

Questions

I'm happy to create a pull request for this change but wanted to get some input on it before jumping in. Specifically, does the OOXML schema/spec restrict the values that the <v> tag can assume? If the answer is yes, the writer would have to also support checking the value (The writer could then silently ignoring that cache value in case of failure).

Comments (6)

  1. CharlieC

    We support the data_only flag for preserving the values of an existing workbook if these instead of the formula are required.

  2. Felipe reporter

    I think the main value of this change would be when writing the book, not when reading. Perhaps an example would clarify the envisioned use case:

    for i, res in enumerate(results):  # a list with 10 items
        ws.cell(row=i, column=0).value = res
    
    ws.cell(row=10, column=0).value = '=SUM(A1:A10)'
    ws.cell(row=10, column=0).value_cache = sum(results)
    

    This would allow clients without a math engine (esp. mobile phones) to see a value for cell A10.

  3. CharlieC

    Sorry, but that's a deceptively simple example. In real life formulae can be extremely complicated include things like shared and array formulae. This just can't be done without a dedicated maths/formula engine and that is just out of scope.

    In practice it would be both more reliable and easier to run something like a headless OpenOffice and get it to and save the file so that cached values will be added.

  4. Felipe reporter

    I'm not saying openpyxl needs to implement the math engine. (There are others out there already; e.g. pycel).

    I think there are also enough "deceptively simple" uses to justify inclusion of this feature. I for instance have to generate several reports like this one (60 kB) daily and adding <v> tags for all the totals and subtotals would be tremendously helpful for users checking their email on mobile devices. Without them their spreadsheet applications display "0" or blanks on all the formulae cells.

  5. CharlieC

    ^There are a number of things here:

    • thanks for the tip about PyCel - maybe we can make use of this in future, although the licence is a problem
    • reports for "dumb" clients probably shouldn't have formulae in them at all
    • formulae that cross worksheets or evaluate items at run time

    I do understand your desire to provide reports for your users. But if they have software which cannot evaluate formulae, why are you using formulae in the first place? Or, if you wish to avoid duplicate code, simply create the file in openpyxl and have headless OpenOffice calculate cached values for you. Or don't use Excel as the format.

    I have a huge problem with your proposal for partial support with the emphasis on the programmer to handle things correctly. It might work for you but that is a very specific use case. In general, I think this is likely to cause confusion.

    That said, feel free to fork the project and we can see where it goes.

  6. Log in to comment