data_type of RawCell wrong in one situation.

Issue #177 resolved
codemonkey1991 created an issue

Data type is either 's' or 'n' in all situations except for cells with formulas that result in strings, whose data types are 'str'. Instead of 'str' the data type should probably be 's'.

A string formula is for instance: "=CONCATENATE(A2;A3)"

I have attached the Excel document I used for testing, just in case.

Comments (16)

  1. codemonkey1991 reporter

    Well damn. Anyway, if you have Excel it should take you 2 minutes to create a document for testing. Just put the strings "Hello, " and "world!" into A1 and A2 respectively, then put "=CONCATENATE(A1;A2)" into B1. The data types of A1 and A2 will be "s" in openpyxl, while B2 will be "str" - even though they are all strings.

  2. CharlieC

    I don't have Excel on my main machine. But sure, it's easy enough to do. Would be good if you could add it if you got one to hand.

  3. CharlieC

    Thanks. I've just been looking at how this is handled. It looks like there are preparations for handling the case using the TYPE_FORMULA_CACHE_STRING = 'str' but there is no code that uses that for detection. Then again, looking at the tests, I'm surprised that any formula survive an import, do they?

    Probably need to ensure full test coverage for the existing conversions and then extend them to handle this case.

  4. codemonkey1991 reporter

    The value of the formula cell is indeed "Hello, world!" in openpyxl, even when using the faster iterator reader. I'm guessing that the result of the formula is saved in the xlsx file along with the formula.

  5. CharlieC

    This is the source of a slightly amended sheet. I added a simple addition of two numbers to have another formula.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:B6"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"><selection activeCell="A6" sqref="A6"/></sheetView></sheetViews><sheetFormatPr baseColWidth="10" defaultColWidth="9.140625" defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="15.7109375" customWidth="1"/><col min="2" max="2" width="15.28515625" customWidth="1"/></cols><sheetData><row r="1" spans="1:2" x14ac:dyDescent="0.25"><c r="A1" t="s"><v>0</v></c><c r="B1" t="str"><f>CONCATENATE(A1,A2)</f><v>Hello, world!</v></c></row><row r="2" spans="1:2" x14ac:dyDescent="0.25"><c r="A2" t="s"><v>1</v></c></row><row r="4" spans="1:2" x14ac:dyDescent="0.25"><c r="A4"><v>1</v></c></row><row r="5" spans="1:2" x14ac:dyDescent="0.25"><c r="A5"><v>2</v></c></row><row r="6" spans="1:2" x14ac:dyDescent="0.25"><c r="A6"><f>SUM(A4:A5)</f><v>3</v></c></row></sheetData><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>
    

    In both cases Openpyxl ignores the formula and just extracts the value: 'Hello, world' and 3. The formula itself is lost. As cells using the iterator don't have a formatting method I'm not sure how much it matters that 'str' is not coerced to 's' but I can submit a patch that does the same kind of checking with iterator that happens without.

    The only support for formulae seems to be when writing them in which case, in addition to being assigned the 'f' datatype, they must also begin '='

    It's probably more important to preserve formulae in general when reading files.

  6. Iuri de Silvio

    I know it is already merged, but how can I evaluate the formula? Before this patch, I used internal_value, but now it returns the formula instead of the value.

    The formula is useless to me.

  7. CharlieC

    @iurisilvio You don't evaluate the formula, Excel does. That's the problem: what might be "useless" to you is exactly what others want and vice versa. It might possible to use guess_types= True to get the cached value - there is no guarantee that this will be correct - instead of the formula.

  8. Iuri de Silvio

    Yes, but before this change, the internal_value returned the value instead of the formula, this change break my code. :/

    Looks like guess_types does not work with use_iterators=True.

  9. CharlieC

    @iurisilvio
    Your code was never guaranteed to run. internal_value is a wart I'd kind of like to remove for most situations. The change initially ensured consistent behaviour across readers but this was broken by the guess_types addition.

    One of the things I've been looking at is ensuring consistent behaviour across readers (and writers) and reducing code duplication wherever possible whilst maintaining the performance advantages. I'll see if I can get guess_types into the optimised reader. Won't be in 1.7 but you seem to be using a checkout anyway. Pin to 1.6.2 in the meantime to stop your code breaking.

  10. Iuri de Silvio

    @charlie_x I tested the 1.8 branch because the current version misinterpreted some numbers as datetime values and it is already fixed (60274d3?).

    So I decided to test 1.8 to solve this issue and had this other problem with formulas instead of cached values, so I rolled back to current release and fix manually these wrong datetime values (just have to remove number format in Excel).

    I have no problem using a checkout instead of the stable version if it works for me.

    Thanks!

  11. CharlieC

    @iurisilvio Two things: this bug is remaining closed. The underlying issue is not resolved: how to handle cells with formulae. As far as Excel is concerned the values of such cells are the formulae with the results cached for convenience. This is why I made the changes I did. The matter should get more discussion on the mailing list and not on this bug.

  12. Log in to comment