Parse function identifies RANGE incorrectly

Issue #1091 resolved
Hans Trevor Wilms created an issue

My program is supposed to take each cell with a formula in the worksheet and "explode" it. This means it parses the excel formula and replaces each cell reference (each RANGE, ie A22) with the value of that cell surrounded by parenthesis. So if excel cell A22=A3A5, and cell B10 value is =A22+10, B10 becomes = (A3A5)+10. The program then keeps looping and replacing the cell values until they're just raw numbers and functions.

This works well until until the program gets to a particular cell in the excel file I'm working on, C34.

C34=IF(B1_Data,IF(B1_Suf,IF(B1_05,BD34,""),IF(AND(B2_Data,B1B2_Suf),IF(B1_05,BD34,""),BD30)),BD29)

The first and second iteration goes well...

'C34_lvl0': 'IF($BA$29,IF($BA$31,IF($BA$33,BD34,""),IF(AND($BA$30,$BA$32),IF($BA$33,BD34,""),BD30)),BD29)'

'C34_lvl1': '=IF((NOT(OR(I29="",I30="",I31="",I31=10.0001,I32=""))),IF((AH32<0),IF((AND(#REF!>0,#REF!<0.5)),(LD-ddTaq MASS TOO SMALL - INCREASE BUILD SIZE),""),IF(AND((NOT(OR(I36="",I37="",I38="",I38=10.0001,I39=""))),(AH39<0)),IF((AND(#REF!>0,#REF!<0.5)),(LD-ddTaq MASS TOO SMALL - INCREASE BUILD SIZE),""),(INSUFFICIENT LD-ddTaq TO SUPPORT BUILD, USE 2ND BATCH))),(ENTER ACTUAL LD-ddTaq POLYMERASE VALUES FOR LOT #1))

But then I get the below error with the next iteration. When I parse the above (lvl1) formula, I see it is showing (NOT(OR(I29="",I30="",I31="",I31=10.0001,I32=""))) as a single range. This does not happen in other excel IF functions.

Anyone know what is going on or a work around? Is this a bug? I'm using openpyxl version 2.4.9 and python 3.6. I attached my code if it helps.

The Error:

Traceback (most recent call last): File "C:\Python\20180818HW_ValidationAid_v016.py", line 186, in tok=Tokenizer(tFinal) #dissect each cell, do the spiel File "C:\Python\lib\site-packages\openpyxl\formula\tokenizer.py", line 53, in init self._parse() File "C:\Python\lib\site-packages\openpyxl\formula\tokenizer.py", line 86, in _parse self.offset += dispatchercurr_char File "C:\Python\lib\site-packages\openpyxl\formula\tokenizer.py", line 157, in _parse_error (self.offset, self.formula)) openpyxl.formula.tokenizer.TokenizerError: Invalid error code at position 402 in '=IF((NOT(OR(I29="",I30="",I31="",I31=10.0001,I32=""))),IF((AH32<0),IF((AND(#REF!>0,#REF!<0.5)),(LD-ddTaq MASS TOO SMALL - INCREASE BUILD SIZE),""),IF(AND((NOT(OR(I36="",I37="",I38="",I38=10.0001,I39=""))),(AH39<0)),IF((AND(#REF!>0,#REF!<0.5)),(LD-ddTaq MASS TOO SMALL - INCREASE BUILD SIZE),""),(INSUFFICIENT LD-ddTaq TO SUPPORT BUILD, USE 2ND BATCH))),(ENTER ACTUAL LD-ddTaq POLYMERASE VALUES FOR LOT #1))'

Comments (6)

  1. CharlieC

    Looking again at this and I don't really think it's a bug. The tokeniser can happily decompose and then recompose the formula.

    Are you trying to use openpyxl to evaluate formulae? If so, the library isn't suitable for this. You're better of using something like Pycel which purports to do this. I suspect that there are gotcha related to the order of evaluation in this kind of thing.

  2. Hans Trevor Wilms reporter

    I was trying to use it evaluating formulas, yes. Specifically it would parse one formula, then replace all precedent cells mentioned with each precedent cell's value. It would do this until all final output cells were one big equation of static values. This would be successful for a few rounds, then hit this little snag I described above. If you don't think it's a bug, I'll believe you.

    Thank you for looking into it again! I'll check out Pycel, haven't heard of it before.

  3. CharlieC

    You probably have some kind of circular reference in there, the applications have ways of dealing with that. If the formula isn't already evaluated in there then just open the file with Excel or OpenOffice and read it in data-only mode with Excel.

    FWIW I think you're approach is flawed. Much better to transliterate the formulae to Python equivalents using the tokeniser.

  4. Log in to comment