Tokenizer fails when encountering unexpected character

Issue #723 resolved
Gericke Potgieter
created an issue

Environment: Python 3.5.2 Openpyxl version: 2.4.0

Issue: In some cases Tokenizer fails when encountering an unexpected character. It was found that the issue lies with the parsing of ranges in other worksheets. The naming convention for other worksheets in formulas are as follows:

For a single word wordsheet name: Worksheet!A1 For a multiple word worksheet name: 'This Worksheet'!A1

However, in some instances the formula may contain a single worksheet name as follows: 'Worksheet'!A1 when it forms part of a range e.g. Worksheet!A1:'Worksheet'!A15

This will lead to an error as the single quotes are not expected. The traceback is as follows:

Traceback: Traceback (most recent call last):
  File "[project file]", line 1251, in test_all
    formula_table = get_formulas(full_workbook, file_name)
  File "[project file]", line 24, in get_formulas
    base_tokens = returns_tokens(cell)
  File "[project file]", line 105, in returns_tokens
    tok = Tokenizer(t_cell.value)
  File "/usr/local/lib/python3.5/dist-packages/openpyxl/formula/tokenizer.py", line 53, in __init__
    self._parse()
  File "/usr/local/lib/python3.5/dist-packages/openpyxl/formula/tokenizer.py", line 86, in _parse
    self.offset += dispatcher[curr_char]()
  File "/usr/local/lib/python3.5/dist-packages/openpyxl/formula/tokenizer.py", line 107, in _parse_string
    self.assert_empty_token()
  File "/usr/local/lib/python3.5/dist-packages/openpyxl/formula/tokenizer.py", line 298, in assert_empty_token
    (self.offset, self.formula))
openpyxl.formula.tokenizer.TokenizerError: Unexpected character at position 94 in '=IF('Debt Sizing'!$E$15="Yes",ROUND('Debt Sizing'!H227,2)+IF(X104=tenor*2,1-SUM(Inputs!$W$111:'Inputs'!W111)-ROUND('Debt Sizing'!H227,2)),'Debt Sizing'!H227)'

Due to the sensitivity of the workbook I am unfortunately unable to include a sample.

Comments (12)

  1. CharlieC

    So, able to create a simpler test case for this. As far as I can the following are logically equivalent. But the combination of the "=" prefix and the escaped worksheet title seem to cause problems.

    @pytest.mark.parametrize('formula', [
        "SUM(Inputs!$W$111:'Inputs'!W111)",
        "=SUM(Inputs!$W$111:'Inputs'!W111)",
        "=SUM(Inputs!$W$111:Inputs!W111)",
    ])
    def test_render(self, tokenizer, formula):
        tok = tokenizer.Tokenizer(formula)
        assert tok.render() == formula
    
  2. Felipe

    Wow that's odd. I'm pretty tied up right now, but can hopefully get to this in the next couple of weeks. Looking quickly, I think the issue is not with the = and quote, but rather the fact that the worksheet name appears twice, and is quoted in the second one. The fix here is to make : a range operator -- I'd originally wanted to do this, but now can't remember why I didn't, though I imagine it has to do with the translator.

  3. Calvin Keats

    @CharlieC I'm encountering what I believe is a similar error. When calling load_workbook(), I get a I have an error: "openpyxl.formula.tokenizer.TokenizerError: Unexpected character at position 138 in", followed by a formula within my file.

    The character is a newline - removing it fixed the issue for that cell, but unfortunately I have several formulas formatted like this in my file that I will be fixing.

    I sometimes use the newline within the formula simply for readability; sometimes functions on functions on functions gets complicated. I believe my error can be remedied by stripping out newlines when parsing formulas.

  4. Log in to comment