Expression with illegal unicode chars shall be rejected.

Issue #821 new
Former user created an issue

When an expression contains a Unicode character, as in "12天2", the evaluator silently skips that character and produces an unexpected number, "122".

This is especially confusing when a look-alike Unicode character is filtered out. E.g. "12/1,2" results in "1", because the comma-like character is dropped and the expression is evaluated as 12/12.

It would be better to reject such expressions, so that the user notices the problem.
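
To make the difference concrete, here is a minimal sketch of the two behaviours for a single number token. It is a toy model only, not the evaluator's actual code; the function names `lenient_digits` and `strict_digits` are made up for illustration.

```python
ASCII_DIGITS = "0123456789"


def lenient_digits(token: str) -> str:
    """Toy model of the reported behaviour: silently drop every non-digit."""
    return "".join(ch for ch in token if ch in ASCII_DIGITS)


def strict_digits(token: str) -> str:
    """Toy model of the requested behaviour: reject unknown characters."""
    for pos, ch in enumerate(token):
        if ch not in ASCII_DIGITS:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    return token
```

With these definitions, `lenient_digits("12天2")` returns `"122"`, while `strict_digits("12天2")` raises a `ValueError` that points at the offending character.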

Comments (12)

  1. Tey'

    These characters are detected as thousand separators, which is why they are silently ignored. There used to be an option to make the detection stricter (that is, treating only known separator characters as such), but it was removed to prevent confusion. Dunno if we should add it back.

    The main advantage of detecting any invalid characters as thousand separators is to detect currency characters as separators so that they get ignored by the parser.

  2. Helder Correia repo owner

    @teyut Maybe we should "swallow" just a well-defined set of characters that make sense (commas, dots, spaces, apostrophes, etc.) and yield an error for everything else.

  3. Tey'

    It's okay for the digit group separators, but not for currency symbols for instance. I still like the possibility to have an option to choose between the 2 modes.

    We might also only allow known digit group separators between digits, but allow anything else at the end or start of a number.

  4. Alister Hood

    For ultimate flexibility, why not allow the user to specify a list of characters to be ignored?

  5. Alister Hood

    An empty list by default would avoid people getting wrong answers and not noticing, e.g. as reported in #1027

    (EDIT: well, not empty. It should skip whitespace characters)

  6. Fraser McCrossan

    I note for completeness that this also happens with math operators where it is significantly more confusing. For example:

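
Separately from the comment above, the proposals in comments 2, 3 and 5 can be combined into one rough sketch: a caller-supplied set of ignorable characters (whitespace only by default), an optional mode that tolerates arbitrary characters before the first and after the last digit (e.g. a currency symbol), and an error for anything else. The name `clean_number` and its parameters are hypothetical, and the sketch handles only digit grouping in integers; decimal points and the rest of the expression grammar are left out.

```python
ASCII_DIGITS = "0123456789"
WHITESPACE = frozenset(" \t\u00a0")  # assumed default: skip whitespace only


def clean_number(token, ignored=WHITESPACE, allow_affixes=False):
    """Return the ASCII digits of `token`, or raise ValueError.

    ignored       -- characters that may be dropped between digits
    allow_affixes -- if True, any run of non-digits before the first digit
                     and after the last digit is dropped as well
    """
    start, end = 0, len(token)
    if allow_affixes:
        # Skip leading and trailing non-digits, e.g. currency symbols.
        while start < end and token[start] not in ASCII_DIGITS:
            start += 1
        while end > start and token[end - 1] not in ASCII_DIGITS:
            end -= 1

    out = []
    for pos in range(start, end):
        ch = token[pos]
        if ch in ASCII_DIGITS:
            out.append(ch)
        elif ch not in ignored:
            # Anything not explicitly allowed is an error, not a separator.
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if not out:
        raise ValueError("no digits found")
    return "".join(out)
```

Under these assumptions, `clean_number("1 234")` gives `"1234"`, and `clean_number("$1,234", ignored=frozenset(","), allow_affixes=True)` gives `"1234"` too. `clean_number("12天2")` raises in every mode, because the unknown character sits between digits. With the default set, `clean_number("1,234")` also raises, so a user who wants commas swallowed has to opt in.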