heldercorreia / speedcrunch, issue #821: Expression with illegal unicode chars shall be rejected.

Issue #821 new

Former user created an issue 2018-04-11

When an expression contains a Unicode character, as in "12天2", the evaluator silently skips that character and produces an unexpected number, "122".

This is especially confusing when the skipped character looks like an ordinary one. E.g. "12/1，2" (with a fullwidth comma) is read as "12/12" and results in "1".

It would be better to reject such expressions, so that the user notices the problem.
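To make the report concrete, here is a toy model of the lenient behaviour described above. It is not SpeedCrunch's parser (which is not shown in this thread); the function name `lenient_strip` and the `KNOWN` character set are assumptions made for illustration only.

```python
# Toy model only: a stand-in for "silently skip anything unrecognised".
# KNOWN is a hypothetical set of characters the evaluator understands.
KNOWN = set("0123456789.+-*/()")


def lenient_strip(expr: str) -> str:
    """Drop every character that is not in KNOWN, without reporting it."""
    return "".join(ch for ch in expr if ch in KNOWN)


print(lenient_strip("12天2"))    # 122
print(lenient_strip("12/1，2"))  # 12/12, which evaluates to 1
```

In both cases the user gets a plausible-looking number with no hint that part of the input was discarded.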

Comments (12)

Tey'
These characters are detected as being thousand separators, that's why they are silently ignored. There used to be an option to make the detection more strict (that is, only detecting known separator characters as such), but it has been removed to prevent confusion. Dunno if we should add it back.

The main advantage of detecting any invalid characters as thousand separators is to detect currency characters as separators so that they get ignored by the parser.
- 2018-04-11T13:28:59+00:00
Helder Correia repo owner
@teyut Maybe we should "swallow" just a well-defined set of characters that makes sense (commas, dots, spaces, apostrophes, etc.) and yield an error for everything else.
- 2018-04-11T13:36:10+00:00
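The whitelist idea in the comment above could look roughly like the following sketch. The names `GROUP_SEPARATORS`, `EXPRESSION_CHARS` and `strip_known_separators` are hypothetical, and the separator set is an assumption: the dot is left out here because this sketch treats it as the decimal point.

```python
# Hypothetical whitelist based on the characters named in the comment.
# The dot is excluded because this sketch uses it as the decimal point.
GROUP_SEPARATORS = {",", " ", "'"}
# Stand-in for whatever the real tokenizer accepts.
EXPRESSION_CHARS = set("0123456789.+-*/()")


def strip_known_separators(expr: str) -> str:
    """Swallow known separators; raise on any other unexpected character."""
    cleaned = []
    for pos, ch in enumerate(expr):
        if ch in GROUP_SEPARATORS:
            continue  # known separator: silently dropped
        if ch not in EXPRESSION_CHARS:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
        cleaned.append(ch)
    return "".join(cleaned)


print(strip_known_separators("1,234 + 5'000"))  # 1234+5000
```

With this approach both "12天2" and "12/1，2" raise a `ValueError` instead of producing a number.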
Tey'
That's okay for digit group separators, but not for currency symbols, for instance. I'd still like to have an option to choose between the two modes.

We might also only allow known digit group separators between digits, but allow anything else at the end or start of a number.
- 2018-04-11T15:12:51+00:00
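The variant suggested in the comment above (only known separators between digits, anything at the start or end of a number) might be sketched as follows. It works on a single unsigned number token; the function name `normalize_number_token` and the separator set are assumptions, and checking that the result is a well-formed number is left to the real parser.

```python
DIGITS = set("0123456789")
NUMBER_CHARS = DIGITS | {"."}
# Hypothetical set of accepted digit group separators.
GROUP_SEPARATORS = {",", " ", "'"}


def normalize_number_token(token: str) -> str:
    """Ignore anything before/after the number; be strict inside it."""
    if not any(ch in DIGITS for ch in token):
        raise ValueError(f"no digits in {token!r}")
    positions = [i for i, ch in enumerate(token) if ch in NUMBER_CHARS]
    # Everything outside this slice (e.g. a currency symbol) is ignored.
    core = token[positions[0]:positions[-1] + 1]
    cleaned = []
    for ch in core:
        if ch in NUMBER_CHARS:
            cleaned.append(ch)
        elif ch not in GROUP_SEPARATORS:
            raise ValueError(f"unexpected character {ch!r} inside {token!r}")
    return "".join(cleaned)


print(normalize_number_token("$1,234.50"))  # 1234.50
print(normalize_number_token("1 000€"))     # 1000
```

Here "$1,234.50" is still accepted, while "12天2" and "1，2" are rejected because the offending character sits between digits.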
Helder Correia repo owner
@teyut I agree with everything.
- 2018-04-11T17:53:21+00:00
Helder Correia repo owner
Issue ~~#845~~ was marked as a duplicate of this issue.
- 2018-07-20T17:44:06+00:00
Tey'
Issue ~~#892~~ was marked as a duplicate of this issue.
- 2019-03-01T15:04:33+00:00
Tey'
- assigned issue to Tey'
- 2019-03-01T15:09:15+00:00
Helder Correia repo owner
- changed milestone to 1.0
- 2019-03-01T15:46:19+00:00
Alister Hood
> These characters are detected as being thousand separators, that's why they are silently ignored. There used to be an option to make the detection more strict (that is, only detecting known separator characters as such), but it has been removed to prevent confusion. Dunno if we should add it back.
>
> The main advantage of detecting any invalid characters as thousand separators is to detect currency characters as separators so that they get ignored by the parser.

For ultimate flexibility, why not allow the user to specify a list of characters to be ignored?
- 2021-11-04T05:23:55+00:00
Alister Hood
> For ultimate flexibility, why not allow the user to specify a list of characters to be ignored?

An empty list by default would avoid people getting wrong answers and not noticing, e.g. as reported in #1027…

(EDIT: well, not empty. It should skip whitespace characters)
- 2021-11-04T05:27:52+00:00
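A user-configurable ignore list with a whitespace-only default, as proposed in the two comments above, could be sketched like this. The function name `strip_ignored`, its `ignored` parameter, and the `EXPRESSION_CHARS` set are all hypothetical; a real implementation would read the list from the application's settings.

```python
# Stand-in for whatever the real tokenizer accepts.
EXPRESSION_CHARS = set("0123456789.+-*/()")


def strip_ignored(expr: str, ignored: str = "") -> str:
    """Drop whitespace and user-listed characters; reject anything else odd.

    `ignored` is the user's list of characters to skip. By default it is
    empty, so only whitespace is skipped.
    """
    cleaned = []
    for pos, ch in enumerate(expr):
        if ch.isspace() or ch in ignored:
            continue
        if ch not in EXPRESSION_CHARS:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
        cleaned.append(ch)
    return "".join(cleaned)


print(strip_ignored("1 000 + 2"))            # 1000+2
print(strip_ignored("$1,000", ignored="$,"))  # 1000
```

With the default settings "1,000" is rejected, so nobody gets a silently altered result unless they opted in to ignoring that character.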
Sadiq
The tool should show something like this

I'm going to make a new branch for this over the weekend.
- 2021-11-04T06:31:33+00:00
Fraser McCrossan
I note for completeness that this also happens with math operators, where it is significantly more confusing. For example:
- 2023-02-16T02:16:12+00:00

Assignee: Tey'

Type: bug

Priority: critical

Status: new

Component: parser

Milestone: 1.0

Version: 0.12

Votes: 1

Watchers: 6
