Normalize format of numbers pasted from clipboard

Issue #434 closed

Former user created an issue 2013-03-15

Originally reported on Google Code with ID 434

When copy-pasting formatted numbers (e.g., with comma separators), either strip commas
or enhance the parser to recognize them.

Reported by haridara on 2013-03-15 06:41:42

Comments (25)

Former user Account Deleted
Reported by helder.pereira.correia on 2013-03-15 20:02:44 - Status changed: Duplicate - Merged into: ~~#58~~
- 2013-03-15T20:02:44+00:00

Former user Account Deleted

I saw issue 58 before filing this one, and these are completely different. This is specifically
about pasting from clipboard, the other one is about formatting the display. Please
reopen this issue.

Reported by haridara on 2013-03-16 05:01:54

2013-03-16T05:01:54+00:00

Former user Account Deleted
```
My bad, I was misled by the title.
```
Reported by helder.pereira.correia on 2013-03-16 06:02:23 - Status changed: New - Labels added: Type-Enhancement, Usability, Component-UI
- 2013-03-16T06:02:23+00:00

Former user Account Deleted

I really want to add this feature, but here's the issue: depending on where you come
from, the format can be 1,234.56 or 1.234,56. It's trivial to act smart on these situations,
but what about 1,234 and 1.234? In the latter case, the application needs to know whether
comma and dot are radix characters or digit grouping separators. I think a setting
option of some sort looks inevitable. What do you think? Please help me find a solution
that makes sense for everyone.
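
To make the ambiguity concrete, here is a minimal Python sketch (not SpeedCrunch code). The helper name `classify` is hypothetical, and the sketch assumes digit grouping only ever occurs in the integer part of a number.

```python
def classify(text: str) -> str:
    """Guess which character acts as the radix in `text`.

    Returns 'dot', 'comma', 'none' (no fractional part) or 'ambiguous'.
    Hypothetical helper for illustration only.
    """
    has_dot = "." in text
    has_comma = "," in text
    if has_dot and has_comma:
        # Both present: assume the one appearing last is the radix.
        return "dot" if text.rfind(".") > text.rfind(",") else "comma"
    if has_dot or has_comma:
        sep = "." if has_dot else ","
        if text.count(sep) > 1:
            # A radix character can occur only once, so a repeated
            # character is assumed to be a grouping separator.
            return "none"
        # Exactly one separator: it could be a radix or a grouping separator.
        return "ambiguous"
    return "none"


for sample in ("1,234.56", "1.234,56", "1,234", "1.234"):
    print(sample, "->", classify(sample))
```

The first two samples can be resolved from the text alone; the last two cannot, which is why a setting (or some other external hint) seems hard to avoid.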

Reported by helder.pereira.correia on 2014-02-24 21:26:41

2014-02-24T21:26:41+00:00

Tom G

Could SpeedCrunch use the system locale to determine how to parse commas? If that seems
overly complicated or error-prone, then a simple "Digit grouping character" selector
or even "Ignore the following characters" setting would work for me.
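
As a rough illustration of the locale idea, the Python sketch below reads the radix and grouping characters from a `localeconv()`-style dictionary. The function names are hypothetical, and a real implementation would presumably use whatever locale API the application's own toolkit provides.

```python
import locale


def separators_from_locale(conv=None):
    """Return (radix, grouping) from a localeconv()-style dict.

    If `conv` is omitted, the current process locale is queried.
    """
    if conv is None:
        conv = locale.localeconv()
    radix = conv.get("decimal_point") or "."
    grouping = conv.get("thousands_sep") or ""
    return radix, grouping


def strip_grouping(text, conv=None):
    """Drop the locale's grouping character and normalize the radix to '.'."""
    radix, grouping = separators_from_locale(conv)
    if grouping:
        # Remove grouping first, so a '.' used for grouping is gone
        # before the radix is rewritten to '.'.
        text = text.replace(grouping, "")
    return text.replace(radix, ".")


# Example with an explicit, made-up locale description:
continental = {"decimal_point": ",", "thousands_sep": "."}
print(strip_grouping("1.234,56", continental))
```

Note that this only helps when the pasted text matches the user's own locale; numbers copied from a foreign web page would still be misread.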

Reported by hutchinsfairy on 2014-06-24 09:09:04

2014-06-24T09:09:04+00:00

Former user Account Deleted
```
Issue 520 has been merged into this issue.
```
Reported by helder.pereira.correia on 2014-08-06 12:05:18
- 2014-08-06T12:05:18+00:00

Former user Account Deleted

I like the idea of getting the radix character and digit grouping separator from the
system locale.  Failing that, as mentioned in issue 520, I think a toggle in the "Settings
-> Behaviour" menu would suffice.

I am totally opposed to leaving things as they are.  Regardless of where a particular
user is from, it won't exactly be a speedy crunch if (s)he has to go back and remove
punctuation characters from pasted numbers before proceeding.

Having a "one size fits all" approach - where both commas and dots are interpreted
as radix characters - is both primitive and misleading.  If someone in the US enters
"1,024" (meaning "1024"), SC will treat it as "1 radix 024".

If the user isn't aware that SC just happens to be coded this way, and doesn't notice
that the answer to their calculation is very different from what it should be, SC will
effectively have made a very significant error.

Also, it'd be handy for financially oriented users if SC would simply ignore currency
symbols (e.g. $), rather than saying "invalid expression".  I mentioned this in issue
520 and was requested to raise the idea again here.

Reported by Xavion.0 on 2014-08-07 08:08:36

2014-08-07T08:08:36+00:00

Former user Account Deleted

Please check pull request 44 for a possible implementation:
https://github.com/speedcrunch/SpeedCrunch/pull/44

In this implementation, thousand separators are detected and ignored from user input
using 2 different methods, one of them allowing the detection of any character that
is neither alphanumeric nor a known operator (so you can write "25 566$ * 65$" for
instance). Also, the user chooses whether both dot and comma are radix characters or
only one of them is a radix character (and the other one is detected as a thousand
separator, so it is ignored).
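
The behaviour described above can be approximated with the following Python sketch. It is not the code from the pull request: the operator set, the function name `strip_separators` and the `radix_chars` parameter are assumptions made for illustration.

```python
# Assumed operator set; the real application knows many more.
OPERATORS = set("+-*/^()=!%")


def strip_separators(expr: str, radix_chars: str = ".,") -> str:
    """Drop anything that is not alphanumeric, an operator or a radix.

    `radix_chars` models the user setting: ".," means both characters
    are radix characters, "." or "," means only that one is (the other
    is then dropped like any other separator).
    """
    out = []
    for ch in expr:
        if ch.isalnum() or ch in OPERATORS:
            out.append(ch)
        elif ch in radix_chars:
            out.append(".")  # normalize every radix character to '.'
        # Anything else (space, '$', a non-radix ',' or '.') is ignored.
    return "".join(out)


print(strip_separators("25 566$ * 65$"))
print(strip_separators("1,234.5", radix_chars="."))
```

This naive version also removes whitespace between identifiers, so it only illustrates the idea and is not a drop-in lexer change.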

Reported by teyut@free.fr on 2014-10-26 21:32:05

2014-10-26T21:32:05+00:00

Former user Account Deleted

I obviously think this is a really good patch, and I'd like to see it included as soon
as possible.

I think the "non-SI-compliance" argument can be trumped by highlighting the name of
this application again: SpeedCrunch.

"It won't exactly be a speedy crunch if [users have] to go back and remove punctuation
characters from pasted numbers before proceeding."

I think the "broken exported session" argument can be overcome by including the radix
character preferences in the text file.

Reported by Xavion.0 on 2014-10-29 03:16:53

2014-10-29T03:16:53+00:00

Tey'
Issue ~~#579~~ was marked as a duplicate of this issue.
- 2015-11-27T12:39:50+00:00
Tey'
- changed component to parser
- edited description
- 2015-11-27T12:50:26+00:00
Felix Krull
- changed status to closed
I believe this was resolved by the linked pull request (commit 5aa5846).
- 2016-05-15T13:18:56+00:00
Helder Correia repo owner
@fk I don't believe this is working ATM:
```
1,234.567
= 0.699678

1.234,567
= 0.699678

1,2,3     ? In fact, this is weird.
= 0.36

1.2.3    ? And so is this.
= 0.36
```
@Teyut Edit: I meant the removal of the commas or dots is not working. And besides that, the evaluator is giving some strange values.
- 2016-05-18T10:51:13+00:00
Felix Krull
Is it supposed to remove them? It parses them, IMO that's either sufficient or even preferable to actually normalizing values.

The weirdness is a combination of issue ~~#634~~ (1.2.3 is interpreted as 1.2 * .3) and "Behavior -> Detect All Radix Characters" which is on by default and always causes both '.' and ',' to be parsed as decimal separators. Considering how it breaks expected grouping characters, it should maybe default to off.
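
For reference, the odd results can be reproduced with a small Python sketch of that explanation. It is a simplification, not the actual evaluator: both characters are treated as radix characters, a second radix character starts a new number, and adjacent numbers are multiplied.

```python
import re


def eval_both_radix(text: str) -> float:
    """Evaluate one literal the way the thread describes the default mode."""
    normalized = text.replace(",", ".")
    # A number is digits with an optional fraction, or a bare fraction
    # such as ".3"; a second '.' therefore begins a new number.
    tokens = re.findall(r"\d+\.\d*|\.\d+|\d+", normalized)
    result = 1.0
    for tok in tokens:
        result *= float(tok)  # implicit multiplication of adjacent numbers
    return result


print(round(eval_both_radix("1,234.567"), 6))  # 1.234 * .567
print(round(eval_both_radix("1.2.3"), 6))      # 1.2 * .3
```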
- 2016-05-18T11:43:33+00:00
Helder Correia repo owner
"Is it supposed to remove them?"

No, sorry for the confusion. But we don't parse them correctly either. You're right about the (bad) influence of the Detect All Radix Characters setting. But this is not correct either:
```
1,234.56
= 1234.56

1.234,56
= 1.23456
```
- 2016-05-18T19:17:41+00:00
Felix Krull
Oh, you're saying that the parser should inspect the expression to determine the radix character on an expression-by-expression basis? In that case, you're right, it doesn't do that. That being said, I'm not sure guessing at what the user might have meant is a good idea either.
- 2016-05-18T20:06:41+00:00
Helder Correia repo owner
Thing is, this very issue reported by haridara is all about guessing the digit grouping automatically. This becomes particularly handy when copy/pasting from apps or the web. I've been there, and SpeedCrunch wasn't speedy at all. Right now, we look at the radix setting, but this doesn't solve the issue; on the contrary.
- 2016-05-18T20:56:19+00:00
Felix Krull
- changed status to open
- 2016-05-18T21:56:56+00:00
Tey'
"But this is not correct either:"
```
1,234.56
= 1234.56

1.234,56
= 1.23456
```
I don't understand why. What were the results you expected?

Now, if the user mixes both dots and commas in her numbers, she must explicitly tell SpeedCrunch which radix character she wants to use, as we have no way to detect it automatically (both radix characters are valid digit group separators as well). It's a user choice. I'm okay with disabling it by default.
- 2016-05-19T10:02:52+00:00
Helder Correia repo owner
"I don't understand why. What were the results you expected?"

The second result should be equivalent to the first. The most common use case is copying/pasting some sort of financial values from the web. These are almost always formatted as 1,234,567.89 or 1.234.567,89. But SC will only swallow one of them, depending on the setting. Thing is, in my particular experience at least, both formats are very common. This makes it very impractical to keep switching between settings. Also, it is common to have mixed pasted formatting in the same expression (which won't work with either setting). Auto-detection of the format should not be hard to do at all. At the very least, the formatting could occur in the editor on pasting.
- 2016-05-21T10:23:53+00:00
Tey'
Okay, I understand it better now. I agree that differentiating radix characters from digit group separators is easy when there is more than one comma and/or more than one dot, but how do we deal with numbers that contain exactly one dot and one comma (like in your original example)? If we decide based on the position of the last radix-ish character, then we will end up with false detections. For instance, if we decide 1,234.56 and 1.234,56 should be 1234.56, then 0.314,16 and 0,314.16 will become 314.16 while the intent was to express π. According to Wikipedia, digit group separators may appear both before and after the radix character, but I have no idea whether this is true or not (we barely use them in my country, and when we do, it's generally a space character).

Also, there are the cases where there is (only) one dot/comma in the number, which can be understood both as a radix character and as a digit group separator (e.g., 1,234 can mean 1.234 as well as 1234 if the user does not set the radix character in use).
- 2016-05-24T12:54:30+00:00
Helder Correia repo owner
You went through the same line of thought that I went through years ago :)

We already offer ISO 31-0 for results, which shows a white space as separator and reserves dot/comma for radix.

It is true that in some countries one can write 1,234 == 1234 or 1.234 == 1234. It is also true that in some countries/specific use cases one can use dot/comma fractional digit grouping. But this type of digit grouping shouldn't be supported by us, since we're not fortune tellers. Instead, we should go for the most obvious and frequent case of 1,234.567890 and 1.234,567890.

This means considering the last occurring separator as the radix character. It also means not attempting to guess what 1,234 or 1.234 actually means. They should == 1234/1000 at all times. We'd try to be smart only when there are no doubts. This should beautifully cover the currency use cases and not introduce mistakes. We should not be the MacGyver of calculators, but provide the best UX possible.
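
A minimal Python sketch of this rule follows. It is an illustration under stated assumptions, not the lexer change that was eventually made; the function name `normalize_number` is hypothetical.

```python
def normalize_number(text: str) -> str:
    """Treat the last '.' or ',' as the radix and drop earlier ones.

    A number with a single separator keeps it as the radix, so both
    "1,234" and "1.234" come out as "1.234".
    """
    last = max(text.rfind("."), text.rfind(","))
    if last == -1:
        return text  # no separator at all
    # Everything before the last separator is the integer part;
    # any '.' or ',' in it is taken to be digit grouping.
    integer_part = text[:last].replace(".", "").replace(",", "")
    fraction_part = text[last + 1:]
    return f"{integer_part}.{fraction_part}"


for sample in ("1,234.56", "1.234,56", "1,234,567.89", "1.234.567,89", "1,234"):
    print(sample, "->", normalize_number(sample))
```

Applied literally, the rule also turns 1,234,567 into 1234.567. The discussion here does not say how a repeated identical separator without a fractional part should be handled, so that case is left as is in the sketch.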
- 2016-05-25T17:18:37+00:00
Tey'
Okay, I'll change the number parsing in the lexer to reflect that logic then. I'll take this opportunity to change the way numbers from different bases are parsed, because I don't like the way it is now (too much code duplication).
- 2016-05-26T14:04:48+00:00
Helder Correia repo owner
- changed milestone to 0.12
- assigned issue to Tey'
- 2016-05-26T14:08:24+00:00
Helder Correia repo owner
- changed status to closed
See pull request #76.
- 2016-05-28T19:58:24+00:00

Assignee: Tey'

Type: enhancement

Priority: minor

Status: closed

Component: parser

Milestone: 0.12

Version: –

Votes: 0

Watchers: 1