Normalize format of numbers pasted from clipboard
Originally reported on Google Code with ID 434
When copy pasting formatted numbers (e.g., with comma separators), either strip commas
or enhance the parser to recognize them.
Reported by haridara
on 2013-03-15 06:41:42
Comments (25)
Account Deleted I saw issue 58 before filing this one, and the two are completely different. This one is specifically about pasting from the clipboard; the other is about formatting the display. Please reopen this issue.
Reported by
haridara
on 2013-03-16 05:01:54 -
Account Deleted My bad, I was misled by the title.
Reported by
helder.pereira.correia
on 2013-03-16 06:02:23 - Status changed:New
- Labels added: Type-Enhancement, Usability, Component-UI -
Account Deleted I really want to add this feature, but here's the issue: depending on where you come from, the format can be 1,234.56 or 1.234,56. It's trivial to act smart in these situations, but what about 1,234 and 1.234? In that case, the application needs to know whether comma and dot are radix characters or digit grouping separators. I think a setting of some sort looks inevitable. What do you think? Please help me find a solution that makes sense for everyone.
Reported by
helder.pereira.correia
on 2014-02-24 21:26:41 -
Could speedcrunch use the system locale to determine how to parse commas? If that seems overly complicated or error prone, then a simple "Digit grouping character" selector or even "Ignore the following characters" setting would work for me.
Reported by
hutchinsfairy
on 2014-06-24 09:09:04 -
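The "Digit grouping character" selector suggested above can be sketched as a parser that takes both characters as explicit parameters. This is a minimal Python illustration, not SpeedCrunch code; `parse_number` and its parameters are hypothetical names. On a system-locale approach, the two characters could instead be read from the locale (in Python, `locale.localeconv()` exposes them as `decimal_point` and `thousands_sep`).

```python
def parse_number(text: str, radix: str = ".", grouping: str = ",") -> float:
    """Parse a number using an explicitly configured radix and grouping character.

    Both parameters stand in for a hypothetical user setting (or values
    read from the system locale); they are not SpeedCrunch options.
    """
    if radix == grouping:
        raise ValueError("radix and grouping characters must differ")
    # Drop the grouping character first, then map the radix to '.',
    # which is what float() expects.
    cleaned = text.strip().replace(grouping, "").replace(radix, ".")
    return float(cleaned)


# The same input means different things under different settings.
print(parse_number("1,234"))                           # comma is grouping
print(parse_number("1,234", radix=",", grouping="."))  # comma is radix
```

With an explicit setting there is nothing to guess: the ambiguity of 1,234 versus 1.234 is resolved by the user's choice rather than by the parser.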
Account Deleted Issue 520 has been merged into this issue.
Reported by
helder.pereira.correia
on 2014-08-06 12:05:18 -
Account Deleted I like the idea of getting the radix character and digit grouping separator from the system locale. Failing that, as mentioned in issue 520, I think a toggle in the "Settings -> Behaviour" menu would suffice. I am totally opposed to leaving things as they are. Regardless of where a particular user is from, it won't exactly be a speedy crunch if (s)he has to go back and remove punctuation characters from pasted numbers before proceeding. Having a "one size fits all" approach - where both commas and dots are interpreted as radix characters - is both primitive and misleading. If someone in the US enters "1,024" (meaning "1024"), SC will treat it as "1 radix 024". If the user isn't aware that SC just happens to be coded this way, and doesn't notice that the answer to their calculation is very different from what it should be, SC will effectively have made a very significant error. Also, it'd be handy for financially oriented users if SC would simply ignore currency symbols (e.g. $), rather than saying "invalid expression". I mentioned this in issue 520 and was requested to raise the idea again here.
Reported by
Xavion.0
on 2014-08-07 08:08:36 -
Account Deleted Please check pull request 44 for a possible implementation: https://github.com/speedcrunch/SpeedCrunch/pull/44 In this implementation, thousand separators are detected and ignored from user input using 2 different methods, one of them allowing the detection of any character that is neither alphanumeric nor a known operator (so you can write "25 566$ * 65$" for instance). Also, the user chooses whether both dot and comma are radix characters or only one of them is a radix character (and the other one is detected as a thousand separator, so it is ignored).
Reported by
teyut@free.fr
on 2014-10-26 21:32:05 -
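The second detection method described above (ignore any character that is neither alphanumeric nor a known operator) can be illustrated roughly as follows. This is a simplified Python sketch, not the code from pull request 44; the function name and the operator set are assumptions made for the example.

```python
# Assumed operator set for illustration only; the real parser knows more.
OPERATORS = set("+-*/^()")


def strip_separators(expr: str, radix: str = ".") -> str:
    """Keep alphanumerics, known operators and the chosen radix character.

    Everything else (spaces, currency symbols, the other separator)
    is treated as a digit-grouping character and dropped.
    """
    return "".join(
        ch for ch in expr
        if ch.isalnum() or ch in OPERATORS or ch == radix
    )


print(strip_separators("25 566$ * 65$"))        # 25566*65
print(strip_separators("1.234,56", radix=","))  # 1234,56
```

Note that this naive version also removes spaces between identifiers, so a real implementation would have to restrict the stripping to the inside of number tokens.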
Account Deleted I obviously think this is a really good patch, and I'd like to see it included as soon as possible. I think the "non-SI-compliance" argument can be trumped by highlighting the name of this application again: SpeedCrunch. "It won't exactly be a speedy crunch if [users have] to go back and remove punctuation characters from pasted numbers before proceeding." I think the "broken exported session" argument can be overcome by including the radix character preferences in the text file.
Reported by
Xavion.0
on 2014-10-29 03:16:53 -
Issue #579 was marked as a duplicate of this issue. -
- changed component to parser
- edited description
-
- changed status to closed
I believe this was resolved by the linked pull request (commit 5aa5846).
-
repo owner @fk I don't believe this is working ATM:
1,234.567
= 0.699678
1.234,567
= 0.699678
1,2,3 ? In fact, this is weird.
= 0.36
1.2.3 ? And so is this.
= 0.36
@Teyut Edit: I meant the removal of the commas or dots is not working. And besides that, the evaluator is giving some strange values.
-
Is it supposed to remove them? It parses them, IMO that's either sufficient or even preferable to actually normalizing values.
The weirdness is a combination of issue #634 (1.2.3 is interpreted as 1.2 * .3) and "Behavior -> Detect All Radix Characters", which is on by default and always causes both '.' and ',' to be parsed as decimal separators. Considering how it breaks expected grouping characters, it should maybe default to off. -
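To make the arithmetic behind the reported values concrete, here is a small Python model of the behaviour described above: every '.' and ',' counts as a radix character, and each additional radix character starts a new, implicitly multiplied fraction. It only reproduces the numbers from the report and is not SpeedCrunch's lexer; `evaluate_as_reported` is a hypothetical name.

```python
def evaluate_as_reported(text: str) -> float:
    """Model the reported behaviour: all separators are radix characters,
    and a second radix character starts a new factor (1.2.3 -> 1.2 * .3)."""
    parts = text.replace(",", ".").split(".")
    if len(parts) == 1:
        return float(parts[0])
    # The first two pieces form an ordinary decimal number.
    value = float(parts[0] + "." + parts[1])
    # Each further piece is read as ".xyz" and multiplied in.
    for fraction in parts[2:]:
        value *= float("." + fraction)
    return value


print(round(evaluate_as_reported("1,234.567"), 6))  # 1.234 * .567
print(round(evaluate_as_reported("1.2.3"), 6))      # 1.2 * .3
```

Under this model 1,234.567 and 1.234,567 both evaluate to 1.234 * 0.567, which matches the 0.699678 shown in the report.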
repo owner Is it supposed to remove them?
No, sorry for the confusion. But we don't parse them correctly either. You're right about the (bad) influence of the Detect All Radix Characters setting. But this is not correct either:
1,234.56
= 1234.56
1.234,56
= 1.23456
-
Oh, you're saying that the parser should inspect the expression to determine the radix character on an expression-by-expression basis? In that case, you're right, it doesn't do that. That being said, I'm not sure guessing at what the user might have meant is a good idea either.
-
repo owner Thing is, this very issue reported by haridara is all about guessing the digit grouping automatically. This becomes particularly handy when copy/pasting from apps or the web. I've been there, and SpeedCrunch wasn't speedy at all. Right now, we look at the radix setting, but this doesn't solve the issue; on the contrary.
-
- changed status to open
-
But this is not correct either:
1,234.56
= 1234.56
1.234,56
= 1.23456
I don't understand why. What were the results you expected?
Now if the user mixes both dots and commas in her numbers, she must explicitly tell SpeedCrunch which radix character she wants to use, as we have no way to detect it automatically (both radix characters are valid digit group separators as well). It's a user choice. I'm okay with disabling it by default.
-
repo owner I don't understand why. What were the results you expected?
The second result should be equivalent to the first. The most common use case is copying/pasting some sort of financial values from the web. These are almost always formatted as 1,234,567.89 or 1.234.567,89. But SC will only swallow one of them, depending on the setting. Thing is, in my particular experience at least, both formats are very common. This makes it very impractical to keep switching between settings. Also, it is common to have mixed pasted formats in the same expression (which won't work under either setting). Auto-detection of the format should not be hard to do at all. At the very least, the formatting could occur in the editor on pasting.
-
Okay, I understand it better now. I agree that differentiating radix characters from digit group separators is easy when there is more than one comma and/or more than one dot, but how do we deal with numbers that contain exactly one dot and one comma (like in your original example)? If we make the decision based on the position of the last radix-ish character, then we will end up with false detections. For instance, if we decide 1,234.56 and 1.234,56 should be 1234.56, then 0.314,16 and 0,314.16 will become 314.16 while the intent was to express π. According to Wikipedia, digit group separators may appear both before and after the radix character, but I have no idea whether this is true or not (we barely use them in my country, and when we do, it's generally a space character).
Also, there are the cases where there is (only) one dot/comma in the number, which can be understood both as a radix character and as a digit group separator (e.g., 1,234 can mean 1.234 as well as 1234 if the user does not set the radix character in use). -
repo owner You went through the same line of thoughts that I've been through years ago :)
We already offer ISO 31-0 for results, which shows a white space as separator and reserves dot/comma for radix.
It is true that in some countries one can write 1,234 == 1234 or 1.234 == 1234. It is also true that in some countries/specific use cases one can use dot/comma fractional digit grouping. But this type of digit grouping shouldn't be supported by us, since we're not fortune tellers. Instead, we should go for the most obvious and frequent case of 1,234.567890 and 1.234,567890.
This means considering the last occurring separator as the radix character. This also means not attempting to guess what 1,234 or 1.234 actually means. They should == 1234/1000 at all times. We'd try to be smart only when there are no doubts. This should beautifully cover the currency use cases and not introduce mistakes. We should not be the MacGyver of calculators, but provide the best UX possible. -
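The rule agreed on above (the last occurring separator is the radix character, and a lone separator is always a radix) could look roughly like the Python sketch below. It is an illustration, not the lexer change that was eventually made. One extra assumption is labelled in the code: a separator that repeats cannot be a radix, so a number such as 1,234,567 is read as grouped digits, and a number where the last separator repeats while the other kind is also present is rejected.

```python
def parse_pasted_number(text: str) -> float:
    """Treat the last '.' or ',' as the radix and drop earlier separators."""
    separators = [ch for ch in text if ch in ".,"]
    if not separators:
        return float(text)

    last = separators[-1]
    if separators.count(last) > 1:
        # ASSUMPTION (not stated in the thread): a radix character can
        # occur only once, so a repeated last separator means grouping.
        if len(set(separators)) > 1:
            raise ValueError(f"ambiguous separators in {text!r}")
        return float(text.replace(last, ""))

    # Split at the radix; everything before it may contain grouping chars.
    integer_part, fraction = text.rsplit(last, 1)
    integer_part = integer_part.replace(".", "").replace(",", "")
    return float(f"{integer_part}.{fraction}")


for sample in ("1,234.56", "1.234,56", "1,234", "1.234", "0.314,16"):
    print(sample, "->", parse_pasted_number(sample))
```

The last sample shows the trade-off raised earlier in the thread: 0.314,16 is read as 314.16, because fractional digit grouping is deliberately not supported.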
Okay, I'll change the number parsing in the lexer to reflect that logic then. I'll take this opportunity to change the way numbers from different bases are parsed, because I don't like the way it is now (too much code duplication).
-
repo owner - changed status to closed
See pull request #76.
Reported by
helder.pereira.correia
on 2013-03-15 20:02:44 - Status changed:Duplicate
- Merged into:#58