Rounding numbers is using decimal places instead of significant digits

Issue #446 open
Robert Leach created an issue

USE CASE: WHAT DO YOU WANT TO DO?

I want to be able to use TreeView with actual raw data instead of having to convert it to be within a restrictive range.

STEPS TO REPRODUCE AN ISSUE (OR TRIGGER A NEW FEATURE)

I created a small matrix to demonstrate this issue:

https://bitbucket.org/TreeView3Dev/treeview3/downloads/really_small_values.txt

Procedure 1:

  1. Open really_small_values.txt
  2. If the colors are not displayed as in the image below, reset preferences or start up a new instance of treeview & repeat step 1 so that the colors get properly set like this:

    good_colors_for_small_vals.png

  3. Select View->Colors...

  4. Try to drag the middle thumb
  5. Double-click the green boundary thumb
  6. Cancel

I can see what's happening in the code: the cause is rounding, and the handling of color thumbs in the color settings is a representative example. While the thumb behavior is not the specific problem that makes this issue of "critical" importance, it shows how the same code is problematic wherever it is used. The rounding code that these steps exercise is inaccurate, is used in the calculation of averages, and may be used elsewhere with other ill effects. Here is a second procedure, which works on any regular data file and demonstrates the problem:

Procedure 2:

  1. Open the color settings
  2. Set a thumb to 0.00000000001234
  3. Apply & close settings
  4. Re-open the color settings & edit the same thumb

CURRENT BEHAVIOR

for procedure 1:

  1. After step 3, the boundary values displayed, as a result of what I believe is different rounding code, are 100E-12 and 178.1E-12 as seen in the below image:

    before_color_edit-good.png

  2. Upon attempting step 4, you'll note that you cannot drag the middle thumb anywhere.

  3. After step 5, you see a window whose mean, median, min, center, max, and even the value in the text box are all "0.0", as seen in this screen-capture:

    boundary_thumb_changed.png

  4. Upon canceling in step 6, the right boundary thumb changes to display as "0E0" and the middle thumb is displayed as "NaN", as seen here:

    after_canceling_boundary_thumb_edit.png

for procedure 2:

The rounding code is applied when the thumb is saved, so the value of the thumb becomes 0.0. That is because everything is rounded to 4 decimal places: any value whose magnitude is less than 0.00005 becomes 0.0. This rounding code lives in the static Helper class, is used in the calculation of column, row, & region averages, and may be used elsewhere when data is processed.
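The collapse described for procedure 2 can be reproduced in isolation. This is a minimal sketch, assuming the rounding multiplies by 10000, rounds, and divides by 10000 (the pattern named under FILES AFFECTED). The class and method names are hypothetical stand-ins, not TreeView's actual code.

```java
// Hypothetical stand-in for the 4-decimal-place rounding described in this issue.
final class DecimalPlaceRoundingDemo {

    // Rounds to 4 decimal places: scale up, round to the nearest long, scale back down.
    static double roundTo4DecimalPlaces(double val) {
        return Math.round(val * 10000) / 10000.0;
    }

    public static void main(String[] args) {
        // The thumb value from Procedure 2 is far below 0.00005, so it collapses to zero.
        System.out.println(roundTo4DecimalPlaces(0.00000000001234));

        // The two boundary values from Procedure 1 collapse to the same number,
        // which leaves a zero-width color range.
        System.out.println(
                roundTo4DecimalPlaces(100E-12) == roundTo4DecimalPlaces(178.1E-12));

        // Values of ordinary magnitude keep four decimal places.
        System.out.println(roundTo4DecimalPlaces(0.12345678));
    }
}
```

Because both boundary values round to the same number, any calculation that depends on the difference between them has nothing left to work with.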

EXPECTED BEHAVIOR

I believe all of these behaviors are the result of rounding to "4 decimal places" as opposed to rounding to 4 significant digits.

Rounding should be done with respect to "significant digits" instead of with respect to decimal places. Values such as 0.00000000001234 should be displayed as 1.234E-11 and rounding should not change this value since it already has 4 significant digits (1.234).
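One possible way to get this behavior is to round through `java.math.BigDecimal` with a `MathContext`, which counts significant digits rather than decimal places. This is a sketch under that assumption, not a drop-in replacement for the existing helper. The class name `SignificantDigits` and both method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
import java.util.Locale;

// Hypothetical helper: rounds and formats by significant digits.
final class SignificantDigits {

    // Rounds val to the given number of significant digits.
    static double roundToSignificantDigits(double val, int digits) {
        // NaN and infinities cannot be converted to BigDecimal,
        // and zero has no significant digits to round.
        if (Double.isNaN(val) || Double.isInfinite(val) || val == 0.0) {
            return val;
        }
        MathContext context = new MathContext(digits, RoundingMode.HALF_UP);
        return BigDecimal.valueOf(val).round(context).doubleValue();
    }

    // Formats val in scientific notation showing the given number of significant digits.
    static String toScientific(double val, int digits) {
        return String.format(Locale.ROOT, "%." + (digits - 1) + "E", val);
    }

    public static void main(String[] args) {
        double thumb = 0.00000000001234;
        // The value already has 4 significant digits, so rounding leaves it unchanged.
        System.out.println(roundToSignificantDigits(thumb, 4) == thumb);
        System.out.println(toScientific(thumb, 4));
        // A value with more digits is shortened to 4 significant digits.
        System.out.println(roundToSignificantDigits(1.234567E-10, 4));
    }
}
```

Rounding and formatting are kept separate here so that a stored value can stay exact while only its displayed form is shortened.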

In the case where all data in a matrix are values less than 0.00005, all ability to display values with the variation represented in their colors is lost. This affects at least the averages displayed in the data ticker (as of pull request #90) and the values displayed in the color setting thumbs, and possibly other places. All ability to control the colors is lost as well.

The color settings and the averages displayed in both the color thumb edit dialog and in the data ticker (as of pull request #90) should be able to handle this use case. Any other places where rounding is done should also be able to handle 4 significant digits and display in scientific notation.

Caveat: if the range of a dataset is, for example, from -1 to +1, then displaying "0.0" instead of "1.234E-10" is good, but the value stored (for a color handle) should not be changed from "1.234E-10" to "0.0", as we should not change what a user explicitly & manually sets. In fact, there may even be a case to be made that we should display "1.234E-10" in this case because that's what the user entered. What we should avoid is programmatic generation of a number like "1.234E-10" when the range of the data is -1 to +1.

DEVELOPERS ONLY SECTION

SUGGESTED CHANGE (Pseudocode optional)

Find a method to round which uses the concept of significant digits as opposed to rounding to decimal places.

Take the data range into account when calculating a mean/average so that a range of -1 to +1 would end up with an average of 0.0 instead of 1.234E-10.

When a user enters a thumb value manually, do not change it (despite range considerations).
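The range consideration above could be sketched as follows: pick the number of decimal places from the order of magnitude of the data range, so that a computed value far below the range rounds to zero while a dataset of tiny values keeps its detail. This is only one possible interpretation of the suggestion. The class `RangeAwareRounding` and the method `roundForRange` are hypothetical, and the method is meant for programmatically generated values only, never for values a user typed in.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for rounding computed values (e.g. averages) relative to the data range.
final class RangeAwareRounding {

    // Rounds val so that `digits` significant digits are kept at the scale of (max - min).
    static double roundForRange(double val, double min, double max, int digits) {
        double range = Math.abs(max - min);
        if (Double.isNaN(val) || Double.isInfinite(val)
                || Double.isNaN(range) || Double.isInfinite(range) || range == 0.0) {
            return val; // no usable range: leave the value untouched
        }
        // Decimal exponent of the range, e.g. 0 for a range of 2, -11 for 78.1E-12.
        int rangeExponent = (int) Math.floor(Math.log10(range));
        // Number of decimal places to keep; it may be negative for very large ranges.
        int scale = digits - 1 - rangeExponent;
        return BigDecimal.valueOf(val).setScale(scale, RoundingMode.HALF_UP).doubleValue();
    }

    public static void main(String[] args) {
        // With data from -1 to +1, a computed 1.234E-10 is noise and rounds to zero.
        System.out.println(roundForRange(1.234E-10, -1.0, 1.0, 4));
        // With the tiny range from Procedure 1, the same magnitude keeps its detail.
        System.out.println(roundForRange(1.234567E-10, 100E-12, 178.1E-12, 4));
    }
}
```

When the range is zero (for example, a matrix of identical values), the sketch returns the value unchanged rather than guessing a precision.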

FILES AFFECTED (where the changes will be implemented) - developers only

Helper.java

LabelView.java (rounding code that uses round(val*10000)/10000 as of pull request #90)

TRView.java (rounding code that uses round(val*10000)/10000 as of pull request #90)

LEVEL OF EFFORT - developers only

minor

COMMENTS

Comments (10)

  1. Robert Leach reporter

    @abarysh - I was wondering if you might have an opinion about this issue.

    One possible way of handling this would be to require that the data range (or rather the difference between min and max) be greater than, say, 0.0001, or whatever value is necessary to make the current color interface functional. If data are detected that don't meet this requirement, we could issue an error or warning telling users that they need to normalize their data to a specific range.

    Another way to handle it, as I've suggested in this issue, is to round values using 4 "significant digits", where 1.234567E-10 would be rounded to 1.235E-10, and to handle some edge cases as I described above. For example, display simpler ("more-rounded") values when values such as averages or thumb settings are generated programmatically, yet keep manually entered values true to what was explicitly entered.

    I know that it may be unlikely to encounter data with such a small range, but I certainly know that I have encountered tools in the past that forced me to normalize my data to a specific range, which I found to be a hassle.

    What do you think?

  2. Anastasia Baryshnikova

    It sounds to me that storing all data in scientific notation is the way to go. The more general the solution, the better. Users may have a matrix with extremely large values as well.

    Forcing people to have a range of values doesn't sound like a good idea. Someone may have a matrix made only of identical values and NaNs. They still need to be able to look at it and assign a color to the value. When there's no variance in the data, the color settings should not have the middle handle (or it could be there, but not draggable, which would make sense) and the two extreme handles would be assigned the same color because min = max.
