Precision loss in Parameters()?

Issue #73 new
Alireza Khorshidi created an issue

It seems that saving fprange in the ASE Parameters dictionary causes precision to be lost. This appears to be why the assertion threshold was loosened in commit a0fd8de.

Similar precision loss might happen for weights and scalings.

Comments (12)

  1. andrew_peterson repo owner

    I guess this happens when it writes to and reads from the file? I see how it could happen here in the ASE source code if not enough digits are written. Perhaps we need to suggest a change to the ASE source code for this line?
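
    As a general illustration of the mechanism (the '%.8g' format here is only an example, not necessarily what ASE writes), a float written with too few digits does not survive a round trip, while its repr does:

    >>> x = 0.123456789123456789
    >>> float('%.8g' % x) == x
    False
    >>> float(repr(x)) == x
    True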

  2. Alireza Khorshidi reporter

    Attached are saved parameters in versions v0.4 and v0.5.

    In v0.4, we were saving "weights" and "biases" as "variables" in the form of lists, and "fingerprints_range" in the form of a list of lists, which seems to preserve the precision.

    However, in v0.5, we are saving things as numpy arrays, which seems to be why precision is lost. I am trying to find a way to keep precision for numpy arrays; otherwise we may want to save things as lists.

  3. andrew_peterson repo owner

    I see your point. There is a tolist method of numpy arrays; in principle this could solve our problem. See below. We'd have to set up some sort of type checking in a re-written Parameters class, I guess, or else just require that all variables be saved internally only as lists (or lists of lists) and converted on the fly to numpy arrays as needed. But it seems like a solvable problem.

    >>> import numpy as np
    >>> a = np.array([[3423.23423423049234, 324], [234, 2340203.2222]])
    >>> '%r' % a
    'array([[  3.42323423e+03,   3.24000000e+02],\n       [  2.34000000e+02,   2.34020322e+06]])'
    >>> '%r' % a.tolist()
    '[[3423.2342342304923, 324.0], [234.0, 2340203.2222]]'
    
  4. andrew_peterson repo owner

    This seems like an easy mistake to re-make. That is, I suspect in a few months an np.array may creep back into a parameter dictionary somewhere. How do you think we can remember this tip?

  5. Alireza Khorshidi reporter

    Should we have a re-written Parameters class, where the existence of np.array is checked, and a warning is raised?
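
    Something like this minimal sketch could work (the class name and warning text are placeholders, not the actual ASE/Amp API):

    import warnings
    import numpy as np

    class ListParameters(dict):
        """Hypothetical sketch: a parameters dictionary that warns
        whenever a numpy array is assigned, so lists are used instead."""

        def __setitem__(self, key, value):
            if isinstance(value, np.ndarray):
                warnings.warn("'%s' is a numpy array; consider storing "
                              "value.tolist() to avoid precision loss on "
                              "write." % key)
            dict.__setitem__(self, key, value)

    Plain assignments like p['fprange'] = np.array(...) would then trigger the warning, though dict.update and the dict constructor bypass __setitem__ and would need overriding as well.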

  6. Alireza Khorshidi reporter

    Are we losing the precision of loss function derivatives here in version v0.4, or here in version v0.5?

    If we use np.array, the printed value differs from the value printed without np.array, yet the difference between the two evaluates to zero! I am confused!

    >>> dloss_dparameters = [0.123456789123456789]
    >>> dloss_dparameters
    [0.12345678912345678]
    >>> import numpy as np
    >>> np.array(dloss_dparameters)
    array([ 0.12345679])
    >>> dloss_dparameters - np.array(dloss_dparameters)
    array([ 0.])
    >>> '%r' % dloss_dparameters
    '[0.12345678912345678]'
    >>> '%r' % np.array(dloss_dparameters)
    'array([ 0.12345679])'
    
  7. Alireza Khorshidi reporter

    Maybe precision is only lost when the number is printed, but not during calculations:

    >>> import numpy as np
    >>> a = np.array([0.12345678912])
    >>> b = np.array([0.12345678914])
    >>> a-b
    array([ -2.00000017e-11])
    >>> a
    array([ 0.12345679])
    >>> b
    array([ 0.12345679])
    
  8. andrew_peterson repo owner

    Yes, I think it is just an issue with the repr function of numpy arrays giving a fixed number of digits.

  9. andrew_peterson repo owner

    Note that this is also closely related to Issue #103, and that it is a general issue: computers store numbers in binary, so converting them to human-readable base 10 introduces a small bit of rounding, and converting them back to binary introduces more. A good solution is to store them in a form that round-trips exactly, such as base 10 with enough digits, or hex (which can be seen with float.hex()).
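
    For example, the hex form round-trips exactly:

    >>> x = 0.123456789123456789
    >>> float.fromhex(x.hex()) == x
    True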

    Numpy specifically shows even fewer significant digits when printing in human-readable format, but this can be changed with something like np.set_printoptions(precision=30).
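
    For example, revisiting a and b from above (the exact output formatting varies with the numpy version):

    >>> import numpy as np
    >>> np.set_printoptions(precision=15)
    >>> np.array([0.12345678912])
    array([ 0.12345678912])
    >>> np.array([0.12345678914])
    array([ 0.12345678914])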

    It turns out that if you pickle an object (for interprocess communication) using protocol 0, the object is turned into a string of ASCII, so we can open it and look at it. From this we can see that floats are pickled as base-10, human-readable numbers, so they presumably lose some precision to rounding; numpy arrays, however, are not human-readable upon pickling, so presumably do not.
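
    For example (shown under Python 2; Python 3 returns a bytes object), a float pickled at protocol 0 is written as its repr, which since Python 2.7/3.1 round-trips exactly, so this particular path need not actually lose precision:

    >>> import pickle
    >>> pickle.dumps(0.123456789123456789, protocol=0)
    'F0.12345678912345678\n.'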
