Neural Net Returning a NaN

Issue #226 open

Michael Waters created an issue 2019-05-15

I loosely trained a neural net for Zr/O on some random atomic structures. When I try to apply that NN to some structures derived from polymorphs of Zr, I often get NaNs for energy.

I’ve attached two similar structures. For the first one the NN works fine; for the other, it gives me NaN.

Comments (14)

Alireza Khorshidi
Mike, could you put a couple print statements in the source code (say here) to see what exactly gets NaN in the code? Is it fingerprints (the input of the model) or the output of the model?
- 2019-05-16T00:44:51+00:00
Michael Waters reporter
Hi Alireza,

The NaNs are occurring in some of the G4 fingerprint calculations for the structure2 file I attached. The NN returns a NaN if any fingerprint is NaN. I am continuing to investigate.
- 2019-05-18T00:14:49+00:00

Michael Waters reporter

Hi Alireza,

I found the issue!

On lines 113 and 114 of gaussian.f90:

                costheta = &
                dot_product(Rij_vector, Rik_vector) / Rij / Rik
                term = (1.0d0 + g_gamma * costheta)**g_zeta

costheta can be -1.0000000000000002 due to a rounding error!

If g_gamma = 1.0, the base 1.0d0 + g_gamma * costheta is then slightly negative, and you can get a NaN from the exponentiation for term.

Here’s my suggestion for a fix:

                costheta = &
                dot_product(Rij_vector, Rik_vector) / Rij / Rik
                term = ABS(1.0d0 + g_gamma * costheta)**g_zeta
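The failure mode described above can be reproduced outside Fortran. The following is a minimal Python sketch, not Amp's code: `angular_term` is a hypothetical helper, and the exponent 0.5 is an illustrative non-integer value that does not come from this issue. Python's `math.pow` raises `ValueError` for a negative base with a non-integer exponent, where C's `pow()` returns NaN, so the sketch converts that error to NaN to mimic the reported behavior.

```python
import math

def angular_term(costheta, gamma, zeta):
    """Hypothetical helper: (1 + gamma*costheta)**zeta, as in the snippet above."""
    base = 1.0 + gamma * costheta
    try:
        return math.pow(base, zeta)
    except ValueError:
        # math.pow rejects a negative base with a non-integer exponent;
        # C's pow() returns NaN in that case, so mimic that here.
        return math.nan

bad = -1.0000000000000002      # the costheta value reported in this issue
base = 1.0 + 1.0 * bad         # slightly negative instead of exactly zero

print(base < 0.0)                      # True
print(angular_term(bad, 1.0, 0.5))     # nan
print(angular_term(-1.0, 1.0, 0.5))    # 0.0
```

With `costheta` exactly -1.0 the base is zero and the term is well defined; one unit of rounding error below -1.0 is enough to make the base negative.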

2019-05-18T02:34:16+00:00

andrew_peterson repo owner
Should be fixed now: commit 08317ef and the two prior… Thanks for spotting this!
- 2020-04-08T16:10:07+00:00
andrew_peterson repo owner
- changed status to resolved
- 2020-04-08T16:10:17+00:00

andrew_peterson repo owner

changed status to open

@Michael Waters wrote to the amp-users list:

Hi Andrew,

Most of my NaNs are gone. The remaining NaNs are fixed by ensuring that costheta can't be larger than 1, like this:

                if (costheta < -1.0d0) then
                    costheta = -1.0d0
                end if
                if (costheta >  1.0d0) then
                    costheta =  1.0d0
                end if

Maybe Fortran model version 13 was unlucky?

Best, -Mike
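For anyone checking this in the pure-Python code path, the clamp in Mike's message can be sketched as follows. This is an illustration under assumptions, not the fix that was committed: `clamp_costheta` and `angular_term_clamped` are hypothetical names, and the sketch assumes gamma is +1 or -1 so that the clamped base stays in [0, 2].

```python
def clamp_costheta(costheta):
    """Limit costheta to [-1, 1], mirroring the two Fortran if-blocks."""
    return max(-1.0, min(1.0, costheta))

def angular_term_clamped(costheta, gamma, zeta):
    """(1 + gamma*costheta)**zeta with costheta clamped first.

    Assumes gamma is +1 or -1, so the base cannot go negative.
    """
    c = clamp_costheta(costheta)
    return (1.0 + gamma * c) ** zeta

# Values one rounding step outside the valid range are pulled back in.
print(clamp_costheta(-1.0000000000000002))   # -1.0
print(clamp_costheta(1.0000000000000002))    # 1.0
```

Clamping the cosine handles both ends of the range with one check, whichever sign gamma has.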

2020-04-09T05:29:17+00:00

andrew_peterson repo owner
Can you send a system that shows the problem? You are the only one who has reported this problem so far, so we need a system that reproduces the error. I think the other system must have only encountered the “-1” region. Also, it would be much appreciated if you could make the system minimal. We do our debugging in the pure-python version of Amp, and your previous structure was quite large, making that process a bit cumbersome. If you could search through your problematic structure to find which atom combination is causing the problem, you can just extract those atoms and make a new smaller system that hopefully still has the problem. Ok?

P.S. I’m struggling to figure out why you would ever encounter costheta = 1.0. Doesn’t this mean that theta is 0? How would this happen other than having two atoms in the same place?
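One way to run the search suggested above is to scan every (center, neighbor, neighbor) triple and report those whose computed cosine falls outside [-1, 1]. The sketch below is a hypothetical stand-alone helper, not part of Amp or ASE. It takes plain Cartesian coordinates and ignores periodic images, so for a periodic cell the positions would first have to be expanded to include the relevant neighbor images.

```python
import itertools
import math

def find_out_of_range_triples(positions, cutoff):
    """Return (i, j, k, costheta) for triples with |costheta| > 1.

    positions: list of (x, y, z) tuples; periodic images are NOT considered.
    cutoff: only neighbors closer than this distance to atom i are paired.
    """
    hits = []
    for i, ri in enumerate(positions):
        neighbors = []
        for j, rj in enumerate(positions):
            if j == i:
                continue
            vec = tuple(b - a for a, b in zip(ri, rj))
            dist = math.sqrt(sum(x * x for x in vec))
            if dist < cutoff:
                neighbors.append((j, vec, dist))
        # Same formula as the Fortran snippet: dot(Rij, Rik) / Rij / Rik
        for (j, vij, dij), (k, vik, dik) in itertools.combinations(neighbors, 2):
            c = sum(x * y for x, y in zip(vij, vik)) / dij / dik
            if abs(c) > 1.0:
                hits.append((i, j, k, c))
    return hits
```

Any triple it returns gives the indices of three atoms that could be extracted into a smaller test system.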
- 2020-04-09T05:34:37+00:00
Michael Waters reporter
I think I know: the atoms are lined up like this: center ----> atom1 ----> atom2.

How should I send you my files?
- 2020-04-09T06:22:20+00:00
andrew_peterson repo owner
Oh right, they are in a line on the same side of the atom; if the cutoff radius is big enough this can occur. It's early in the morning in Copenhagen at the moment and my caffeine hadn't kicked in!
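The collinear arrangement discussed here is easy to check numerically. This is a small Python sketch with made-up coordinates (not taken from the attached structures) and a hypothetical `costheta` helper that uses the same formula as the Fortran snippet earlier in the thread.

```python
import math

def costheta(center, atom_j, atom_k):
    """cos(theta) between center->atom_j and center->atom_k."""
    rij = [b - a for a, b in zip(center, atom_j)]
    rik = [b - a for a, b in zip(center, atom_k)]
    dot = sum(x * y for x, y in zip(rij, rik))
    norm_ij = math.sqrt(sum(x * x for x in rij))
    norm_ik = math.sqrt(sum(x * x for x in rik))
    return dot / norm_ij / norm_ik

center = (0.0, 0.0, 0.0)
atom1 = (1.0, 0.0, 0.0)    # nearer neighbor
atom2 = (2.0, 0.0, 0.0)    # farther neighbor, same side, same line
other = (-2.0, 0.0, 0.0)   # neighbor on the opposite side

print(costheta(center, atom1, atom2))   # 1.0  (theta = 0)
print(costheta(center, atom1, other))   # -1.0 (theta = 180 degrees)
```

These axis-aligned coordinates give exactly 1.0 and -1.0. Along a general lattice direction the same geometry can round to a value a hair outside [-1, 1], which is the case the clamp guards against.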

You can just upload a trajectory to this issue page, like you did when you reported the issue originally.
- 2020-04-09T06:35:18+00:00
Michael Waters reporter
- attached beta-Zr-NaN-test.traj
- 2020-04-09T15:54:18+00:00
Michael Waters reporter
- attached Zr-O.amp
- 2020-04-09T15:54:38+00:00
Michael Waters reporter
Do you need anything else?
- 2020-04-09T15:54:52+00:00
Michael Waters reporter
Oh, some info might help: this is an energy-volume scan for BCC Zr. The first 3 images should give NaNs.
- 2020-04-09T16:07:59+00:00

andrew_peterson repo owner

I can’t open the trajectory file you attached. My error is below. I’m using the latest version of ASE. Does it open correctly on your end? Perhaps you can re-upload it and maybe save it as an extxyz file as backup, since that’s plain text.

$ ase -T gui beta-Zr-NaN-test.traj 
Traceback (most recent call last):
  File "/home/aap/Dropbox/repositories/ase/bin/ase", line 3, in <module>
    main()
  File "/home/aap/Dropbox/repositories/ase/ase/cli/main.py", line 99, in main
    f(args)
  File "/home/aap/Dropbox/repositories/ase/ase/gui/ag.py", line 68, in run
    images.read(args.filenames, args.image_number)
  File "/home/aap/Dropbox/repositories/ase/ase/gui/images.py", line 182, in read
    self.initialize(images, names)
  File "/home/aap/Dropbox/repositories/ase/ase/gui/images.py", line 125, in initialize
    self.maxnatoms = max(len(atoms) for atoms in self)
ValueError: max() arg is an empty sequence

2020-04-11T08:32:22+00:00

Assignee: –

Type: bug

Priority: critical

Status: open
