Neural Net Returning a NaN
I loosely trained a neural net for Zr/O on some random atomic structures. When I apply that NN to structures derived from Zr polymorphs, I often get NaN energies.
I’ve attached two similar structures. The NN works fine for the first one, but gives NaN for the other.
-
reporter Hi Alireza,
The NaNs are occurring in some of the G4 fingerprint calculations for structure2, which I attached. The NN returns NaN whenever any fingerprint is NaN. I am continuing to investigate.
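A minimal NumPy sketch of why a single NaN fingerprint component is enough. The one-layer network below is a hypothetical stand-in (its weights, sizes and `tanh` activation are arbitrary assumptions, not Amp's actual model). It turns any NaN input into a NaN output, because NaN multiplied by any weight, even zero, is NaN.

```python
import numpy as np

def tiny_atomic_nn(fingerprint, weights, bias):
    """Hypothetical one-layer atomic network: energy = sum(tanh(W @ G + b))."""
    hidden = np.tanh(weights @ fingerprint + bias)
    return float(hidden.sum())

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4))
bias = rng.normal(size=3)

clean = np.array([0.1, 0.5, 0.2, 0.7])
broken = clean.copy()
broken[2] = np.nan  # e.g. one G4 component that came out as NaN

print(tiny_atomic_nn(clean, weights, bias))   # a finite number
print(tiny_atomic_nn(broken, weights, bias))  # nan
```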
-
reporter Hi Alireza,
I found the issue!
On lines 113 and 114 of gaussian.f90:
    costheta = &
        dot_product(Rij_vector, Rik_vector) / Rij / Rik
    term = (1.0d0 + g_gamma * costheta)**g_zeta
costheta can be -1.0000000000000002 due to a rounding error!
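If g_gamma = 1.0, then 1.0d0 + g_gamma * costheta is a tiny negative number (about -2.2e-16), and raising a negative base to the real power g_zeta can give a NaN for term.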
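The arithmetic can be reproduced in NumPy. This sketch assumes g_zeta = 1.5 for illustration: NumPy's power (like C's pow) returns NaN for a negative base with a non-integer exponent, while an integer-valued exponent such as 2.0 may still give a finite result.

```python
import numpy as np

costheta = -1.0000000000000002  # value reported above: just below -1 from rounding
g_gamma = 1.0
g_zeta = 1.5  # assumed non-integer zeta, for illustration only

base = 1.0 + g_gamma * costheta
print(base)  # -2.220446049250313e-16

with np.errstate(invalid="ignore"):  # silence NumPy's "invalid value" warning
    term = np.power(base, g_zeta)
print(term)  # nan
```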
Here’s my suggestion for a fix:
    costheta = &
        dot_product(Rij_vector, Rik_vector) / Rij / Rik
    term = ABS(1.0d0 + g_gamma * costheta)**g_zeta
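A small Python version of the angular factor with and without the suggested ABS() guard. `g4_angular_term` is a hypothetical helper name, and g_zeta = 1.5 is again an illustrative assumption.

```python
import numpy as np

def g4_angular_term(costheta, g_gamma, g_zeta, use_abs=True):
    """Angular factor (1 + gamma*cos(theta))**zeta of the G4 fingerprint.

    use_abs=True mirrors the ABS() fix; use_abs=False is the original expression.
    """
    base = 1.0 + g_gamma * costheta
    if use_abs:
        base = abs(base)
    with np.errstate(invalid="ignore"):  # hide NumPy's warning for the NaN case
        return float(np.power(base, g_zeta))

bad_cos = -1.0000000000000002
print(g4_angular_term(bad_cos, 1.0, 1.5, use_abs=False))  # nan
print(g4_angular_term(bad_cos, 1.0, 1.5, use_abs=True))   # about 3.3e-24, effectively 0
```

Taking the absolute value is harmless here because the rounding error is on the order of 1e-16, but it would also silently hide a genuinely negative base; clamping costheta to [-1, 1] before exponentiating is an alternative guard.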
-
repo owner Should be fixed now: commit 08317ef and the two prior… Thanks for spotting this!
-
repo owner - changed status to resolved
-
repo owner - changed status to open
@Michael Waters wrote to the amp-users list:
Hi Andrew,
Most of my NaNs are gone. The remaining NaNs go away if I also make sure that costheta can't be larger than 1, i.e. clamp it to [-1, 1], like this:
    if (costheta < -1.0d0) then
        costheta = -1.0d0
    end if
    if (costheta > 1.0d0) then
        costheta = 1.0d0
    end if
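The upper clamp matters when g_gamma = -1: a costheta that rounds just above 1 makes 1 + g_gamma * costheta slightly negative. A NumPy sketch comparing both versions (the helper names are hypothetical, and g_zeta = 1.5 is assumed for illustration):

```python
import numpy as np

def g4_angular_term_clamped(costheta, g_gamma, g_zeta):
    """(1 + gamma*cos(theta))**zeta with cos(theta) clamped to [-1, 1] first,
    mirroring the two Fortran if-blocks above."""
    costheta = float(np.clip(costheta, -1.0, 1.0))
    return float(np.power(1.0 + g_gamma * costheta, g_zeta))

def g4_angular_term_raw(costheta, g_gamma, g_zeta):
    """Same expression without clamping, for comparison."""
    with np.errstate(invalid="ignore"):  # hide NumPy's warning for the NaN case
        return float(np.power(1.0 + g_gamma * costheta, g_zeta))

cos_above_one = 1.0000000000000002  # rounding error on the "+1" side
print(g4_angular_term_raw(cos_above_one, -1.0, 1.5))      # nan
print(g4_angular_term_clamped(cos_above_one, -1.0, 1.5))  # 0.0
```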
Maybe Fortran model version 13 was unlucky?
Best,
-Mike
-
repo owner Can you send a system that shows the problem? You are the only one who has reported this problem so far, so we need a system that reproduces the error. I think the other system must have only encountered the “-1” region. Also, it would be much appreciated if you could make the system minimal. We do our debugging in the pure-Python version of Amp, and your previous structure was quite large, which makes that process a bit cumbersome. If you could search through your problematic structure to find which atom combination is causing the problem, you can just extract those atoms and make a new, smaller system that hopefully still has the problem. Ok?
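One way to do that search is to loop over every angle (i, j, k) within the cutoff and flag cosines that fall outside [-1, 1]. This is a hypothetical helper, not part of Amp, and it assumes a non-periodic set of Cartesian positions with no coincident atoms; a periodic cell would need a proper neighbor list with images.

```python
import itertools
import numpy as np

def find_bad_triplets(positions, cutoff):
    """Return (i, j, k, costheta) for every angle whose raw cosine is outside [-1, 1].

    Non-periodic only: no minimum-image convention or periodic neighbor list.
    """
    positions = np.asarray(positions, dtype=float)
    bad = []
    for i in range(len(positions)):
        # neighbors j != i of the central atom i within the cutoff
        neighbors = [j for j in range(len(positions))
                     if j != i and np.linalg.norm(positions[j] - positions[i]) <= cutoff]
        for j, k in itertools.combinations(neighbors, 2):
            rij_vector = positions[j] - positions[i]
            rik_vector = positions[k] - positions[i]
            rij = np.linalg.norm(rij_vector)
            rik = np.linalg.norm(rik_vector)
            # same expression as gaussian.f90: dot / Rij / Rik
            costheta = np.dot(rij_vector, rik_vector) / rij / rik
            if abs(costheta) > 1.0:
                bad.append((i, j, k, float(costheta)))
    return bad
```

Any (i, j, k) it returns points at atoms worth copying into a small test system.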
P.S. I’m struggling to figure out why you would ever encounter costheta = 1.0. Doesn’t this mean that theta is 0? How would this happen other than having two atoms in the same place?
-
reporter I think I know. The atoms are lined up like this: center ----> atom1 ----> atom2.
How should I send you my files?
-
repo owner Oh right, they are in a line on the same side of the central atom; if the cutoff radius is large enough, this can occur. It's early in the morning in Copenhagen at the moment and my caffeine hasn't kicked in yet!
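A quick NumPy check of that geometry: placing center, atom1 and atom2 along one direction makes theta exactly 0 on paper, but the computed costheta can land slightly above 1. The random directions, the distances 3.1 and 6.2, and the helper name are arbitrary choices for illustration.

```python
import numpy as np

def collinear_costheta(direction, d1, d2):
    """cos(theta) for center -> atom1 -> atom2 along the same direction.

    Geometrically theta = 0, so the exact answer is 1; floating point may differ.
    """
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    rij_vector = d1 * u  # center -> atom1
    rik_vector = d2 * u  # center -> atom2, further along the same line
    return np.dot(rij_vector, rik_vector) / np.linalg.norm(rij_vector) / np.linalg.norm(rik_vector)

rng = np.random.default_rng(0)
values = np.array([collinear_costheta(rng.normal(size=3), 3.1, 6.2) for _ in range(1000)])
print("max deviation from 1:", np.abs(values - 1.0).max())
print("cases with costheta > 1:", int((values > 1.0).sum()))
```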
You can just upload a trajectory to this issue page, like you did when you reported the issue originally.
-
reporter - attached beta-Zr-NaN-test.traj
-
reporter - attached Zr-O.amp
-
reporter Do you need anything else?
-
reporter Oh, some more info might help: this is an energy-volume scan for BCC Zr. The first 3 images should give NaNs.
-
repo owner I can’t open the trajectory file you attached; my error is below. I’m using the latest version of ASE. Does it open correctly on your end? Perhaps you can re-upload it, and maybe also save it as an extxyz file as a backup, since that’s plain text.
    $ ase -T gui beta-Zr-NaN-test.traj
    Traceback (most recent call last):
      File "/home/aap/Dropbox/repositories/ase/bin/ase", line 3, in <module>
        main()
      File "/home/aap/Dropbox/repositories/ase/ase/cli/main.py", line 99, in main
        f(args)
      File "/home/aap/Dropbox/repositories/ase/ase/gui/ag.py", line 68, in run
        images.read(args.filenames, args.image_number)
      File "/home/aap/Dropbox/repositories/ase/ase/gui/images.py", line 182, in read
        self.initialize(images, names)
      File "/home/aap/Dropbox/repositories/ase/ase/gui/images.py", line 125, in initialize
        self.maxnatoms = max(len(atoms) for atoms in self)
    ValueError: max() arg is an empty sequence
-
Mike, could you put a couple of print statements in the source code (say here) to see exactly what becomes NaN? Is it the fingerprints (the input of the model) or the output of the model?
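On the Python side, a check in the same spirit as those print statements could look like the sketch below. The `(symbol, vector)` list layout and the helper name are assumptions for illustration, not necessarily how Amp stores fingerprints. If no fingerprint is NaN but the energy still is, the problem lies in the model evaluation instead.

```python
import numpy as np

def report_nan_fingerprints(fingerprints):
    """Print and return indices of atoms whose fingerprint vector contains NaN.

    `fingerprints` is assumed to be a list of (symbol, vector) pairs, one per atom.
    """
    bad = []
    for index, (symbol, vector) in enumerate(fingerprints):
        vector = np.asarray(vector, dtype=float)
        nan_components = np.flatnonzero(np.isnan(vector))
        if nan_components.size:
            print(f"atom {index} ({symbol}): NaN in components {nan_components.tolist()}")
            bad.append(index)
    return bad

example = [("Zr", [0.3, 1.2, 0.8]), ("Zr", [0.3, float("nan"), 0.8]), ("O", [0.1, 0.4, 0.2])]
bad_atoms = report_nan_fingerprints(example)
# prints: atom 1 (Zr): NaN in components [1]
```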