Bug/bad docstrings in gaussian.py for calculate_G2

Issue #159 new
John Kitchin created an issue

I think there are some bugs in this code. On the Python side there is

Rij = np.linalg.norm(Rj - Ri)

However, Ri is not the position of the centered atom, it is an index of the atom, so it is an integer.

It should be

Rij = np.linalg.norm(Rj - neighborpositions[Ri])

with the way the code is currently documented.

Also the docstring suggests that this:

from amp.descriptor.gaussian import calculate_G2

print(calculate_G2(neighborsymbols=['O', 'O'],
                   neighborpositions=[[0, 0, 0],
                                      [0, 0, 1.2]],
                   G_element={'O'},
                   eta=10,
                   cutoff=6,
                   Ri=0,
                   fortran=False))

should work, but it fails with

Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "/Users/jkitchin/vc/projects/neural-network/amp/amp/descriptor/gaussian.py", line 586, in calculate_G2
    Rc = cutoff['kwargs']['Rc']
TypeError: 'int' object is not subscriptable

it looks like the code expects that cutoff is actually something like Cosine(6.5).todict().

If I set Fortran=True, I get a different set of errors:

Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "/Users/jkitchin/vc/projects/neural-network/amp/amp/descriptor/gaussian.py", line 561, in calculate_G2
    G_number = [atomic_numbers[G_element]]
TypeError: unhashable type: 'set'

That suggests that G_element should be a string that is a chemical symbol, not a dictionary.

also, there is cutofffn = cutoff['name'], which suggests cutoff should be a dictionary, and not a float.

Also, with fortran True, Ri should be a position and not an integer.

Finally, is np.exp(-eta * (Rij 2.) / (Rc 2.)) correct? I have not seen Rij normalized by the cutoff radius in any equations in the literature. Is that done somewhere? Edit: I see this is the form used in eq 6 in A. Khorshidi, A.A. Peterson / Computer Physics Communications 207 (2016) 310–324. Still, it looks like it just modifies the eta parameter compared to the Behler notation.

Ok last edit: It looks like the loop over atoms counts the case where i=j. This means the G2 function is greater than it should be by one (where Rij=0).

Comments (5)

  1. Alireza Khorshidi

    @jkitchin, I am sorry, seems that we haven't done at all a good job in docstrings; in many places they have not been updated alongside with the changes in the code.

    @andrewpeterson and @muammar: I tried to fix the issues in documentation pointed out by John. However, I guess we have an immediate need to reconsider the docstrings of all of the codes and update them. Our bad docstrings particularly highlight when the user wants to use just part of the code. Should I open a separate issue for updating docstrings?

    Getting back to @jkitchin's questions, "Ri" here is position (a list of three floats) not index. The documentation is now corrected.

    The correct way of your piece of code is

    from amp.descriptor.gaussian import calculate_G2
    from amp.descriptor.cutoffs import Cosine
    
    fp = calculate_G2(neighborsymbols=['O'],
                       neighborpositions=[[0, 0, 1.246],],
                       G_element='O',
                       eta=1,
                       cutoff=Cosine(6.5).todict(),
                       Ri=[0, 0, 0],
                       fortran=True)
    print(fp)
    

    either with fortran=False or fortran=True. The above script calculates G2 for atom at position zero, the neighborpositions is just a single atom (in the above two-atom system) and the self-atom is not considered as neighbor. The above piece of code should give fp as 0.8791347761257509. If we now do the same with the code itself by something like:

    # Make a series of images.
    from ase.structure import molecule
    from ase.visualize import view
    atoms = molecule('O2')
    view(atoms)
    atoms.set_pbc(False)
    images = [atoms]
    
    # Fingerprint using Amp.
    from amp.descriptor.gaussian import Gaussian
    descriptor = Gaussian(Gs = {'O': [{'type': 'G2', 'element': 'O', 'eta': 1.0}]})
    
    # First approach for hashing images.
    from amp.utilities import hash_images
    images = hash_images(images)
    descriptor.calculate_fingerprints(hash_images(images))
    
    descriptor.neighborlist.open()
    keys = descriptor.neighborlist.d.keys()
    print "neighborlist =", descriptor.neighborlist.d[keys[0]]
    
    descriptor.fingerprints.open()
    keys = descriptor.fingerprints.d.keys()
    print "fingerprints =", descriptor.fingerprints.d[keys[0]]
    

    it should give

    neighborlist = [(array([1]), array([[0, 0, 0]])), (array([0]), array([[0, 0, 0]]))]
    
    fingerprints = [('O', [0.8791428635947317]), ('O', [0.8791428635947317])]
    

    which is the same number as above, meaning that the code also does not consider self-atom as its neighbor.

    Correct, we also did not see any division by Rc^2 in the literature. The default values of \eta here is about (6.5)^2 multiplied by the values of \eta in Behler. However, if one wants to choose a different value of cutoff, the \eta values should also be adjusted in the Behler approach, whereas here the \eta values do not need to be adjusted (please note that here \eta has no dimension, whereas in Behler it has the dimension of 1/(length**2)). Does that make sense to you @jkitchin?

    BTW, in the above calculate_G2 method, we should probably change the cutoff input argument such that it does not need a todict() call.

  2. John Kitchin reporter

    Thanks for the clarifications. They all make sense. It makes sense the neighborpositions should not have the atom itself in it! Thanks for updating the docstrings for this!

  3. Log in to comment