Bug/bad docstrings in gaussian.py for calculate_G2

Alireza Khorshidi

@jkitchin, I am sorry, seems that we haven't done at all a good job in docstrings; in many places they have not been updated alongside with the changes in the code.

@andrewpeterson and @muammar: I tried to fix the issues in documentation pointed out by John. However, I guess we have an immediate need to reconsider the docstrings of all of the codes and update them. Our bad docstrings particularly highlight when the user wants to use just part of the code. Should I open a separate issue for updating docstrings?

Getting back to @jkitchin's questions, "Ri" here is position (a list of three floats) not index. The documentation is now corrected.

The correct way of your piece of code is

from amp.descriptor.gaussian import calculate_G2
from amp.descriptor.cutoffs import Cosine

fp = calculate_G2(neighborsymbols=['O'],
                   neighborpositions=[[0, 0, 1.246],],
                   G_element='O',
                   eta=1,
                   cutoff=Cosine(6.5).todict(),
                   Ri=[0, 0, 0],
                   fortran=True)
print(fp)

either with fortran=False or fortran=True. The above script calculates G2 for atom at position zero, the neighborpositions is just a single atom (in the above two-atom system) and the self-atom is not considered as neighbor. The above piece of code should give fp as 0.8791347761257509. If we now do the same with the code itself by something like:

# Make a series of images.
from ase.structure import molecule
from ase.visualize import view
atoms = molecule('O2')
view(atoms)
atoms.set_pbc(False)
images = [atoms]

# Fingerprint using Amp.
from amp.descriptor.gaussian import Gaussian
descriptor = Gaussian(Gs = {'O': [{'type': 'G2', 'element': 'O', 'eta': 1.0}]})

# First approach for hashing images.
from amp.utilities import hash_images
images = hash_images(images)
descriptor.calculate_fingerprints(hash_images(images))

descriptor.neighborlist.open()
keys = descriptor.neighborlist.d.keys()
print "neighborlist =", descriptor.neighborlist.d[keys[0]]

descriptor.fingerprints.open()
keys = descriptor.fingerprints.d.keys()
print "fingerprints =", descriptor.fingerprints.d[keys[0]]

it should give

neighborlist = [(array([1]), array([[0, 0, 0]])), (array([0]), array([[0, 0, 0]]))]

fingerprints = [('O', [0.8791428635947317]), ('O', [0.8791428635947317])]

which is the same number as above, meaning that the code also does not consider self-atom as its neighbor.

Correct, we also did not see any division by Rc^2 in the literature. The default values of \eta here is about (6.5)^2 multiplied by the values of \eta in Behler. However, if one wants to choose a different value of cutoff, the \eta values should also be adjusted in the Behler approach, whereas here the \eta values do not need to be adjusted (please note that here \eta has no dimension, whereas in Behler it has the dimension of 1/(length**2)). Does that make sense to you @jkitchin?

BTW, in the above calculate_G2 method, we should probably change the cutoff input argument such that it does not need a todict() call.

2017-03-19T23:41:46+00:00

Comments (5)

John Kitchin reporter
- edited description
- 2017-03-18T15:45:01+00:00
John Kitchin reporter
- edited description
- 2017-03-18T16:13:10+00:00
John Kitchin reporter
- edited description
- 2017-03-18T16:46:56+00:00
Alireza Khorshidi
@jkitchin, I am sorry, seems that we haven't done at all a good job in docstrings; in many places they have not been updated alongside with the changes in the code.

@andrewpeterson and @muammar: I tried to fix the issues in documentation pointed out by John. However, I guess we have an immediate need to reconsider the docstrings of all of the codes and update them. Our bad docstrings particularly highlight when the user wants to use just part of the code. Should I open a separate issue for updating docstrings?

Getting back to @jkitchin's questions, "Ri" here is position (a list of three floats) not index. The documentation is now corrected.

The correct way of your piece of code is
```
from amp.descriptor.gaussian import calculate_G2
from amp.descriptor.cutoffs import Cosine

fp = calculate_G2(neighborsymbols=['O'],
                   neighborpositions=[[0, 0, 1.246],],
                   G_element='O',
                   eta=1,
                   cutoff=Cosine(6.5).todict(),
                   Ri=[0, 0, 0],
                   fortran=True)
print(fp)
```
either with fortran=False or fortran=True. The above script calculates G2 for atom at position zero, the neighborpositions is just a single atom (in the above two-atom system) and the self-atom is not considered as neighbor. The above piece of code should give fp as 0.8791347761257509. If we now do the same with the code itself by something like:
```
# Make a series of images.
from ase.structure import molecule
from ase.visualize import view
atoms = molecule('O2')
view(atoms)
atoms.set_pbc(False)
images = [atoms]

# Fingerprint using Amp.
from amp.descriptor.gaussian import Gaussian
descriptor = Gaussian(Gs = {'O': [{'type': 'G2', 'element': 'O', 'eta': 1.0}]})

# First approach for hashing images.
from amp.utilities import hash_images
images = hash_images(images)
descriptor.calculate_fingerprints(hash_images(images))

descriptor.neighborlist.open()
keys = descriptor.neighborlist.d.keys()
print "neighborlist =", descriptor.neighborlist.d[keys[0]]

descriptor.fingerprints.open()
keys = descriptor.fingerprints.d.keys()
print "fingerprints =", descriptor.fingerprints.d[keys[0]]
```
it should give
```
neighborlist = [(array([1]), array([[0, 0, 0]])), (array([0]), array([[0, 0, 0]]))]
```
```
fingerprints = [('O', [0.8791428635947317]), ('O', [0.8791428635947317])]
```
which is the same number as above, meaning that the code also does not consider self-atom as its neighbor.

Correct, we also did not see any division by Rc^2 in the literature. The default values of \eta here is about (6.5)^2 multiplied by the values of \eta in Behler. However, if one wants to choose a different value of cutoff, the \eta values should also be adjusted in the Behler approach, whereas here the \eta values do not need to be adjusted (please note that here \eta has no dimension, whereas in Behler it has the dimension of 1/(length**2)). Does that make sense to you @jkitchin?

BTW, in the above calculate_G2 method, we should probably change the cutoff input argument such that it does not need a todict() call.
- 2017-03-19T23:41:46+00:00
John Kitchin reporter
Thanks for the clarifications. They all make sense. It makes sense the neighborpositions should not have the atom itself in it! Thanks for updating the docstrings for this!
- 2017-03-20T15:03:31+00:00
Log in to comment