Bug or bad docstring in gaussian.py?

Issue #158 new
John Kitchin created an issue

I tried to do this:

from amp.descriptor.gaussian import Gaussian

g = Gaussian(cutoff=6.5,
             Gs={"O": [{"type":"G2", "element":"O", "eta":10.}]})


from ase import Atoms

atoms = Atoms('O2', [[0, 0, 0], [1.2, 0, 0]])

print(g.calculate_fingerprints([atoms]))

But it gives me this error:

Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/Users/jkitchin/vc/projects/neural-network/amp/amp/descriptor/gaussian.py", line 144, in calculate_fingerprints
    p.elements = set([atom.symbol for atoms in images.values()
AttributeError: 'list' object has no attribute 'values'

That looks like the code expected images to be a dictionary, and I guess it is a dictionary of {hash: atoms}? The docstring says it should be a list of atoms, or a traj/db filename.

Making it a list, however leads to

Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/Users/jkitchin/vc/projects/neural-network/amp/amp/descriptor/gaussian.py", line 163, in calculate_fingerprints
    self.neighborlist.calculate_items(images, parallel=parallel, log=log)
  File "/Users/jkitchin/vc/projects/neural-network/amp/amp/utilities.py", line 404, in calculate_items
    calcs_needed = list(set(images.keys()).difference(d.keys()))
AttributeError: 'list' object has no attribute 'keys'

Comments (9)

  1. Muammar El Khatib

    I confirm I could reproduce this report. I think it is an error in the docstring. Your guess is correct. Passing atoms as a dictionary of {hash: atoms} makes the code run:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    #
    
    from amp.descriptor.gaussian import Gaussian
    from amp.utilities import hash_image
    
    g = Gaussian(cutoff=6.5, Gs={"O": [{"type":"G2", "element":"O", "eta":10.}]})
    
    
    from ase import Atoms
    
    atoms = Atoms('O2', [[0, 0, 0], [1.2, 0, 0]])
    
    _atoms = hash_image(atoms)
    
    image = {_atoms: atoms}
    
    print(g.calculate_fingerprints(image))
    

    And it generates the following directories:

    amp-data-fingerprints.ampdb
    └── loose
        └── 43d3aa12625c34f09d9c5a957ce7453e
    
    1 directory, 1 file
    
    amp-data-neighborlists.ampdb
    └── loose
        └── 43d3aa12625c34f09d9c5a957ce7453e
    
    1 directory, 1 file
    
  2. Alireza Khorshidi

    I guess there are two ways to do that, called as "1st approach" and "2nd approach" in the following piece of code.

    from amp.descriptor.gaussian import Gaussian
    
    g = Gaussian(cutoff=6.5, Gs={"O": [{"type":"G2", "element":"O", "eta":10.}]})
    
    
    from ase import Atoms
    
    atoms = Atoms('O2', [[0, 0, 0], [1.2, 0, 0]])
    
    # 1st approach
    from amp.utilities import hash_images
    images = hash_images([atoms], ordered=True)
    g.calculate_fingerprints(images)
    
    # 2nd approach
    from amp.utilities import hash_image
    hash = hash_image(atoms)
    g.calculate_fingerprints({hash: atoms})
    

    We may want to update the documentation to be a bit more clear on that.

  3. andrew_peterson repo owner

    Thanks for the clarifying example. To me, the 1st approach is the preferred.

    This also points out that it is a bit confusing that hash_image and hash_images have such similar names but different behavior. That is, hash_image returns just the hash (key) of the image, while hash_images returns a dictionary that has the hashes as keys and the original images as values of the dictionary. hash_image was meant more as an internal function. I suppose we could rename it to get_hash or something to avoid confusion.

    We should probably clarify both these things...

    Btw @jkitchin this example in the documentation likely has the syntax you are looking for.

  4. John Kitchin reporter

    I think it would be better to take an atoms or list of atoms, and internally compute the hashes. Is there some reason not to do that?

  5. Alireza Khorshidi

    I like @jkitchin's idea of embedding hashing stuff within the calculate_fingerprints method, so that the user only feeds a list of ASE Atoms object, and the hashing stuff is done internally (as right now the documentation of calculate_fingerprints method says here but wrongly!)

  6. andrew_peterson repo owner

    It is calculated internally in the Amp.train (here). The thinking was that if we put it here then each descriptor and model doesn't need to do this---less duplicated code. I suppose we could put it in the parent Model and Descriptor classes to minimize the change. But for now I'd say correcting the documentation to state this correctly at least avoids confusing the user. ;)

    @akhorshi thanks for the commits. I like 0d258c0, but I personally would say let's revert 641a6cd to keep our examples simpler. (In general, there's not a good reason for someone to use the second method, and if they have a good reason they're probably sophisticated enough to figure this out on their own.)

  7. andrew_peterson repo owner

    Let's look into also including this internally; seems like a nice thing to have.

  8. Log in to comment