get_hash function gives identical hashes for different atoms when the system is large

Issue #208 resolved
Mingjie Liu created an issue

Hi,

I found that the get_hash function does not work correctly when the Atoms object is large and the difference between two structures is small. For example, I have a 10x10x15 slab containing 1499 Pd atoms and 1 Au atom. The function returns an identical hash when the Au atom is at index 600, 601, 700, 701, or some other index in between, and probably at other indices too. Basically, if the Au atom is in the bulk, the hash is the same; when the Au atom is at index 0 or 1, the hash is different.

import hashlib

from ase.build import fcc111  # older ASE versions: ase.lattice.surface
from scipy.interpolate import interp1d

# Simulation bulk composition
x = 1/1500

# Lattice constant interpolated linearly in the Au fraction x (x = 0: pure Pd, x = 1: pure Au)
lat = interp1d([0, 1], [3.934, 4.154])

# Define a dummy slab
atoms = fcc111('Pd', size=(10, 10, 15), vacuum=6.0, a=float(lat(x)))  # 1500 atoms
atoms.set_pbc([1, 1, 0])


def get_hash(atoms):
    """Return an MD5 hash built from pbc, cell, atomic numbers and positions."""
    string = str(atoms.pbc)
    for number in atoms.cell.flatten():
        string += '%.15f' % number
    string += str(atoms.get_atomic_numbers())
    for number in atoms.get_positions().flatten():
        string += '%.15f' % number

    md5 = hashlib.md5(string.encode('utf-8'))
    return md5.hexdigest()

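# Put the single Au atom at index 650 and print the hash
atoms[650].symbol = 'Au'
print(get_hash(atoms))
# Move the Au atom to index 651: the printed hash is the same as above
atoms[651].symbol = 'Au'
atoms[650].symbol = 'Pd'
print(get_hash(atoms))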

Also, this does not happen when the slab is smaller; for example, it doesn't happen for size (10, 10, 10), i.e. 1000 atoms.
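A quick check of the size dependence, assuming NumPy's default print options: str() abbreviates an array only when its size exceeds the threshold print option (1000 by default), and a 10x10x10 slab has exactly 1000 atoms while a 10x10x15 slab has 1500. Plain arrays filled with 46 (Pd) stand in for atoms.get_atomic_numbers() here, so ASE is not needed:

```python
import numpy as np

# NumPy abbreviates str()/repr() of any array whose size exceeds this option.
threshold = np.get_printoptions()["threshold"]  # 1000 by default

# Stand-ins for atoms.get_atomic_numbers() of the two slabs (46 = Pd).
small = np.full(10 * 10 * 10, 46)  # 1000 atoms: not above the threshold
large = np.full(10 * 10 * 15, 46)  # 1500 atoms: above the threshold

print(threshold)
print("..." in str(small))  # full listing, no ellipsis
print("..." in str(large))  # abbreviated listing with "..."
```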

Thanks!

Comments (5)

  1. Mingjie Liu reporter

    Hi,

    I think this is due to the line:

    string += str(atoms.get_atomic_numbers())
    

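    Apparently, when the array is long, str() abbreviates it: for a NumPy array with more than 1000 elements (NumPy's default print threshold), only the first and last three entries are printed, with "..." in between. A change to an atom in the middle of the slab therefore never reaches the hashed string.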
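A minimal sketch of the consequence, using a hypothetical helper, numbers_with_au(), that builds a plain NumPy array in place of atoms.get_atomic_numbers() (46 = Pd, 79 = Au). With the default edgeitems=3, the abbreviated string keeps only the first and last three entries, so Au at index 650 or 651 gives the same string, while Au at index 0 does not:

```python
import numpy as np


def numbers_with_au(n_atoms, au_index):
    """Hypothetical helper: n_atoms Pd atoms with a single Au at au_index."""
    numbers = np.full(n_atoms, 46)  # Pd everywhere
    numbers[au_index] = 79          # one Au
    return numbers


au_650 = str(numbers_with_au(1500, 650))
au_651 = str(numbers_with_au(1500, 651))
au_0 = str(numbers_with_au(1500, 0))

print(au_650 == au_651)  # True: both indices fall inside the elided "..."
print(au_650 == au_0)    # False: index 0 is one of the leading entries shown
```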

    If we change it to

    for number in atoms.get_atomic_numbers():
        string += '%.15f' % number
    

    then it works, because every atomic number is written into the string and nothing is abbreviated.
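A different approach, sketched here under stated assumptions and not necessarily how the issue was resolved in Amp, is to hash the raw bytes of each array instead of a formatted string, which can never be abbreviated. The function hash_structure() is hypothetical and takes plain arrays (for example atoms.pbc, atoms.cell, atoms.get_atomic_numbers() and atoms.get_positions()) so it runs without ASE. Because raw bytes depend on dtype and byte order, each array is first cast to a fixed little-endian type:

```python
import hashlib

import numpy as np


def hash_structure(pbc, cell, numbers, positions):
    """Hypothetical helper: MD5 digest over the exact bytes of each array.

    Unlike str(array), tobytes() never abbreviates, so changing any single
    entry (for example, which atom is Au) changes the digest.
    """
    md5 = hashlib.md5()
    for array, dtype in ((pbc, "<i1"), (cell, "<f8"),
                         (numbers, "<i8"), (positions, "<f8")):
        # Cast to a fixed dtype and byte order so the digest is reproducible.
        data = np.ascontiguousarray(array, dtype=dtype)
        md5.update(data.tobytes())
    return md5.hexdigest()


# Usage with plain arrays: 1500 atoms, one Au at index 650 or 651.
pbc = np.array([True, True, False])
cell = np.diag([27.8, 24.1, 40.0])  # arbitrary illustrative cell
positions = np.arange(1500 * 3, dtype=float).reshape(1500, 3)
numbers_650 = np.full(1500, 46)
numbers_650[650] = 79
numbers_651 = np.full(1500, 46)
numbers_651[651] = 79

print(hash_structure(pbc, cell, numbers_650, positions)
      != hash_structure(pbc, cell, numbers_651, positions))  # True
```

Hashing bytes rather than formatted text also removes any dependence on NumPy's print options, at the cost of being sensitive to dtype, which is why the casts are explicit.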
