Parallelize fingerprint derivatives?

Issue #79 new
Andrew Peterson repo owner created an issue

In principle, could fingerprint derivatives be parallelized within each image? Since the calculation loops over atoms, it seems like it should be embarrassingly parallel; each worker would just feed back the components of the fingerprints it has calculated to the master.

Then when we are running on 8 or 16 cores, the force call could be cut by up to a factor of 8 or 16 --- so from 30 seconds to 2 seconds, for example. Does that sound right?

Of course, this is just for force calls --- in training mode it makes sense to do the easier task of parallelizing over images.
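The per-atom split described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_atom_derivative` is a hypothetical stand-in that differentiates a toy fingerprint G_i = sum_j exp(-r_ij^2), not Amp's actual descriptor, and `image_derivatives` is not an Amp function. The mapping function is passed in so the same code runs serially or on a worker pool.

```python
import math
from functools import partial


def toy_atom_derivative(index, positions, cutoff):
    """Gradient of a toy fingerprint G_i = sum_j exp(-r_ij**2) with respect
    to the position of atom `index` (hypothetical, not Amp's descriptor)."""
    xi = positions[index]
    grad = [0.0, 0.0, 0.0]
    for j, xj in enumerate(positions):
        if j == index:
            continue
        delta = [a - b for a, b in zip(xi, xj)]
        r2 = sum(d * d for d in delta)
        if r2 < cutoff ** 2:
            # d/dx_i exp(-r_ij**2) = -2 * (x_i - x_j) * exp(-r_ij**2)
            weight = -2.0 * math.exp(-r2)
            for k in range(3):
                grad[k] += weight * delta[k]
    return index, grad


def image_derivatives(positions, cutoff, pool_map=map):
    """Compute one gradient per atom; each atom is an independent task."""
    work = partial(toy_atom_derivative, positions=positions, cutoff=cutoff)
    results = dict(pool_map(work, range(len(positions))))
    # Reassemble in atom order, since workers may finish in any order.
    return [results[i] for i in range(len(positions))]
```

Passing the `map` method of a `concurrent.futures` executor as `pool_map` distributes the atoms over workers. For CPU-bound pure-Python work, a process-based pool would likely be needed to see a real speedup.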

Comments (9)

  1. Andrew Peterson reporter

    Yes, in principle. However, right now we have it parallelizing over images, so that loop is inside of another parallel loop. We would need to figure out how to restructure the parallelization, either for all cases or only during force calls, such that it parallelizes over atoms, not images.

  2. Prateek Mehta

    I tried a hack with the joblib library yesterday to see if I could make it work, something like,

    Parallel(n_jobs=24, backend="threading")(delayed(get_atom_fpp)(atom) for atom in image)

    where get_atom_fpp(atom) is a function that I used to replace the loop over the atoms. It didn't do what I expected though, which I think is related to it being inside of another parallel loop, as you said.

    I don't fully understand how amp parallelizes things yet, but I will try to take that loop outside the loop over images, to see if that works.
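One way to avoid a parallel loop nested inside another is to flatten the work into a single list of (image, atom) tasks and hand that list to one pool. The sketch below is only an illustration of that idea, not how Amp structures its parallelization: `get_atom_fpp` here is a dummy stand-in that takes an image and an atom index, and `flatten_tasks` and `run_flat` are hypothetical names. Threads are used only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def get_atom_fpp(image, atom_index):
    """Dummy stand-in for the per-atom work: sums the atom's coordinates."""
    return sum(image[atom_index])


def flatten_tasks(images):
    """List every (image_index, atom_index) pair as one independent task."""
    return [(i, a) for i, image in enumerate(images) for a in range(len(image))]


def run_flat(images, max_workers=4):
    """Run all per-atom tasks on a single pool, then regroup by image."""
    tasks = flatten_tasks(images)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(lambda t: get_atom_fpp(images[t[0]], t[1]), tasks))
    grouped = [[None] * len(image) for image in images]
    for (i, a), value in zip(tasks, values):
        grouped[i][a] = value
    return grouped
```

With a single image, as in a force call, the task list reduces to a loop over atoms; with many images, as in training, the same pool is shared across all of them.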

  3. Daniel Hedman

    Parallelization of fingerprint derivatives would be a great addition to AMP, since it would allow more "realistic" simulations to be performed. Maybe it would be possible to use MPI to parallelize the calculations?

    Using something like:

    Then AMP could be used on MPI-capable supercomputers to perform production calculations.
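In an MPI-style scheme, each rank would typically take a fixed share of the atoms and the partial results would be gathered at the end. The sketch below shows only that bookkeeping in plain Python, with no MPI calls. `atoms_for_rank` and `merge_partial_results` are hypothetical helper names, and the round-robin split is an assumption rather than anything Amp implements.

```python
def atoms_for_rank(n_atoms, rank, size):
    """Round-robin share of atom indices for worker `rank` out of `size`."""
    return list(range(rank, n_atoms, size))


def merge_partial_results(partials, n_atoms):
    """Combine per-rank {atom_index: value} dicts into one ordered list."""
    merged = [None] * n_atoms
    for partial_result in partials:
        for index, value in partial_result.items():
            merged[index] = value
    return merged
```

Each rank would evaluate fingerprint derivatives only for its own indices, and the root rank would call something like `merge_partial_results` on the gathered dictionaries.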

  4. Efrem Braun

    Just want to give this issue a bump. I've realized that in my simulations of a ~1000 atom system, I can do training easily since it's parallelizable over all my training images, but doing MD is practically as slow in terms of wall-time with Amp as it would be with a (highly-parallelized) ab-initio code, and the bottleneck for me is definitely calculating each image's fingerprint derivatives in serial.

  5. Efrem Braun

    I noticed on the Code Sprint page that a note was made about OpenKIM possibly solving this issue. Mind giving an update on the working status? I'm trying to decide if I should change my simulation to that of a much smaller system or if I should hold out for a solution to this issue. I might do the latter if a solution is in progress. I'd also be happy to help...just want to make sure I'm not duplicating someone else's efforts.

  6. Andrew Peterson reporter

    The OpenKIM implementation is ongoing work (actually starting this month, I believe) in Franklin Goldsmith's group, so I don't have an estimated timeline for when it will be completed.

    However, if they are successful, then in my opinion it's not worth us changing our code significantly to allow for parallelization over atoms. A better workflow would be:

    • Use Amp for training.

    • Use Amp for force calls for smaller systems, and in integrated algorithms (like ML-NEB) where ML force calls are not the rate-limiting step.

    • For larger systems, Amp would effectively have a "Save as" feature to write out to OpenKIM. This could be immediately converted into an ASAP or LAMMPS object within an ASE script (we might even want to automate that) or used in separate simulations.

  7. Efrem Braun

    Thanks, that's very helpful to know! OpenKIM is great, and I agree that that would work well. OpenKIM is well-integrated into LAMMPS, and it'd be easy to run such a model there (I'm less familiar with ASAP).

    Two thoughts:

    1. I've found that native LAMMPS runs much faster than ASE calling LAMMPS using the lammpslib calculator. But doing this would remove some of Amp's other features, like the bootstrap ensemble method. Perhaps ASAP is faster, though, because less communication is needed than with lammpslib.

    2. There is an existing open-source neural net potential code that's implemented in LAMMPS as a "pair_style". I've played with it, and it is fast and parallelizable. It wouldn't be that hard to take the neural net parameters output by Amp and post-process them into a text file that could be read using the code they've written (or, probably better, modify their code to take input parameters in JSON format). If the OpenKIM implementation doesn't work out, or somebody is eager to simulate a large system, this could be an option. I might be working on this a bit myself.
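The post-processing step mentioned above could look roughly like the sketch below. Everything here is an assumption for illustration: the JSON layout (layer name mapped to a list of weight rows) is not Amp's actual parameter format, and the one-weight-per-line text output is not the format expected by any particular LAMMPS pair_style. `weights_to_text` is a hypothetical helper.

```python
import json


def weights_to_text(params_json):
    """Flatten {"layer": [[row], ...]} JSON into 'layer row col value' lines.

    Hypothetical input and output formats; layers are emitted in
    lexicographic order of their names.
    """
    params = json.loads(params_json)
    lines = []
    for layer in sorted(params):
        for r, row in enumerate(params[layer]):
            for c, value in enumerate(row):
                lines.append(f"{layer} {r} {c} {value:.10e}")
    return "\n".join(lines)
```

A real converter would need to follow the documented file formats on both sides, including any descriptor parameters and scaling factors the potential depends on.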
