RDF accumulate leaks memory and crashes
I am trying to make a very accurate computation of the RDF from a largish system of hard spheres (3D). I am using HOOMD-blue for the HPMC, and it works well with ~22k spheres and 10e6 sweeps. I'm trying to compute the RDF from data saved every 1000 steps, with a cutoff of 8.0 and a bin width of 0.001.
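For scale, a quick back-of-the-envelope check on the numbers above (a sketch; all values are taken directly from the parameters stated): the sampled snapshot count matches the ~1e4 snapshots in the trajectory, and a single RDF histogram is only a few thousand bins, so the histogram data itself cannot explain multi-GB memory growth.

```python
# Sanity check on the sizes stated above (values from the question).
sweeps = 10e6            # total HPMC sweeps
sample_every = 1000      # RDF computed from data every 1000 steps
r_max, dr = 8.0, 0.001   # cutoff and bin width

n_snapshots = int(sweeps // sample_every)
n_bins = round(r_max / dr)

assert n_snapshots == 10_000     # the ~1e4 snapshots in the trajectory
assert n_bins == 8_000           # ~8k bins, i.e. ~64 kB of float64 per histogram
```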
But when I try to compute the RDF using Freud following the tutorials, either online in the simulation through a callback or on a saved trajectory with many steps, Freud uses up all available memory as it is iterating through the snapshots, and then crashes with "MemoryError: std::bad_alloc" after 74 of the 1e4 snapshots in my trajectory. To my understanding, once it's finished with one snapshot, that memory should be freed.
Currently I'm working around this by running one script with an outer loop that uses subprocess.Popen to spawn another Python script that runs the RDF computation on 10 snapshots at a time, then saves the result to a .npy file and exits. In the outer loop, I then add up the RDFs and average at the end. This works fine, but it's of course an ugly hack.
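A minimal sketch of that chunked workaround. Everything here is hypothetical (the worker script, file names, and chunk size are stand-ins, not the actual scripts): the key idea is that each child process computes one chunk of snapshots and then exits, so any memory it leaked is reclaimed by the OS, and the parent only has to average the saved partial histograms.

```python
import json, os, subprocess, sys, tempfile

# Hypothetical stand-in for the per-chunk worker: a child process computes
# the RDF histogram for snapshots [start, stop), writes it to a file, and
# exits, so any memory it leaked is reclaimed by the OS on exit.
CHILD = '''
import json, sys
start, stop, out = int(sys.argv[1]), int(sys.argv[2]), sys.argv[3]
# ... load snapshots [start, stop) and accumulate the RDF here ...
hist = [float(start)] * 4                     # placeholder histogram
with open(out, "w") as f:
    json.dump(hist, f)
'''

def chunked_rdf(n_snapshots, chunk):
    """Outer loop: spawn one child per chunk, then average the pieces."""
    workdir = tempfile.mkdtemp()
    script = os.path.join(workdir, "rdf_chunk.py")
    with open(script, "w") as f:
        f.write(CHILD)
    parts = []
    for start in range(0, n_snapshots, chunk):
        out = os.path.join(workdir, f"rdf_{start:06d}.json")
        subprocess.run([sys.executable, script, str(start),
                        str(start + chunk), out], check=True)
        with open(out) as f:
            parts.append(json.load(f))
    # Equal-sized chunks, so a plain mean over chunks is the overall average.
    return [sum(col) / len(parts) for col in zip(*parts)]
```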
I'm not doing anything special, so you should be able to reproduce this by taking the example linked below, switching it to 3D, increasing the number of spheres, setting rmax=8.0 and dr=0.001, and running enough callbacks (around 74 on my 64 GB machine).
Comments (8)
-
Thanks for reporting!
Thanks for the info, Matt. I'll try to reproduce the problem, and hopefully we can fix it.
-
- changed status to resolved
Fix nlist memory issues; fixes issue #169 → <<cset 58479bea867b>>
-
Merged in issue169 (pull request #153)
Fix nlist memory issues; fixes issue #169
Approved-by: Bradley Dice bdice@bradleydice.com
Approved-by: Vyas Ramasubramani vramasub@umich.edu
→ <<cset 95ec72a368bd>>
-
@asmunder thanks again for finding this. We now have a fix on the master branch. We'll aim to make a bugfix release soon, but if you would like something immediately then feel free to clone the repo and confirm that this fix works.
-
To clarify what we learned: the problem wasn't actually the circular references themselves, but that not enough was happening at the Python level to trigger garbage collection, which is what cleans up objects with circular references (adding periodic calls to gc.collect() should fix the observed behavior in this case, even without updating freud). The solution we opted for is to explicitly break the circular reference for automatically-generated neighbor lists, so they get cleaned up immediately by reference counting.
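A small standalone illustration of why cycles need the collector (plain Python, nothing freud-specific): once two objects reference each other, their refcounts never reach zero, and only a garbage-collection pass can reclaim them.

```python
import gc
import weakref

class Node:
    """Toy object standing in for one half of a reference cycle."""

gc.disable()                      # rely on refcounting alone, as in the bug

a, b = Node(), Node()
a.other, b.other = b, a           # reference cycle: a -> b -> a
probe = weakref.ref(a)

del a, b                          # refcounts never hit zero: the cycle wins
assert probe() is not None        # still alive without the cyclic collector

gc.collect()                      # the collector detects and frees the cycle
assert probe() is None

gc.enable()
```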
Some notes for @vramasub and @bdice:
The extra memory usage comes from creating the default neighbor list. In particular, the NeighborList object doesn't seem to be garbage collected because it points back to the CellList that created it through its base parameter. This was intended to prevent garbage-collection problems, but seems to have brought its own (possibly due to some unintuitive behavior of how refcounting works inside Cython?). Adding something like the following to the end of RDF.accumulate seems to fix the problem.
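The snippet itself did not survive in this thread. As a generic illustration of the idea (the class names here mimic freud's but the code is hypothetical, not freud's actual internals): with a parent/child cycle, explicitly clearing the child's base back-reference lets plain refcounting free both objects without waiting for a garbage-collection pass.

```python
import gc
import weakref

class CellList:
    """Toy parent: creating the neighbor list forms a reference cycle."""
    def compute(self):
        self.nlist = NeighborList(base=self)   # parent -> child ...
        return self.nlist                      # ... and child -> parent via .base

class NeighborList:
    def __init__(self, base):
        self.base = base

gc.disable()                 # refcounting only, as when no collection triggers

cl = CellList()
nl = cl.compute()
probe = weakref.ref(nl)

nl.base = None               # the fix: break the cycle explicitly
del cl, nl                   # now plain refcounting frees both objects
assert probe() is None

gc.enable()
```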