Memory leak on development branch
I am seeing what appears to be a memory leak that has shown up since the Gauss release. I ran the same parameter file on Gauss and on the development branch, and I see linear growth in memory usage in the development version. From standard out it looks like a bug in Carpet, as the memory usage jumps after re-gridding.
I've attached:
- The parameter file, bbhCart2-3.rpar. It runs on 12 cores on Datura.
- Plots showing the memory usage per process in the two cases.
- Standard out from the two cases.
Comments (19)
-
reporter -
reporter - removed comment
Okay, I zipped the standard out files and now I've been able to upload them. -Seth
-
- removed comment
Please attach the actual parameter file, just to be sure.
Please attach your thorn list and option list as well, as well as any Simfactory options you used while configuring and building, and which may override some settings. Maybe sending the captured OptionList and ThornList would be best.
Please attach the content of Cactus's config-data/make.config.defn, just to be safe.
-
- removed comment
Can you also attach the output describing how much memory is used? This is probably the source you used for the pdfs.
-
reporter - removed comment
Okay, I think I've uploaded everything you asked for. Let me know if you need anything I missed. Thanks for looking into this, Erik! - Seth
-
- removed comment
I see that you are not using the new bboxset2 class, so this new algorithm is not responsible for this memory leak.
-
- removed comment
I can reproduce the problem, even on a single MPI process and with a single OpenMP thread. It does not seem to occur with gcc, only with Intel.
I corrected a Carpet (or Intel?) problem where memory may have become corrupted, but this does not make the problem go away. The amount of memory actually used (and reachable by) Carpet stays approximately constant.
-
- removed comment
This seems to be caused by either SYNC or calls to boundary. The parfile vacuum5.par shows linear growth in memory usage, the parfile nothing.par does not. The difference is including ADMBase and "static" evolution in the parfile or not.
-
- removed comment
Yay! Thanks for paring this down.
Which compiler/machine was this? I'm not completely sure yet that this effect is generic.
-
- removed comment
Zwicky, intel 11 (icpc --version 11.1 20100414), option list from simfactory (run is in /panfs/ds06/sxs/rhaas/cactus/runs/vacuum5). Run was using 2 processes, 6 threads each. There is beautiful linear growth in memory consumption from step 0.
-
- removed comment
Seth's runs were all using Intel as well, with the current simfactory configurations for datura.cfg and supermuc.cfg. Note that the problem is not present in the Gauss release; if reading the commits since then doesn't throw up something obvious, you could also try bisecting. See http://git.barrywardell.net/simfactory2.git/blob/HEAD:/bin/bisect-test.
-
- removed comment
Given the signature of the memory leak, it might actually be easier to run something like mtrace (http://en.wikipedia.org/wiki/Mtrace) and then look for places in the code that allocate lots of memory (identifiable in the mtrace output). In fact, no memory might be leaked at all if "all" that happens is that a C++ container grows without bound, since the container would free itself before the program ends.
-
- removed comment
... and here we go -- the latest results! Looking at SystemStatistics (thanks, Ian!), in particular at
int uordblks: This is the total size of memory occupied by chunks handed out by malloc.
int fordblks: This is the total size of memory occupied by free (not in use) chunks.
uordblks is remaining constant at 800 MB, fordblks is increasing linearly in time at a rate of about 100 MB / 256 iterations.
This means that malloc is not reusing memory, and thus the overhead of libc increases with time. We encountered this before; the communication pattern of CarpetIOASCII triggered this. We introduced "memory pools" where we allocated large chunks of memory ahead of time, serving this when allocating the communication buffers, and freeing them when CarpetIOASCII is finished. This works fine.
I assume we need to do the same for some other thorn. Don't know yet where.
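For reference, here is a minimal sketch of how the uordblks and fordblks counters can be read directly from libc, assuming a glibc-based Linux system. SystemStatistics may obtain them differently; the names HeapStats and read_heap_stats are made up for this illustration. Where glibc offers mallinfo2 (2.33 and later) the sketch uses it, because the int fields of the older mallinfo wrap around above 2 GB.

```cpp
#include <cassert>
#include <cstddef>
#include <malloc.h>  // glibc-specific: declares mallinfo() and mallinfo2()

// Hypothetical helper type for this sketch (not part of Cactus or Carpet).
struct HeapStats {
  std::size_t in_use;     // uordblks: bytes in chunks handed out by malloc
  std::size_t free_held;  // fordblks: bytes in free chunks still held by libc
};

// Assumption: glibc. Pick mallinfo2 when the installed glibc provides it.
#if defined(__GLIBC__)
#if __GLIBC_PREREQ(2, 33)
#define HEAPSTATS_USE_MALLINFO2 1
#endif
#endif

// Take one sample of the two counters discussed in this ticket.
HeapStats read_heap_stats() {
#ifdef HEAPSTATS_USE_MALLINFO2
  const struct mallinfo2 mi = mallinfo2();  // size_t fields
#else
  const struct mallinfo mi = mallinfo();    // int fields, may wrap above 2 GB
  assert(mi.uordblks >= 0 && mi.fordblks >= 0);
#endif
  return HeapStats{static_cast<std::size_t>(mi.uordblks),
                   static_cast<std::size_t>(mi.fordblks)};
}
```

Calling read_heap_stats() every few hundred iterations and printing both fields should make the pattern visible: in_use roughly flat while free_held keeps growing.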
-
- removed comment
More notes:
It is not a memory leak; only libc's free memory increases.
The memory increases in jumps of a few MByte. These jumps occur near mode switches or near synchronization.
The problem only occurs with Intel, not with gcc. With Intel, the libc free memory increases linearly, up to 1.4 GByte in one particular test case. With gcc, the libc free memory stays bounded at about 29 MByte.
It is not due to RotatingSymmetry180. It is not due to the routine SyncGroups. It is not due to Carpet's timers.
-
- removed comment
In case it helps: I also see a linear increase in the memory consumption (maxrss_mb) when using gcc 4.7 on my workstation for the vacuum file. So far memory consumption has increased from ~450MB to ~7GB, flattening off a little bit at the end but certainly still increasing.
-
- removed comment
The culprit seems to be ggf::ref_bnd_prolongate_all or one of its children.
-
- removed comment
The problem was triggered by LoopControl. Allocating new lc_setup_t or lc_params_t caused libc's "free" memory to increase. I am now using a memory pool there, which circumvents the problem.
This is now corrected in the development version, which also introduces the new Timer thorn.
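To illustrate the general idea of a memory pool for anyone finding this ticket later, here is a minimal sketch. It is not the LoopControl code; BumpPool and all of its members are hypothetical names. One large block is allocated up front, small requests are carved out of it, and everything is released together, so libc's allocator never sees the many small allocate/free pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical fixed-capacity pool: a sketch of the technique only.
class BumpPool {
public:
  explicit BumpPool(std::size_t capacity)
      : storage_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

  // Hand out `bytes` bytes at an offset that is a multiple of `align`.
  // Returns nullptr when the pool is exhausted; it never calls malloc.
  void *allocate(std::size_t bytes,
                 std::size_t align = alignof(std::max_align_t)) {
    // Only power-of-two, fundamental alignments are supported here.
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= alignof(std::max_align_t));
    const std::size_t start = (used_ + align - 1) & ~(align - 1);
    if (start > capacity_ || bytes > capacity_ - start) {
      return nullptr;
    }
    used_ = start + bytes;
    return storage_.get() + start;
  }

  // Release every allocation at once; the big block itself is kept.
  void reset() { used_ = 0; }

  std::size_t used() const { return used_; }
  std::size_t capacity() const { return capacity_; }

private:
  std::unique_ptr<unsigned char[]> storage_;  // freed when the pool dies
  std::size_t capacity_;
  std::size_t used_;
};
```

A caller would create one pool, request space for each small record from allocate(), and call reset() once the records are no longer needed. The trade-off is that individual allocations cannot be freed on their own.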
-
- changed status to resolved
- removed comment
-
- edited description
- changed status to closed
Standard out was too big to upload. Let me know if you want it, and I can send it along.