Memory leak on development branch

Issue #1412 closed
Seth Hopper created an issue

I am seeing what appears to be a memory leak that has shown up since the Gauss release. I ran the same parameter file on Gauss and on the development branch, and I see linear growth in memory usage only in the development version. Judging from standard out, it looks like a bug in Carpet, as the memory usage jumps after re-gridding.

I've attached:

  • The parameter file, bbhCart2-3.rpar. It runs on 12 cores on Datura.
  • Plots showing the memory usage per process in the two cases.
  • Standard out from the two cases.

Comments (19)

  1. Seth Hopper reporter

    Standard out was too big to upload. Let me know if you want it, and I can send it along.

  2. Seth Hopper reporter

    Okay, I zipped the standard out files and now I've been able to upload them. -Seth

  3. Erik Schnetter

    Please attach the actual parameter file, just to be sure.

    Please also attach your thorn list and option list, along with any Simfactory options you used while configuring and building that may override some settings. Maybe sending the captured OptionList and ThornList would be best.

    Please attach the content of Cactus's config-data/make.config.defn, just to be safe.

  4. Erik Schnetter

    Can you also attach the output describing how much memory is used? This is probably the source you used for the pdfs.

  5. Seth Hopper reporter

    Okay, I think I've uploaded everything you asked for. Let me know if you need anything I missed. Thanks for looking into this Erik! - Seth

  6. Erik Schnetter

    I see that you are not using the new bboxset2 class, so this new algorithm is not responsible for this memory leak.

  7. Erik Schnetter

    I can reproduce the problem, even on a single MPI process and with a single OpenMP thread. It does not seem to occur with gcc, only with Intel.

    I corrected a Carpet (or Intel?) problem where memory may have become corrupted, but this does not make the problem go away. The amount of memory actually used by (and reachable from) Carpet stays approximately constant.

  8. Roland Haas

    This seems to be caused by either SYNC or calls to boundary. The parfile vacuum5.par shows linear growth in memory usage; the parfile nothing.par does not. The difference is whether the parfile includes ADMBase and "static" evolution.

  9. Erik Schnetter

    Yay! Thanks for paring this down.

    Which compiler/machine was this? I'm not completely sure yet that this effect is generic.

  10. Roland Haas

    Zwicky, intel 11 (icpc --version 11.1 20100414), option list from simfactory (run is in /panfs/ds06/sxs/rhaas/cactus/runs/vacuum5). Run was using 2 processes, 6 threads each. There is beautiful linear growth in memory consumption from step 0.

  11. Roland Haas

    Given the signature of the memory leak, it might actually be easier to run something like mtrace (http://en.wikipedia.org/wiki/Mtrace) and then look for places in the code that allocate lots of memory (identifiable in the mtrace output). In fact, no memory might be leaked at all if "all" that happens is that a C++ container grows without bounds, since the container would free itself before program end.
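For reference, a minimal sketch of how glibc's mtrace facility is typically switched on around a region of interest. The helper `trace_allocations`, the stand-in `example_work`, and the log file name are hypothetical and not part of Cactus or Carpet. The sketch assumes glibc; on recent glibc versions the tracing reportedly takes effect only when `libc_malloc_debug.so` is preloaded, so treat the details as an assumption to check against the local system.

```c
#define _GNU_SOURCE   /* for setenv() under strict -std= modes */
#include <assert.h>
#include <mcheck.h>   /* mtrace, muntrace (glibc-specific) */
#include <stdlib.h>

/* Keeps the unfreed block reachable so the compiler cannot drop the malloc. */
static void *kept_block;

/* Run `work` with glibc allocation tracing enabled, logging to `logfile`.
   Returns 0 on success, -1 if the environment variable could not be set. */
int trace_allocations(const char *logfile, void (*work)(void))
{
    assert(logfile != NULL && work != NULL);
    /* mtrace() takes the log file name from the MALLOC_TRACE variable. */
    if (setenv("MALLOC_TRACE", logfile, 1) != 0)
        return -1;
    mtrace();    /* start recording malloc/realloc/free calls */
    work();
    muntrace();  /* stop recording */
    return 0;
}

/* Hypothetical stand-in for the code under investigation. */
void example_work(void)
{
    void *temp = malloc(4096);
    kept_block = malloc(1024);  /* never freed: would be listed in the report */
    free(temp);
}
```

Calling `trace_allocations("mtrace.log", example_work)` and then running the `mtrace` script that ships with glibc on the binary and the log file lists allocations that were never freed, together with the calling addresses.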

  12. Erik Schnetter

    ... and here we go -- the latest results! Looking at SystemStatistics (thanks, Ian!), in particular at

        int uordblks
            This is the total size of memory occupied by chunks handed out by malloc.
        int fordblks
            This is the total size of memory occupied by free (not in use) chunks.

    uordblks is remaining constant at 800 MB, fordblks is increasing linearly in time at a rate of about 100 MB / 256 iterations.
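The two counters can be sampled directly from glibc. The following is a minimal sketch, not the SystemStatistics implementation; `heap_stats`, `read_heap_stats`, and `log_heap_stats` are hypothetical names. It assumes glibc and uses `mallinfo2` where the headers advertise version 2.33 or later, falling back to the older `mallinfo`, whose `int` fields can wrap for heaps beyond a few gigabytes.

```c
#include <assert.h>
#include <malloc.h>   /* mallinfo / mallinfo2 (glibc-specific) */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t in_use;     /* uordblks: bytes in chunks handed out by malloc */
    size_t free_held;  /* fordblks: bytes in free chunks still held by libc */
} heap_stats;

/* Sample the allocator counters discussed above. */
heap_stats read_heap_stats(void)
{
    heap_stats s;
#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33)
    struct mallinfo2 mi = mallinfo2();
    s.in_use = mi.uordblks;
    s.free_held = mi.fordblks;
#else
    struct mallinfo mi = mallinfo();
    s.in_use = (size_t)(unsigned int)mi.uordblks;
    s.free_held = (size_t)(unsigned int)mi.fordblks;
#endif
    return s;
}

/* Print one line per call, e.g. once every few iterations. */
void log_heap_stats(FILE *out, int iteration)
{
    heap_stats s;
    assert(out != NULL);
    s = read_heap_stats();
    fprintf(out, "iteration %d: in use %zu bytes, free in libc %zu bytes\n",
            iteration, s.in_use, s.free_held);
}
```

Logging these two numbers at regular intervals separates the two cases: a real leak shows up as growth in `in_use`, while the behaviour described here shows up as growth in `free_held` with `in_use` flat.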

    This means that malloc is not reusing memory, and thus the overhead of libc increases with time. We encountered this before; the communication pattern of CarpetIOASCII triggered this. We introduced "memory pools" where we allocated large chunks of memory ahead of time, serving this when allocating the communication buffers, and freeing them when CarpetIOASCII is finished. This works fine.

    I assume we need to do the same for some other thorn. Don't know yet where.
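The "memory pool" idea described above can be sketched as a simple bump allocator: one large block is obtained up front, buffers are carved out of it, and everything is handed back in a single `free`. This is an illustration only, not Carpet's actual pool; `buffer_pool` and its functions are hypothetical names, and the 16-byte alignment is an assumption that suits common 64-bit platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { POOL_ALIGN = 16 };  /* assumed alignment for every buffer */

typedef struct {
    unsigned char *base;  /* start of the single large allocation */
    size_t capacity;      /* total bytes available */
    size_t used;          /* bytes handed out so far */
} buffer_pool;

/* Allocate the whole pool ahead of time. Returns 0 on success, -1 on failure. */
int pool_init(buffer_pool *pool, size_t capacity)
{
    assert(pool != NULL);
    pool->base = malloc(capacity);
    pool->capacity = pool->base ? capacity : 0;
    pool->used = 0;
    return pool->base ? 0 : -1;
}

/* Serve a buffer from the pool without calling malloc; NULL if it is full. */
void *pool_alloc(buffer_pool *pool, size_t size)
{
    size_t rounded;
    void *result;
    if (size > SIZE_MAX - (POOL_ALIGN - 1))
        return NULL;
    rounded = (size + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
    if (rounded > pool->capacity - pool->used)
        return NULL;
    result = pool->base + pool->used;
    pool->used += rounded;
    return result;
}

/* Return all buffers to libc at once, as one large chunk. */
void pool_release(buffer_pool *pool)
{
    free(pool->base);
    pool->base = NULL;
    pool->capacity = 0;
    pool->used = 0;
}
```

Because libc only ever sees one large allocation and one matching free, the many short-lived buffers in between cannot fragment its free lists.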

  13. Erik Schnetter

    More notes:

    It is not a memory leak; only libc's free memory increases.

    The memory increases in jumps of a few MByte. These jumps occur near mode switches or near synchronization.

    The problem only occurs with Intel, not with gcc. With Intel, the libc free memory increases linearly, up to 1.4 GByte in one particular test case. With gcc, the libc free memory stays bounded at about 29 MByte.

    It is not due to RotatingSymmetry180. It is not due to the routine SyncGroups. It is not due to Carpet's timers.

  14. Roland Haas

    In case it helps: I also see a linear increase in the memory consumption (maxrss_mb) when using gcc 4.7 on my workstation for the vacuum file. So far memory consumption has increased from ~450MB to ~7GB, flattening off a little bit at the end but certainly still increasing.

  15. Erik Schnetter

    The problem was triggered by LoopControl. Allocating new lc_setup_t or lc_params_t caused libc's "free" memory to increase. I am now using a memory pool there, which circumvents the problem.

    This is now corrected in the development version, which also introduces the new Timer thorn.
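A pool for small fixed-size records, of the kind mentioned for lc_setup_t and lc_params_t, can be sketched with a free list that recycles slots instead of returning them to malloc. This is not the LoopControl code; `setup_record` is a hypothetical stand-in whose fields are invented for illustration, and the pool functions are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a small, frequently allocated record. */
typedef struct setup_record {
    int npoints[3];                  /* illustrative payload only */
    struct setup_record *next_free;  /* link used while the slot is unused */
} setup_record;

typedef struct {
    setup_record *slab;       /* one allocation holding every slot */
    setup_record *free_list;  /* singly linked list of unused slots */
} record_pool;

/* Allocate `count` slots in one malloc call. Returns 0 on success, -1 on failure. */
int record_pool_init(record_pool *pool, size_t count)
{
    size_t i;
    pool->slab = NULL;
    pool->free_list = NULL;
    if (count == 0 || count > SIZE_MAX / sizeof *pool->slab)
        return -1;
    pool->slab = malloc(count * sizeof *pool->slab);
    if (pool->slab == NULL)
        return -1;
    /* Push slots in reverse so that slot 0 is handed out first. */
    for (i = count; i-- > 0; ) {
        pool->slab[i].next_free = pool->free_list;
        pool->free_list = &pool->slab[i];
    }
    return 0;
}

/* Take a slot from the free list; NULL if the pool is exhausted. */
setup_record *record_acquire(record_pool *pool)
{
    setup_record *r = pool->free_list;
    if (r != NULL)
        pool->free_list = r->next_free;
    return r;
}

/* Put a slot back on the free list; libc's allocator is not involved. */
void record_release(record_pool *pool, setup_record *r)
{
    assert(r != NULL);
    r->next_free = pool->free_list;
    pool->free_list = r;
}

/* Hand the whole slab back to libc in a single free. */
void record_pool_destroy(record_pool *pool)
{
    free(pool->slab);
    pool->slab = NULL;
    pool->free_list = NULL;
}
```

Acquire and release only move pointers within the slab, so the number of malloc and free calls stays constant no matter how many records are created and discarded during a run.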
