Issue #38 closed

When PETSc runs out of memory

BarryFSmith created an issue

When PETSc generates an out-of-memory error (in debug mode, I guess) it should be able to present an object dump and/or a memory dump indicating what is using all the memory.

In other words, we shouldn't need to tell users "run with -malloc_test -objects_dump and fewer time-steps to debug this"; instead they should be able to get useful information without running with fewer time-steps.
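One way to approximate this today, sketched under the assumption of a recent PETSc release: push a custom error handler via PetscPushErrorHandler() that calls PetscMallocDump() when a PETSC_ERR_MEM error reaches it. The dump only has content when PETSc's logging malloc is active (debug builds or -malloc_debug), and OOMHandler below is a hypothetical name, not part of PETSc:

```c
#include <petscsys.h>

/* Hypothetical handler: on an out-of-memory error, dump the list of
   outstanding PETSc allocations, then delegate to the default
   traceback handler.  PetscMallocDump() reports nothing unless the
   logging malloc is active (debug build or -malloc_debug). */
static PetscErrorCode OOMHandler(MPI_Comm comm, int line, const char *func,
                                 const char *file, PetscErrorCode n,
                                 PetscErrorType p, const char *mess, void *ctx)
{
  if (n == PETSC_ERR_MEM) (void)PetscMallocDump(PETSC_STDOUT);
  return PetscTraceBackErrorHandler(comm, line, func, file, n, p, mess, ctx);
}

int main(int argc, char **argv)
{
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscPushErrorHandler(OOMHandler, NULL));
  /* ... application code: a failing PetscMalloc() now produces a dump ... */
  PetscCall(PetscPopErrorHandler());
  PetscCall(PetscFinalize());
  return 0;
}
```

The same idea presumably extends to the -objects_dump machinery via PetscObjectsDump().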

Comments (2)

  1. Jed Brown

Note that malloc failing is not the normal symptom of running out of memory. Instead, malloc succeeds and then some process (maybe yours) gets sent SIGKILL (which cannot be caught) when some process (maybe a different one) tries to touch a page that is not available. Being a memory hog makes it more likely that your process is the one killed, but it is not a guarantee. There are OS-specific ways to influence the OOM killer (see the first sketch after this comment).

The main reason for over-commit (which leads to this whole situation) is that fork/exec is so common. With over-commit, the pages are just marked copy-on-write in the new process after fork, so a process using more than 50% of available memory can fork/exec without needing swap (and can do it fast); the second sketch below makes this concrete.

What this means for PETSc is that many out-of-memory failures cannot be handled by the system you propose.

Also note that functions like printf are not async-signal-safe and cannot be used from signal handlers (see the last sketch below).
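On Linux, the per-process knob for the OOM killer is /proc/<pid>/oom_score_adj. A minimal Linux-only sketch; the value -500 is an arbitrary choice (-1000 means never kill, 1000 means kill first), and lowering the score generally requires root or CAP_SYS_RESOURCE:

```c
#include <stdio.h>

/* Make this process a less attractive OOM-killer victim by lowering
   its oom_score_adj.  Writing a negative value typically requires
   elevated privileges. */
int main(void)
{
  FILE *f = fopen("/proc/self/oom_score_adj", "w");
  if (!f) return 1;
  fputs("-500\n", f); /* -1000 = never kill, 0 = default, 1000 = kill first */
  fclose(f);
  /* ... long-running work ... */
  return 0;
}
```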
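To make the copy-on-write point concrete, a small sketch; the 2 GiB size is an arbitrary assumption, and the effect is visible when it is large relative to free memory. The fork itself copies no heap pages, so it succeeds immediately even though a full duplicate would not fit:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIZE ((size_t)2 << 30) /* 2 GiB: assumed large relative to free RAM */

int main(void)
{
  char *big = malloc(SIZE);
  if (big) memset(big, 1, SIZE); /* touch the pages so they are really resident */
  pid_t pid = fork();            /* fast: pages are marked copy-on-write, not copied */
  if (pid == 0) {
    execlp("echo", "echo", "child exec'd without duplicating the heap", (char *)NULL);
    _exit(127);                  /* only reached if exec fails */
  }
  waitpid(pid, NULL, 0);
  free(big);
  return 0;
}
```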
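Finally, since printf is not on POSIX's async-signal-safe list, a handler that must produce output can use write(2) instead; a minimal sketch (SIGTERM is used here because SIGKILL, as noted above, cannot be caught):

```c
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_term(int sig)
{
  (void)sig;
  static const char msg[] = "caught SIGTERM\n";
  (void)write(STDERR_FILENO, msg, sizeof msg - 1); /* async-signal-safe */
  /* printf() here would risk deadlock: it may take locks or call malloc */
}

int main(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_term;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGTERM, &sa, NULL);
  raise(SIGTERM); /* handler writes its message, then execution continues */
  return 0;
}
```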
