Provide documentation on parallel/distributed debugging

Issue #293 new
Paul Hargrove created an issue

We have noticed a lack of information on parallel/distributed debugging in debugging.md. All we say now is:

Note in particular that runs of multi-rank jobs on many systems include non-trivial spawning activities (e.g., required spawning scripts and/or fork calls) that serial debuggers generally won't correctly follow and handle. Hence the general recommendation to debug multi-rank jobs by attaching your favorite debugger to already-running rank processes.
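
For reference, the attach-to-a-running-rank approach quoted above is commonly paired with a small "wait for attach" loop, so that each rank reports where it is running and pauses long enough for a debugger to connect. The following is only a minimal sketch, assuming a POSIX system; the helper name `wait_for_debugger_attach` and its parameters are hypothetical and are not part of any library API or of debugging.md.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>

// Hypothetical helper (not part of any library): print this process's host
// and PID, then poll until `release` becomes nonzero (e.g., set from an
// attached debugger) or until `max_polls` one-second polls have elapsed.
// Returns the number of polls performed.
int wait_for_debugger_attach(volatile int &release, int max_polls) {
  assert(max_polls >= 0);

  char host[256];
  if (gethostname(host, sizeof host) != 0) {
    std::snprintf(host, sizeof host, "unknown-host");
  }
  host[sizeof host - 1] = '\0';  // gethostname may not NUL-terminate on truncation

  std::fprintf(stderr, "PID %ld on %s waiting for debugger attach\n",
               static_cast<long>(getpid()), host);
  std::fflush(stderr);

  int polls = 0;
  while (!release && polls < max_polls) {
    sleep(1);  // may return early if a signal arrives, so timing is approximate
    ++polls;
  }
  return polls;
}
```

A rank would call this early in its startup with a flag initialized to 0. One could then attach with, for example, `gdb -p <pid>` on the reported host, select the frame of this function, and run `set var release = 1` to let the rank continue.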

We should be advising users to try the same tool(s) that apply to distributed MPI applications. However, that requires launching via mpi-spawner (which we need to figure out how to "spell" for the user).

Perhaps most importantly, we should test what we recommend on at least Cori or Summit (hopefully both).
