Summary: Notes on using GKW with the Totalview debugger (e.g. on Hector).
These are quick notes on the steps taken to get GKW running with the Totalview GUI debugger on HECTOR. The setup was used to find a memory leak, which turned out to be inside MPI_Allreduce in xt-mpich2/5.6.0 (switching to xt-mpich2/5.4.2 fixed the problem).
The full documentation for Totalview is at http://www.roguewave.com/support/product-documentation/totalview.aspx
In the /config/.../pgi.mk makefile, set FFLAGS_DEBUG=-traceback -gopt

    module swap PrgEnv-cray PrgEnv-pgi
    module load xt-totalview totalview-support
    make clean
    make DEBUG=on
    # To avoid a malloc linking issue, load the mem-debug module only now, then relink
    module load totalview-mem-debug   # previously xt-totalview-mem-debug
    make DEBUG=on
Adjust the usual PBS script:

    # Add this to the header
    #PBS -v DISPLAY
    # The usual "aprun" becomes "totalview aprun -a -b -a xt", e.g.
    totalview aprun -a -b -a xt -n 64 -N 8 -d 1 -S 8 ./gkw
- Connect to Hector with ssh -Y
- Submit the job and wait for it to start, leaving the terminal open
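
The PBS changes above can be collected into one small helper that writes a complete job script. This is only a sketch: the job name, the walltime, the mppwidth/mppnppn resource lines and the file name gkw_totalview.pbs are assumptions for illustration, not taken from the original setup. Only the "#PBS -v DISPLAY" line and the totalview aprun line come from these notes.

```shell
#!/bin/bash
# Sketch: generate a PBS job script that launches GKW under Totalview.
# Job name, walltime, resource directives and file names are assumptions;
# adapt them to the local queue configuration.

write_totalview_job() {
    local jobfile="$1"   # output file (hypothetical name)
    local ntasks="$2"    # total MPI tasks, passed to aprun -n
    local pernode="$3"   # MPI tasks per node, passed to aprun -N

    cat > "$jobfile" <<EOF
#!/bin/bash
#PBS -N gkw_totalview
#PBS -l mppwidth=${ntasks}
#PBS -l mppnppn=${pernode}
#PBS -l walltime=00:30:00
# Export DISPLAY into the job so the Totalview GUI can open
#PBS -v DISPLAY

cd "\$PBS_O_WORKDIR"

# The usual aprun line, prefixed with "totalview" and given "-a -b -a xt"
totalview aprun -a -b -a xt -n ${ntasks} -N ${pernode} -d 1 -S 8 ./gkw
EOF
}

# Same task layout as the example in these notes: 64 tasks, 8 per node
write_totalview_job gkw_totalview.pbs 64 8
echo "Wrote gkw_totalview.pbs; submit it with: qsub gkw_totalview.pbs"
```

Run this from the ssh -Y session so that DISPLAY is already set when the job is submitted.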
Once Totalview starts
- Enable the memory checking option on the startup screen or in the Debug menu:
  Debug menu -> Open MemoryScape -> Memory debugging options -> Extreme (not needed to find the basic leak)
- Click play to run GKW. Let GKW get through initialization (check the screen output)
- Back in the Totalview window, pause GKW at a breakpoint: use Action point -> At location to set a breakpoint on a routine name (e.g. fluxes).
- Debug -> Heap Baseline -> Set Heap Baseline
- Continue the GKW run to the breakpoint again (or temporarily disable the breakpoint by right-clicking on it, to let a few iterations pass)
- Debug -> Heap Baseline -> Check for leaks / Compare to baseline, then find the offending lines of code