
gkw / DebuggingWithTotalview

Summary: Notes on using GKW with the Totalview debugger (e.g. on HECToR).

Introduction

These are quick notes on the steps needed to run GKW under the Totalview GUI debugger on HECToR. The procedure was used to find a memory leak, which turned out to be inside MPI_Allreduce in xt-mpich2/5.6.0 (switching to xt-mpich2/5.4.2 fixed the problem).

The full documentation for Totalview is at http://www.roguewave.com/support/product-documentation/totalview.aspx

Re-compile GKW

In the /config/.../pgi.mk makefile, set FFLAGS_DEBUG=-traceback -gopt, then rebuild as follows:

module swap PrgEnv-cray PrgEnv-pgi
module load xt-totalview totalview-support
make clean
make DEBUG=on
# To avoid a malloc linking issue, load the mem-debug module only now, then run make again to relink
module load totalview-mem-debug # previously xt-totalview-mem-debug
make DEBUG=on

Some info on the linking issue

Adjust usual PBS script

# Add this to header
#PBS -v DISPLAY

# The usual "aprun" becomes "totalview aprun -a -b -a xt", e.g.
totalview aprun -a -b -a xt -n 64 -N 8 -d 1 -S 8 ./gkw

  • Connect to HECToR with ssh -Y, so that X11 forwarding is enabled.
  • Submit the job and keep that terminal open while waiting for the job to start.
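
The Totalview GUI reaches your screen through the X11 forwarding set up by ssh -Y and the DISPLAY variable passed to the job by "#PBS -v DISPLAY", so a missing DISPLAY only becomes apparent once the job has started. As a minimal sketch, a guard like the following could be placed in the job script before the totalview line. The function name check_display is hypothetical and is not part of GKW, PBS or Totalview.

```shell
# Hypothetical helper (not part of GKW or Totalview):
# fail early when no X display is available to the job.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: connect with ssh -Y and keep '#PBS -v DISPLAY' in the header" >&2
        return 1
    fi
    # Report the display the GUI will be sent to.
    echo "DISPLAY is ${DISPLAY}"
}

# Intended use in the job script, just before the totalview line:
#   check_display || exit 1
```

With this in place the job stops immediately with a readable message instead of failing when Totalview tries to open its window.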

Once Totalview starts

  • On the startup screen, enable the memory checking option in the Debug menu.
  • Debug menu -> Open MemoryScape -> Memory debugging options -> Extreme (not needed to find the basic leak)
  • Click play to run GKW. Let GKW get through initialization (check the screen output)
  • Back in the Totalview window, pause GKW at a breakpoint: use Action point -> At location to set a breakpoint on a routine name (e.g. fluxes).
  • Debug -> Heap Baseline -> Set Heap Baseline
  • Continue the GKW run until it reaches the breakpoint again (or temporarily disable the breakpoint by right-clicking on it, to let a few iterations run).
  • Debug -> Heap Baseline -> Check for leaks / Compare to baseline, then find the lines of code responsible for the leak.
