GROMACS (Score-P)

Issue #47 (new), created by jg piccinali (repo owner)

Scorep/142

Daint

Setup

  • /apps/daint/sandbox/twr/gromacs/gromacs-5.0.6.tar.gz
  • module switch PrgEnv-cray PrgEnv-gnu
  • module load fftw
  • module load cmake
  • module load craype-accel-nvidia35

Compile

ccmake \
-DCMAKE_C_COMPILER=`which cc.scorep` \
-DCMAKE_CXX_COMPILER=`which CC.scorep` \
-DCUDA_NVCC_EXECUTABLE=`which nvcc.scorep` \
-DCUDA_HOST_COMPILER=cc \
-DFFTWF_INCLUDE_DIR=/opt/cray/fftw/default/sandybridge/include \
-DGMX_SIMD=AVX_256 \
-DGMX_MPI=ON \
-DGMX_GPU=ON \
-DCMAKE_PREFIX_PATH=/opt/cray/fftw/3.3.4.3/sandybridge/ \
-DGMX_CYCLE_SUBCOUNTERS=ON \
../gromacs-5.0.6
  • CUDA_NVCC_FLAGS:STRING=-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
  • export SCOREP=OFF
  • make => bin/gmx_mpi
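The CUDA_NVCC_FLAGS entry above is a CMake list, so its items are separated by semicolons. To pass it on the command line instead of setting it inside ccmake, the value has to be quoted, because an unquoted ';' ends a shell command. The POSIX sh sketch below assembles the same value. join_cmake_list is a hypothetical helper and is not part of CMake or GROMACS.

```shell
# Sketch: assemble CUDA_NVCC_FLAGS as a CMake list (items joined with ';').
# join_cmake_list is a hypothetical helper. The subshell body keeps the IFS
# change local, and "$*" joins the arguments with the first character of IFS.
join_cmake_list() ( IFS=';'; printf '%s\n' "$*" )

CUDA_FLAGS=$(join_cmake_list \
  -gencode arch=compute_35,code=sm_35 \
  -gencode arch=compute_35,code=compute_35 \
  -use_fast_math \
  -Xcompiler -fPIC)

# Quote the whole -D argument when using it on a command line.
echo "-DCUDA_NVCC_FLAGS=$CUDA_FLAGS"
```

The result can then be passed as a single quoted argument, e.g. cmake "-DCUDA_NVCC_FLAGS=$CUDA_FLAGS" plus the other -D options and ../gromacs-5.0.6.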

Run

  • export OMP_NUM_THREADS=1
  • export CRAY_CUDA_MPS=1  # essential: lets the 8 ranks per node share the GPU (CUDA MPS)
  • export I=/apps/common/regression/latest/tests
  • export II=9000-scientific_applications_launcher/9022-gromacs_gpu/herflat.tpr
  • aprun -n80 -N8 -d$OMP_NUM_THREADS gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
There are: 465399 Atoms
Part of the total run time spent waiting due to load imbalance: 2.3 %
               Core t (s)   Wall t (s)        (%)
       Time:     1440.207       18.071     7969.6
                 (ns/day)    (hour/ns)
Performance:       14.353        1.672
real 26.79
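The launch parameters fit together as follows. With 80 ranks at 8 per node, the job occupies 10 nodes. Because -npme 0 makes every rank a PP rank, -gpu_id needs one digit per rank on a node, and 00000000 maps all eight ranks to GPU 0. Eight processes sharing one GPU is presumably why CRAY_CUDA_MPS=1 is flagged as essential. The sh sketch below reproduces that arithmetic, with the values copied from the aprun line.

```shell
# Sketch: derive the launch geometry of the run above (values from the aprun line).
ntasks=80     # aprun -n: total MPI ranks
per_node=8    # aprun -N: ranks per node

# Number of nodes, rounded up.
nodes=$(( (ntasks + per_node - 1) / per_node ))

# mdrun -gpu_id takes one device digit per PP rank on a node;
# here every rank on the node uses device 0.
gpu_id=$(i=0; while [ "$i" -lt "$per_node" ]; do printf 0; i=$((i + 1)); done)

echo "nodes=$nodes gpu_id=$gpu_id"
```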

Comments

  1. jg piccinali (reporter)

    Compile

    Profile

    • export OMP_NUM_THREADS=1
    • export CRAY_CUDA_MPS=1  # essential: lets the 8 ranks per node share the GPU (CUDA MPS)
    • export I=/apps/common/regression/latest/tests
    • export II=9000-scientific_applications_launcher/9022-gromacs_gpu/herflat.tpr
      • export SCOREP_ENABLE_PROFILING=true
      • export SCOREP_ENABLE_TRACING=false
      • export SCOREP_CUDA_ENABLE=yes
      • aprun -n80 -N8 -d$OMP_NUM_THREADS gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
     Part of the total run time spent waiting due to load imbalance: 3.3 %
                   Core t (s)   Wall t (s)        (%)
           Time:     3942.355       49.476     7968.2
                     (ns/day)    (hour/ns)
    Performance:        5.242        4.578
    [NID 00012] 2015-08-20 17:17:11 Apid 168344: initiated application termination
    Application 168344 exit signals: Killed
    real 66.29
    
    • square scorep-20150820_1745_640896956307589  (screenshot: gnu-scalascaP-08202015.png)
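Comparing the profiling run with the uninstrumented run gives the overhead of unfiltered Score-P profiling. Wall time rises from 18.071 s to 49.476 s, roughly 2.7 times slower. The sketch below redoes that arithmetic with awk and also checks the hour/ns column, which is 24 divided by ns/day. The numbers are copied from the two mdrun summaries above.

```shell
# Sketch: Score-P profiling overhead from the two mdrun summaries above.
ref_wall=18.071    # "Wall t (s)" of the uninstrumented run
prof_wall=49.476   # "Wall t (s)" of the profiling run
prof_nsday=5.242   # "(ns/day)" of the profiling run

# slowdown = instrumented wall time / reference wall time
slowdown=$(awk -v a="$prof_wall" -v b="$ref_wall" 'BEGIN { printf "%.2f", a / b }')

# GROMACS prints hour/ns = 24 / (ns/day); recompute it as a consistency check.
prof_hns=$(awk -v p="$prof_nsday" 'BEGIN { printf "%.3f", 24 / p }')

echo "profiling slowdown: ${slowdown}x, hour/ns: ${prof_hns} (reported: 4.578)"
```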

    Filtering

    • scorep-score */profile.cubex
    Estimated aggregate size of event trace:                   203GB
    Estimated requirements for largest trace buffer (max_buf): 3065MB
    Estimated memory requirements (SCOREP_TOTAL_MEMORY):       3075MB
    
    • scorep-score -f filter.jg */profile.cubex
    Estimated aggregate size of event trace:                   1543MB
    Estimated requirements for largest trace buffer (max_buf): 21MB
    Estimated memory requirements (SCOREP_TOTAL_MEMORY):       31MB
    
    • Is filter.jg OK?
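For reference, a Score-P filter file uses the region-name block syntax shown below. The contents of filter.jg are not shown in this issue, so the patterns here are hypothetical placeholders. In practice, the candidates are the frequently visited USR regions that dominate max_buf in the region listing of scorep-score -r. The sketch writes such a file; filter.example is a made-up name.

```shell
# Sketch: write a Score-P filter file using the region-name block syntax.
# The patterns are HYPOTHETICAL placeholders, not the contents of filter.jg.
# '*' is a wildcard, and every pattern listed after EXCLUDE is not measured.
cat > filter.example <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE
    hypothetical_small_helper_*
    *hypothetical_inline_kernel*
SCOREP_REGION_NAMES_END
EOF

# Count the BEGIN/END markers as a quick well-formedness check (expects 2).
grep -c '^SCOREP_REGION_NAMES_' filter.example
```

A candidate file can be evaluated against the existing profile without rerunning, in the same way as filter.jg: scorep-score -f filter.example */profile.cubex.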

    Tracing

    • export SCOREP_FILTERING_FILE=filter.jg
    • export SCOREP_TOTAL_MEMORY=40MB
      • export SCOREP_ENABLE_PROFILING=false
      • export SCOREP_ENABLE_TRACING=true
      • export SCOREP_CUDA_ENABLE=yes
      • aprun -n80 -N8 -d$OMP_NUM_THREADS gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
     Part of the total run time spent waiting due to load imbalance: 3.4 %
                   Core t (s)   Wall t (s)        (%)
           Time:     1569.636       19.767     7940.6
                     (ns/day)    (hour/ns)
    Performance:       13.121        1.829
    real 30.75
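The SCOREP_TOTAL_MEMORY=40MB used for tracing is just above the 31MB that scorep-score estimated with filter.jg. One hypothetical way to turn such an estimate into a setting is to add about 25% headroom and round up to a multiple of 10 MB, which reproduces the 40MB used here. This rule is an assumption, not a Score-P recommendation. With the filter in place, the traced run took 19.767 s of wall time against 18.071 s uninstrumented, about 9% slower.

```shell
# Sketch: turn a scorep-score estimate into a SCOREP_TOTAL_MEMORY value.
# The headroom rule (+25%, rounded up to 10 MB) is a HYPOTHETICAL choice.
suggest_total_memory() {
  # $1: an "Estimated memory requirements (SCOREP_TOTAL_MEMORY): NNNMB" line
  est=$(printf '%s\n' "$1" | sed -E 's/.*:[[:space:]]*([0-9]+)MB.*/\1/')
  padded=$(( (est * 5 + 3) / 4 ))        # ceil(est * 1.25) in integer math
  rounded=$(( (padded + 9) / 10 * 10 ))  # round up to a multiple of 10
  printf '%sMB\n' "$rounded"
}

filtered='Estimated memory requirements (SCOREP_TOTAL_MEMORY):       31MB'
unfiltered='Estimated memory requirements (SCOREP_TOTAL_MEMORY):       3075MB'
echo "filtered:   SCOREP_TOTAL_MEMORY=$(suggest_total_memory "$filtered")"
echo "unfiltered: SCOREP_TOTAL_MEMORY=$(suggest_total_memory "$unfiltered")"
```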
    
    "These enhancements .... will be part of the next Vampir release in June 2013:
    The latest developer version of Vampir introduces partial loading of large
    trace files ...  and hence allows to visualize a specific segment of a trace
    without loading the complete trace. ... While the partial loading with Vampir
    works well for OTF traces, it is not completely enabled for OTF2 traces"
    
  2. jg piccinali (reporter)
    • aprun -n16 -N8 -d1 gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500  (screenshot: gnuvampir08202015.png)
      • remark for later: export SCOREP_PTHREAD_EXPERIMENTAL_REUSE=true
      • streams ? => /scratch/santis/piccinal/gromacs/GNU482/bin.sc142_B
  3. jg piccinali (reporter)

    Tracing (Scalasca)

    • scan -f ./filter.jg -t aprun -n 16 -N 8 -d 1 -j 1 ./gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s herflat.tpr -nsteps 1500
    S=C=A=N: Abort: Target executable 1: No such file or directory
    (Hint: if `1' is a parameter of an (ignored) aprun launch argument, rather than the intended target executable, then try quoting "-j 1".)
    
    • scan -f ./filter.jg -t aprun -n 16 -N 8 -d 1 ./gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s herflat.tpr -nsteps 1500
    [00000.0]: SCOUT: PEARL: PEARL: Locations of non-CPU type not yet supported!
    Command exited with non-zero status 4
    real 63.39
    
    • square scorep_gmx_mpi_8p16x1_trace/traces.otf2