Gromacs (scorep)
Issue #47
new
Scorep/142
Daint
Setup
- /apps/daint/sandbox/twr/gromacs/gromacs-5.0.6.tar.gz
- module switch PrgEnv-cray PrgEnv-gnu
- module load fftw
- module load cmake
- module load craype-accel-nvidia35
Compile
- git clone https://bitbucket.org/jgphpc/pug.git pug.git
- export PATH=pug.git/scorep/cmake:$PATH
ccmake \
-DCMAKE_C_COMPILER=`which cc.scorep` \
-DCMAKE_CXX_COMPILER=`which CC.scorep` \
-DCUDA_NVCC_EXECUTABLE=`which nvcc.scorep` \
-DCUDA_HOST_COMPILER=cc \
-DFFTWF_INCLUDE_DIR=/opt/cray/fftw/default/sandybridge/include \
-DGMX_SIMD=AVX_256 \
-DGMX_MPI=ON \
-DGMX_GPU=ON \
-DCMAKE_PREFIX_PATH=/opt/cray/fftw/3.3.4.3/sandybridge/ \
-DGMX_CYCLE_SUBCOUNTERS=ON \
../gromacs-5.0.6
- CUDA_NVCC_FLAGS:STRING=-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
- export SCOREP=OFF
- make
=> bin/gmx_mpi
Run
- export OMP_NUM_THREADS=1
- export CRAY_CUDA_MPS=1 # !!!!!
- export I=/apps/common/regression/latest/tests
- export II=9000-scientific_applications_launcher/9022-gromacs_gpu/herflat.tpr
- aprun -n80 -N8 -d$OMP_NUM_THREADS gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
There are: 465399 Atoms
Part of the total run time spent waiting due to load imbalance: 2.3 %
               Core t (s)   Wall t (s)        (%)
       Time:     1440.207       18.071     7969.6
                 (ns/day)    (hour/ns)
Performance:       14.353        1.672
real 26.79
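To compare this reference run with the instrumented runs, the ns/day figure can be pulled out of the mdrun summary automatically. The following is a minimal sketch: `gmx_ns_per_day` is a hypothetical helper name, not a GROMACS tool, and it assumes the summary keeps the `Performance:` line layout shown above.

```shell
# Print the ns/day value from GROMACS mdrun output read on stdin.
# gmx_ns_per_day is a hypothetical helper, not part of GROMACS.
gmx_ns_per_day() {
    # The second field of the "Performance:" line is the ns/day figure.
    awk '$1 == "Performance:" { print $2 }'
}

# Usage with the two summary lines of the reference run:
printf '%s\n' \
    '                 (ns/day)    (hour/ns)' \
    'Performance:       14.353        1.672' | gmx_ns_per_day
```

In practice the input would be the captured mdrun output or the log file, for example `gmx_ns_per_day < md.log`, assuming the default log file name.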
Comments (7)
-
reporter Compile
- Add -DCMAKE_C_FLAGS=-D_GNU_SOURCE
- Add -DGMX_CYCLE_SUBCOUNTERS=OFF
- Turn off other timers ?
- export SCOREP=ON; make
Profile
- export OMP_NUM_THREADS=1
- export CRAY_CUDA_MPS=1 # !!!!!
- export I=/apps/common/regression/latest/tests
- export II=9000-scientific_applications_launcher/9022-gromacs_gpu/herflat.tpr
- export SCOREP_ENABLE_PROFILING=true
- export SCOREP_ENABLE_TRACING=false
- export SCOREP_CUDA_ENABLE=yes
- aprun -n80 -N8 -d$OMP_NUM_THREADS gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
Part of the total run time spent waiting due to load imbalance: 3.3 %
               Core t (s)   Wall t (s)        (%)
       Time:     3942.355       49.476     7968.2
                 (ns/day)    (hour/ns)
Performance:        5.242        4.578
[NID 00012] 2015-08-20 17:17:11 Apid 168344: initiated application termination
Application 168344 exit signals: Killed
real 66.29
- square scorep-20150820_1745_640896956307589
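The cost of unfiltered profiling can be quantified from the mdrun wall times reported so far (18.071 s without instrumentation, 49.476 s with profiling). A minimal sketch follows; `overhead_pct` is a hypothetical helper, and the result is only indicative since the profiling run was killed at termination.

```shell
# Percentage increase in wall time relative to an uninstrumented run.
# overhead_pct is a hypothetical helper; both arguments are wall times in seconds.
overhead_pct() {
    awk -v ref="$1" -v inst="$2" \
        'BEGIN { printf "%.1f\n", (inst - ref) / ref * 100 }'
}

# Reference run vs. unfiltered profiling run:
overhead_pct 18.071 49.476   # prints 173.8
```

An overhead of this size suggests that the measurement is dominated by instrumentation, which is what the filtering step tries to address.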
Filtering
- scorep-score */profile.cubex
Estimated aggregate size of event trace: 203GB
Estimated requirements for largest trace buffer (max_buf): 3065MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 3075MB
- scorep-score -f filter.jg */profile.cubex
Estimated aggregate size of event trace: 1543MB
Estimated requirements for largest trace buffer (max_buf): 21MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 31MB
- Is filter.jg ok ?
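The contents of filter.jg are not shown in this issue. For reference, a Score-P filter file has the general shape sketched below. The file name `filter.example` and the region patterns are assumptions chosen for illustration, not the rules actually used here.

```shell
# Write an illustrative Score-P filter file.
# filter.example and the patterns below are hypothetical; they are NOT
# the contents of filter.jg.
cat > filter.example <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE *
  INCLUDE main
SCOREP_REGION_NAMES_END
EOF

# The effect of a filter can then be estimated before re-running, e.g.:
#   scorep-score -f filter.example */profile.cubex
```

Rules are evaluated in order, so this sketch excludes every region and then re-includes `main`. As far as I know the filter applies to compiler-instrumented user regions, so MPI and CUDA events would still be recorded.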
Tracing
- export SCOREP_FILTERING_FILE=filter.jg
- export SCOREP_TOTAL_MEMORY=40MB
- export SCOREP_ENABLE_PROFILING=false
- export SCOREP_ENABLE_TRACING=true
- export SCOREP_CUDA_ENABLE=yes
- aprun -n80 -N8 -d$OMP_NUM_THREADS gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
Part of the total run time spent waiting due to load imbalance: 3.4 %
               Core t (s)   Wall t (s)        (%)
       Time:     1569.636       19.767     7940.6
                 (ns/day)    (hour/ns)
Performance:       13.121        1.829
real 30.75
- Problem: Vampir will not open the trace: "Your current software license permits trace files with up to 256 concurrent threads of execution."
"These enhancements .... will be part of the next Vampir release in June 2013: The latest developer version of Vampir introduces partial loading of large trace files ... and hence allows to visualize a specific segment of a trace without loading the complete trace. ... While the partial loading with Vampir works well for OTF traces, it is not completely enabled for OTF2 traces"
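The SCOREP_TOTAL_MEMORY value used for the tracing run (40MB) sits a little above the 31MB estimate printed by scorep-score with the filter applied. Deriving the setting from that output could be sketched as follows; `scorep_total_memory` is a hypothetical helper, the headroom is an assumed policy rather than a Score-P recommendation, and only estimates reported in MB are handled.

```shell
# Read scorep-score output on stdin and print a SCOREP_TOTAL_MEMORY value.
# scorep_total_memory is a hypothetical helper; $1 is extra headroom in MB
# (an assumption, defaulting to 0). Only "NNNMB" estimates are handled.
scorep_total_memory() {
    awk -v extra="${1:-0}" '/SCOREP_TOTAL_MEMORY/ {
        v = $NF             # last field, e.g. "31MB"
        sub(/MB$/, "", v)   # strip the unit
        print v + extra "MB"
    }'
}

# Usage with the filtered estimate quoted in the Filtering section:
printf '%s\n' 'Estimated memory requirements (SCOREP_TOTAL_MEMORY): 31MB' \
    | scorep_total_memory 9   # prints 40MB
```

The result could be exported directly, for example `export SCOREP_TOTAL_MEMORY=$(scorep-score -f filter.jg */profile.cubex | scorep_total_memory 9)`.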
-
reporter - aprun -n16 -N8 -d1 gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
- remark for later: export SCOREP_PTHREAD_EXPERIMENTAL_REUSE=true
- streams ? => /scratch/santis/piccinal/gromacs/GNU482/bin.sc142_B
- aprun -n16 -N8 -d1 gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $I/$II -nsteps 1500
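When the rank count is changed, as in the 16-rank runs above, the -gpu_id string still has to match the per-node layout: in GROMACS 5.0 it carries one digit per PP rank on a node, and with -npme 0 every rank is a PP rank. A small consistency check could look like this; `check_gpu_id` is a hypothetical helper, not part of GROMACS or aprun.

```shell
# Check that a -gpu_id string has one digit per rank placed on a node.
# check_gpu_id is a hypothetical helper.
#   $1: ranks per node (the aprun -N value)
#   $2: the string passed to mdrun -gpu_id
check_gpu_id() {
    ranks_per_node=$1
    gpu_id=$2
    if [ "${#gpu_id}" -eq "$ranks_per_node" ]; then
        echo ok
    else
        echo mismatch
    fi
}

# aprun -N8 with -gpu_id 00000000: eight ranks, all mapped to GPU 0
check_gpu_id 8 00000000   # prints ok
```

Mapping all eight ranks of a node to GPU 0 is presumably why CRAY_CUDA_MPS=1 is flagged as essential in the run settings.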
-
reporter Tracing (Scalasca)
- scan -f ./filter.jg -t aprun -n 16 -N 8 -d 1 -j 1 ./gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s herflat.tpr -nsteps 1500
S=C=A=N: Abort: Target executable 1: No such file or directory
(Hint: if `1' is a parameter of an (ignored) aprun launch argument, rather than the intended target executable, then try quoting "-j 1".)
- scan -f ./filter.jg -t aprun -n 16 -N 8 -d 1 ./gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s herflat.tpr -nsteps 1500
[00000.0]: SCOUT: PEARL: PEARL: Locations of non-CPU type not yet supported!
Command exited with non-zero status 4
real 63.39
- square scorep_gmx_mpi_8p16x1_trace/traces.otf2