Gromacs (craypat)

Issue #25 new
jg piccinali repo owner created an issue

Santis + Cuda/6.5

Setup

module use /project/csstaff/proposals
module load perflite/622cuda

Compile

Tim's gromacs/5.0

Run

export CRAY_CUDA_MPS=1  # without perftool
export CRAY_CUDA_MPS=0  # with perftool
NO_CUDA_PROXY=1 NNODES=1 NPPN=1 DO_ALL_PME=0 ./runnner-ion.sh

cd /scratch/santis/robinson/GROMACS_PROJECT_2015/pall/gromacs-5.0-paper-benchmarks/ion_channel.bench_cuda65

OMP_NUM_THREADS=8 \
aprun  -cc none -n 4 -N 1 -d 8 \
/apps/santis/sandbox/twr/gromacs-5.0.4/build_craypat/bin/mdrun_mpi \
-npme 0 -s topol.tpr -g bench-ion_4N_4x8_1ppn_dlb-no_jID3547 \
-nsteps -1 -ntomp_pme 0 -resethway -quiet -v \
-noconfout -nb gpu -pin on  -maxh 0.025 -dlb no
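
The aprun flags above fix the job layout: `-n` is the total number of MPI ranks, `-N` the ranks per node, and `-d` the threads (depth) per rank. A minimal sketch of that arithmetic follows. The helper name `aprun_layout` is hypothetical, and reading the log name `4N_4x8_1ppn` as "nodes, ranks x threads, ranks per node" is an assumption based on the command line, not something the benchmark scripts confirm.

```shell
#!/bin/sh
# Hypothetical helper: derive the node count and total core count
# from the aprun -n / -N / -d values used in the command above.
aprun_layout() {
    n=$1    # total MPI ranks (aprun -n)
    N=$2    # ranks per node (aprun -N)
    d=$3    # threads per rank (aprun -d, matches OMP_NUM_THREADS)
    # Round up so a partially filled last node is still counted.
    nodes=$(( (n + N - 1) / N ))
    # Assumed label format, modelled on the log name in the run above.
    echo "${nodes}N_${n}x${d}_${N}ppn cores=$(( n * d ))"
}

aprun_layout 4 1 8    # prints: 4N_4x8_1ppn cores=32
```

With the values from this run (`-n 4 -N 1 -d 8`) the job spans 4 nodes and 32 cores, one rank per node.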

Report

eff.png

Comments (4)

  1. jg piccinali reporter

    Perftools/6.3.0 (gpu)

    • cd /apps/common/UES/sandbox/jgp/gromacs/GNU482+CUDA65/
    • module swap PrgEnv-cray PrgEnv-gnu
    • module load craype-accel-nvidia35
    • module load fftw
    • module load cmake/3.3.2
    • ONLY after cmake: module load perftools-cscs/630cuda

    compile

    cmake -DCMAKE_C_COMPILER=`which cc` \
    -DCMAKE_CXX_COMPILER=`which CC` \
    -DCUDA_NVCC_EXECUTABLE=`which nvcc` \
    -DCUDA_HOST_COMPILER=`which cc` \
    -DFFTWF_INCLUDE_DIR=/opt/cray/fftw/default/sandybridge/include \
    -DCMAKE_PREFIX_PATH=/opt/cray/fftw/default/sandybridge/ \
    -DGMX_SIMD=AVX_256 -DGMX_MPI=ON -DGMX_GPU=ON -DGMX_CYCLE_SUBCOUNTERS=ON \
    ../gromacs-5.0.6
    
    • module load perftools-cscs/630cuda # will load "perflite-base/630cuda + perftools-lite-gpu"
    • make gmx -j12
    INFO: creating the CrayPat-instrumented executable '../../bin/gmx_mpi' (gpu) ...OK
    

    Run

    • export CRAY_CUDA_MPS=1
    To allow multiple CPU tasks to use a single GPU simultaneously, the CUDA proxy must be enabled: CRAY_CUDA_MPS=1.
    
    • export IN=$HOME/regression.git/tests/9000-scientific_applications_launcher/9022-gromacs_gpu/herflat.tpr
    • aprun -n 80 -N 8 -d 1 -j 1 ./gmx_mpi mdrun -gpu_id 00000000 -npme 0 -s $IN -nsteps 500
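
The `-gpu_id 00000000` string in the run command has eight digits, which matches the 8 ranks per node requested with `-N 8`. As far as I understand GROMACS 5.0, `mdrun -gpu_id` takes one digit per PP rank on a node, so this maps all eight ranks to GPU 0. A minimal sketch of building such a string follows; `make_gpu_id` is a hypothetical helper, not part of GROMACS or the Cray tools.

```shell
#!/bin/sh
# Hypothetical helper: build a -gpu_id string that maps every rank
# on a node to the same GPU (one digit per rank, assumed format).
make_gpu_id() {
    ranks_per_node=$1    # value passed to aprun -N
    gpu=${2:-0}          # GPU index shared by all ranks (default 0)
    id=""
    i=0
    while [ "$i" -lt "$ranks_per_node" ]; do
        id="${id}${gpu}"
        i=$((i + 1))
    done
    echo "$id"
}

make_gpu_id 8    # prints: 00000000
```

The result could then be passed as `-gpu_id "$(make_gpu_id 8)"` so the string stays in step with `-N` when the ranks per node change.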
    

    eff.png

    io

  2. jg piccinali reporter

    nvprof/65

    • export PMI_NO_FORK=1
    • export CRAY_CUDA_MPS=1
    • export IN=$HOME/regression.git/tests/9000-scientific_applications_launcher/9022-gromacs_gpu/herflat.tpr
    • aprun -n 80 -N 8 -d 1 -j 1 -b nvprof -o nvprof.%h.%p gmx_mpi+notool mdrun -gpu_id 00000000 -npme 0 -s $IN -nsteps 500

    report

    • /apps/daint/UES/5.2.UP04/proposals/pug.git/nvprof/gpu-summary.sh
    gputimems: min=172.908 avg=269.698 max=500.337
    HtoD: min=4.13% avg=7% max=10.14%
    DtoH: min=2.82% avg=5% max=6.69%
    memset: min=0.11% avg=0% max=0.27%
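
The summary above reports min/avg/max across the per-rank nvprof results. The `gpu-summary.sh` script itself is not shown here, so the following is only a sketch of how such a reduction could be done with awk, assuming one numeric value per line on stdin; the function name `summarize` and the sample values are hypothetical. Since an average must lie between min and max, the percentage avg fields in the report (for example memset avg=0%) appear to be rounded to integers, whereas this sketch keeps three decimals for every field.

```shell
#!/bin/sh
# Hypothetical reduction: read one number per line and print
# min, average and max, as in the gputimems line of the report.
summarize() {
    awk 'NR == 1 { min = $1; max = $1 }
         {
             if ($1 < min) min = $1
             if ($1 > max) max = $1
             sum += $1
         }
         END {
             if (NR > 0)
                 printf "min=%.3f avg=%.3f max=%.3f\n", min, sum / NR, max
         }'
}

# Illustrative values only, not taken from the nvprof runs.
printf '%s\n' 100 200 300 | summarize    # prints: min=100.000 avg=200.000 max=300.000
```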
    
  3. jg piccinali reporter

    Perftools/6.3.0 (lite/sample_profile)

    • Same results as with the gpu profile (MPI ~26%, USER+ETC ~72%).

    eff.png