JACOBI - MPI/CUDA (nvvp)

Issue #34 new
jg piccinali repo owner created an issue

DAINT

Get the src

GNU

Compile

  • module swap PrgEnv-cray PrgEnv-gnu
  • module load craype-accel-nvidia35
Currently Loaded Modulefiles:
  1) modules/3.2.10.3
  2) nodestat/2.2-1.0502.53712.3.109.ari
  3) sdb/1.0-1.0502.55976.5.27.ari
  4) alps/5.2.1-2.0502.9041.11.6.ari
  5) lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
  6) udreg/2.3.2-1.0502.9275.1.12.ari
  7) ugni/5.0-1.0502.9685.4.24.ari
  8) gni-headers/3.0-1.0502.9684.5.2.ari
  9) dmapp/7.0.1-1.0502.9501.5.219.ari
 10) xpmem/0.1-2.0502.55507.3.2.ari
 11) hss-llm/7.2.0
 12) Base-opts/1.0.2-1.0502.53325.1.2.ari
 13) craype-network-aries
 14) craype-sandybridge
 15) craype/2.4.0
 16) slurm
 17) cray-mpich/7.2.2
 18) ddt/5.0
 19) gcc/4.8.2
 20) totalview-support/1.1.4
 21) totalview/8.11.0
 22) cray-libsci/13.0.4
 23) pmi/5.0.7-1.0000.10678.155.25.ari
 24) atp/1.8.2
 25) PrgEnv-gnu/5.2.40
 26) cray-libsci_acc/3.1.1
 27) cudatoolkit/6.5.14-1.0502.9613.6.1
 28) craype-accel-nvidia35
  • make
nvcc  -arch=sm_35 -O3  -c ../jacobi_cuda_kernel.cu -o jacobi_cuda_kernel.o

cc  -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY -DUSE_MPI \
 -fopenmp -std=c99 -O3 -c ../jacobi_cuda.c -o jacobi_cuda.o

cc  -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY -DUSE_MPI \
 -fopenmp -std=c99 -O3 jacobi_cuda_kernel.o jacobi_cuda.o \
 -o GNU.santis

Run

  • sbatch.sh santis 1 GNU.santis 2 1 4 "4096 4096 0.1"
  CUDA Driver Version / Runtime Version     6.5 / 6.5
  CUDA Capability Major/Minor version number:    3.5
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535

Jacobi relaxation Calculation: 4096 x 4096 mesh 
with 2 processes and 4 threads + one Tesla K20X for each process.
    204 of 2049 local rows are calculated on the CPU 
to balance the load between the CPU and the GPU. (100 iterations max)
    0, 0.489197
 total: 0.739662 s
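
The CPU/GPU split reported in the log above can be reproduced with a little arithmetic. This is only a sketch under two assumptions that the issue does not state: that the third program argument (0.1) is the fraction of local rows kept on the CPU, and that each process holds rows/procs + 1 local rows (one extra boundary row). The helper name cpu_rows is hypothetical.

```shell
# Hypothetical helper: estimate how many local rows stay on the CPU.
# Assumptions (not confirmed by the source): frac is the CPU share,
# and each process owns rows/procs + 1 local rows.
cpu_rows() {
    rows=$1; procs=$2; frac=$3
    awk -v r="$rows" -v p="$procs" -v f="$frac" \
        'BEGIN { nloc = int(r / p) + 1; printf "%d of %d\n", int(nloc * f), nloc }'
}

cpu_rows 4096 2 0.1
```

With the arguments used in the run above, this prints the same "204 of 2049" figures as the log, which is consistent with the assumptions but does not prove them.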

Profile with nvvp: offline

  • export PMI_NO_FORK=1
  • sbatch.sh santis 1 GNU.santis 2 1 4 "4096 4096 0.1" "" "-b nvprof -o nvprof.%h.%p "
==21235== Generated result file: nvprof.nid00012.21235
==28098== Generated result file: nvprof.nid00013.28098
  • nvvp nvprof.nid00012.21235 nvprof.nid00013.28098
    (screenshot: eff00.png)
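
Each MPI process writes its own result file because the -o pattern contains %h (hostname) and %p (process ID), which nvprof expands per process. The sketch below only mimics that expansion in plain shell to predict the file names. expand_nvprof_pattern is a hypothetical helper and does not call nvprof.

```shell
# Hypothetical helper: mimic nvprof's expansion of %h (hostname) and
# %p (process ID) in an output-file pattern. It does not run nvprof.
expand_nvprof_pattern() {
    pattern=$1; host=$2; pid=$3
    printf '%s\n' "$pattern" | sed -e "s/%h/$host/g" -e "s/%p/$pid/g"
}

# Node name and PID taken from the first result file reported above.
expand_nvprof_pattern "nvprof.%h.%p" nid00012 21235
```

Without %h and %p in the pattern, both processes would presumably target the same file name, so keeping at least one per-process placeholder is the safer choice for multi-rank runs.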

Profile with nvvp: online

  • recompile WITHOUT MPI (no -DUSE_MPI):
nvcc  -arch=sm_35 -O3  -c ../jacobi_cuda_kernel.cu -o jacobi_cuda_kernel.o

cc  -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY \
 -fopenmp -std=c99 -O3 -c ../jacobi_cuda.c -o jacobi_cuda.o

cc  -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY \
 -fopenmp -std=c99 -O3 jacobi_cuda_kernel.o jacobi_cuda.o \
 -o GNU.santis.nvvp
  • salloc -p ccm
  • module load ccm
  • export PBS_JOBID=$SLURM_JOBID
  • export PMI_NO_FORK=1
  • ccmlogin -V
    • ./GNU.santis.nvvp 4096 4096 0.1
    • nvvp ./GNU.santis.nvvp
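
The MPI build (GNU.santis) and the non-MPI build (GNU.santis.nvvp) shown in this issue differ only in the -DUSE_MPI define and the output name. A minimal sketch of that switch follows. jacobi_cflags is a hypothetical helper for illustration, not part of the repository's Makefile.

```shell
# Hypothetical helper: print the cc flags used in this issue,
# adding -DUSE_MPI only when the first argument is "mpi".
jacobi_cflags() {
    flags="-D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY"
    if [ "$1" = "mpi" ]; then
        flags="$flags -DUSE_MPI"
    fi
    printf '%s\n' "$flags -fopenmp -std=c99 -O3"
}

jacobi_cflags mpi     # flags used for GNU.santis
jacobi_cflags nompi   # flags used for GNU.santis.nvvp
```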
