JACOBI - MPI/CUDA (nvvp)
Issue #34
DAINT
Get the src
- ssh -Y daint01
- git clone --single-branch -b jacobi https://github.com/eth-cscs/parallel-debuggers.git
- cd parallel-debuggers/jacobi.git/src/
GNU
Compile
- module swap PrgEnv-cray PrgEnv-gnu
- module load craype-accel-nvidia35
- module list
Currently Loaded Modulefiles:
1) modules/3.2.10.3
2) nodestat/2.2-1.0502.53712.3.109.ari
3) sdb/1.0-1.0502.55976.5.27.ari
4) alps/5.2.1-2.0502.9041.11.6.ari
5) lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
6) udreg/2.3.2-1.0502.9275.1.12.ari
7) ugni/5.0-1.0502.9685.4.24.ari
8) gni-headers/3.0-1.0502.9684.5.2.ari
9) dmapp/7.0.1-1.0502.9501.5.219.ari
10) xpmem/0.1-2.0502.55507.3.2.ari
11) hss-llm/7.2.0
12) Base-opts/1.0.2-1.0502.53325.1.2.ari
13) craype-network-aries
14) craype-sandybridge
15) craype/2.4.0
16) slurm
17) cray-mpich/7.2.2
18) ddt/5.0
19) gcc/4.8.2
20) totalview-support/1.1.4
21) totalview/8.11.0
22) cray-libsci/13.0.4
23) pmi/5.0.7-1.0000.10678.155.25.ari
24) atp/1.8.2
25) PrgEnv-gnu/5.2.40
26) cray-libsci_acc/3.1.1
27) cudatoolkit/6.5.14-1.0502.9613.6.1
28) craype-accel-nvidia35
- make
nvcc -arch=sm_35 -O3 -c ../jacobi_cuda_kernel.cu -o jacobi_cuda_kernel.o
cc -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY -DUSE_MPI \
-fopenmp -std=c99 -O3 -c ../jacobi_cuda.c -o jacobi_cuda.o
cc -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY -DUSE_MPI \
-fopenmp -std=c99 -O3 jacobi_cuda_kernel.o jacobi_cuda.o \
-o GNU.santis
Run
- sbatch.sh santis 1 GNU.santis 2 1 4 "4096 4096 0.1"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 3.5
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Jacobi relaxation Calculation: 4096 x 4096 mesh
with 2 processes and 4 threads + one Tesla K20X for each process.
204 of 2049 local rows are calculated on the CPU
to balance the load between the CPU and the GPU. (100 iterations max)
0, 0.489197
total: 0.739662 s
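The reported split (204 of 2049 local rows on the CPU) is consistent with the last program argument, 0.1, being the fraction of local rows handed to the CPU. That reading is an assumption based only on the numbers in the output above, not something the program states. A minimal sketch of the arithmetic, where cpu_rows is a hypothetical helper and not part of the jacobi sources:

```shell
# Hypothetical helper: number of local rows given to the CPU, assuming the
# share is the truncated product of the local row count and the fraction.
# usage: cpu_rows LOCAL_ROWS CPU_FRACTION
cpu_rows() {
  awk -v n="$1" -v f="$2" 'BEGIN { printf "%d\n", int(n * f) }'
}

# Values taken from the run output above: 2049 local rows, fraction 0.1.
cpu_rows 2049 0.1
```

If the assumption holds, changing the last argument of the run command shifts the CPU/GPU balance accordingly.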
Profile with nvvp: offline
- export PMI_NO_FORK=1
- sbatch.sh santis 1 GNU.santis 2 1 4 "4096 4096 0.1" "" "-b nvprof -o nvprof.%h.%p "
==21235== Generated result file: nvprof.nid00012.21235
==28098== Generated result file: nvprof.nid00013.28098
- nvvp nvprof.nid00012.21235 nvprof.nid00013.28098
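In the output file name passed to nvprof, %h is replaced by the host name and %p by the process ID, which is why each MPI rank writes its own result file. The sketch below only imitates that substitution to show how the names above are formed; expand_nvprof_name is a hypothetical helper, not an nvprof command:

```shell
# Hypothetical helper: imitate how nvprof expands %h (host name) and
# %p (process ID) in the file name given to -o.
# usage: expand_nvprof_name TEMPLATE HOST PID
expand_nvprof_name() {
  printf '%s\n' "$1" | sed -e "s/%h/$2/g" -e "s/%p/$3/g"
}

# Host names and PIDs taken from the nvprof messages above.
expand_nvprof_name 'nvprof.%h.%p' nid00012 21235
expand_nvprof_name 'nvprof.%h.%p' nid00013 28098
```

Without %h and %p in the template, all ranks would target the same file name.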
Profile with nvvp: online
- recompile without MPI (drop -DUSE_MPI):
nvcc -arch=sm_35 -O3 -c ../jacobi_cuda_kernel.cu -o jacobi_cuda_kernel.o
cc -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY \
-fopenmp -std=c99 -O3 -c ../jacobi_cuda.c -o jacobi_cuda.o
cc -D_CSCS_ITMAX=100 -DOMP_MEMLOCALITY \
-fopenmp -std=c99 -O3 jacobi_cuda_kernel.o jacobi_cuda.o \
-o GNU.santis.nvvp
- salloc -p ccm
- module load ccm
- export PBS_JOBID=$SLURM_JOBID
- export PMI_NO_FORK=1
- ccmlogin -V
- ./GNU.santis.nvvp 4096 4096 0.1
- nvvp ./GNU.santis.nvvp