NVIDIA Multi-Process Service (MPS)
Issue #57
new
PizDaint
CRAY_CUDA_MPS
Overrides the site default for execution in simultaneous
contexts on GPU-equipped nodes (e.g. Hyper-Q, CUDA proxy).
Setting it to 1 or on enables the CUDA proxy; setting it to
0 or off disables it. Debugging and the use of performance
tools to collect GPU statistics are only supported with the
CUDA proxy disabled.
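As a minimal sketch of the value handling described above, the hypothetical helper `mps_state` below maps a `CRAY_CUDA_MPS` value to the resulting proxy state. Only the values 1/on and 0/off come from the description; the function name and the `site-default` label for an unset or unrecognized value are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cray environment): report the CUDA
# proxy state implied by a CRAY_CUDA_MPS value, per the description above.
mps_state() {
    case "$1" in
        1|on)  echo enabled ;;
        0|off) echo disabled ;;
        *)     echo site-default ;;  # assumption: other values keep the site default
    esac
}

# Debuggers and GPU performance tools are only supported with the proxy
# disabled, so a job script could warn before launching one.
if [ "$(mps_state "${CRAY_CUDA_MPS:-}")" = "enabled" ]; then
    echo "CUDA proxy enabled: GPU profiling/debugging is not supported" >&2
fi
```

A job script could call `mps_state "$CRAY_CUDA_MPS"` before starting a profiler and refuse to continue when the result is `enabled`.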
Setup
- git clone https://github.com/lichinka/L2.git L2_lichinka.git
- cd L2_lichinka.git/17591/
- module swap PrgEnv-cray PrgEnv-gnu
- module swap gcc gcc/4.8.2
- module load craype-accel-nvidia35
Compile
- cc proxy.c -o $PE_ENV
Run
- export CRAY_CUDA_MPS=1
- sbatch.sh santis 1 ./GNU 4 4 1
Big1: 2000x2000x2000
Running cublas on 2000x2000x2000 with 1 and then with 4 PEs...
2000x2000x2000 DGEMM -- 1 PE, overall Gflops = 1008.579518 0.015864 s.
1- pid 0, my Gflops = 1008.579518 0.015864 s.
2000x2000x2000 DGEMM -- 4 PE, overall Gflops = 76.414291 0.209385 s.
2- pid 3, my Gflops = 83.771418 0.190996 s.
2- pid 1, my Gflops = 90.812583 0.176187 s.
2- pid 2, my Gflops = 76.420469 0.209368 s.
2- pid 0, my Gflops = 1070.231465 0.014950 s.
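The per-PE rates above appear consistent with the conventional DGEMM operation count of 2*m*n*k floating-point operations divided by the elapsed time. The sketch below assumes that convention (the source of `proxy.c` is not shown here, so this is not necessarily how the program computes its figures), and the helper name `dgemm_gflops` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: Gflop/s of an m x n x k DGEMM that took t seconds,
# assuming the conventional count of 2*m*n*k floating-point operations.
dgemm_gflops() {
    awk -v m="$1" -v n="$2" -v k="$3" -v t="$4" \
        'BEGIN { printf "%.1f\n", 2 * m * n * k / t / 1e9 }'
}

# Big1, 1 PE: logged as 1008.579518 Gflops in 0.015864 s.
dgemm_gflops 2000 2000 2000 0.015864   # prints 1008.6
# Big1, 4 PEs, pid 3: logged as 83.771418 Gflops in 0.190996 s.
dgemm_gflops 2000 2000 2000 0.190996   # prints 83.8
```

The small differences from the logged values presumably come from the elapsed times being printed with only six decimals.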
Small1: 2000x500x2000
2000x500x2000 DGEMM -- 1 PE, overall Gflops = 979.691445 0.004083 s.
3- pid 0, my Gflops = 979.920332 0.004082 s.
2000x500x2000 DGEMM -- 4 PE, overall Gflops = 1053.597048 0.015186 s.
4- pid 0, my Gflops = 263.469581 0.015182 s.
4- pid 1, my Gflops = 331.617963 0.012062 s.
4- pid 3, my Gflops = 310.482197 0.012883 s.
4- pid 2, my Gflops = 291.737080 0.013711 s.
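The 4-PE "overall" figure for Small1 can be reproduced by dividing the combined operation count of all PEs by the overall elapsed time. This is a sketch under the assumption that each PE runs one DGEMM of 2*m*n*k floating-point operations; the helper name `aggregate_gflops` is hypothetical, and the formula is inferred from the logged numbers rather than taken from `proxy.c`.

```shell
#!/bin/sh
# Hypothetical helper: aggregate Gflop/s when p PEs each run one
# m x n x k DGEMM and the whole batch takes t seconds
# (assumes 2*m*n*k floating-point operations per DGEMM).
aggregate_gflops() {
    awk -v p="$1" -v m="$2" -v n="$3" -v k="$4" -v t="$5" \
        'BEGIN { printf "%.1f\n", p * 2 * m * n * k / t / 1e9 }'
}

# Small1, 4 PEs: logged as 1053.597048 Gflops in 0.015186 s.
aggregate_gflops 4 2000 500 2000 0.015186   # prints 1053.6
```

Note that the Big1 4-PE "overall" line (76.414291 Gflops in 0.209385 s) instead matches the operation count of a single 2000x2000x2000 DGEMM over that time, so the formula should be treated as an assumption rather than the program's definition.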
Run (nvprof/nvvp)
- export CRAY_CUDA_MPS=1
- unset COMPUTE_PROFILE
- export PMI_NO_FORK=1
- sbatch.sh santis 1 ./GNU 4 4 1 "" "" "-b nvprof -o nvprof.output.%h.%p"
- nvvp
[nvvp screenshots: Big1/1, Big1/4, Small1/1, Small1/4]
Comments (7)
- reporter: 5000x5000x5000 / 1mpi
5000x5000x5000 / 4mpi
* if export CRAY_CUDA_MPS=0
- reporter: perftools-lite/6.2.5
- module load perftools-lite
- export CRAYPAT_LITE=gpu
- cc proxy2.c -o GNU.2+ptl625
- export CRAY_CUDA_MPS=1
- reporter:
- aprun -n1 nvidia-smi -q
- Compute Mode : Exclusive_Process
EXCLUSIVE_PROCESS – the GPU is assigned to only one process at a time, and individual process threads may submit work to the GPU concurrently.
- aprun -n1 nvidia-smi -q
scorep/1.4.2
[screenshots: Big1/1, Big1/4, Small1/1, Small1/4]