GlobalArray to add support for multi-GPU runs on dense nodes

Source branch: global_arrays

Destination branch: master

Merged

#453 · Created 2018-03-15 · Last updated 2018-11-06

Merged in global_arrays (pull request #453)

10ca9ca · Author: Jens Glaser · Closed by: Joshua Anderson · 2018-11-06

Description

This PR adds GlobalArray, which uses managed memory (cudaMallocManaged) and should replace GPUArray.

I had typed up a lot of information on this PR but accidentally closed the browser tab. I will add it back as I move forward. For now, this PR serves as a progress tracker.

ChangeLog

Support multi-GPU execution on dense nodes using CUDA managed memory. Execute with the --gpu=0,1,..,n-1 command line option to run on the first n GPUs (Pascal and above).
Node-local acceleration is implemented for a subset of kernels. Performance improvements may vary.
Improvements are only expected with NVLink hardware. Use MPI when NVLink is not available.
Combine the --gpu=.. command line option with mpirun to execute on many dense nodes.
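
To make the `--gpu=0,1,..,n-1` notation in the ChangeLog concrete, here is a minimal sketch that expands it for a given n and optionally prefixes the command with mpirun. The helper names `gpu_option` and `launch_command`, the script name `run.py`, and the use of `mpirun -n` to set the rank count are assumptions for illustration and are not part of this PR. How ranks are placed on nodes depends on the MPI launcher and is not shown.

```python
def gpu_option(n_gpus):
    """Return the --gpu option selecting the first n_gpus devices (ids 0..n-1).

    Hypothetical helper: it only spells out the list elided as '0,1,..,n-1'.
    """
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    return "--gpu=" + ",".join(str(i) for i in range(n_gpus))


def launch_command(script, n_gpus, n_ranks=1):
    """Build an argument list for a run using n_gpus GPUs per rank.

    With n_ranks > 1 the command is prefixed with 'mpirun -n <n_ranks>'
    (assumed launcher syntax; adapt to the MPI installation in use).
    """
    cmd = ["python", script, gpu_option(n_gpus)]
    if n_ranks > 1:
        cmd = ["mpirun", "-n", str(n_ranks)] + cmd
    return cmd


if __name__ == "__main__":
    # Single dense node, first four GPUs.
    print(" ".join(launch_command("run.py", 4)))
    # Two MPI ranks, each using four GPUs.
    print(" ".join(launch_command("run.py", 4, n_ranks=2)))
```

Building the command as a list rather than a single string keeps it usable with `subprocess.run` without shell quoting.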
