GlobalArray to add support for multi-GPU runs on dense nodes

Merged
#453

Merged pull request

Merged in global_arrays (pull request #453)

10ca9ca · 2018-11-06

Description

This PR adds GlobalArray, which uses managed memory (cudaMallocManaged) and should replace GPUArray.
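As a rough illustration of the mechanism GlobalArray builds on, the sketch below allocates one buffer with cudaMallocManaged and lets every visible GPU update its own slice of it. This is not the GlobalArray implementation: the kernel name `scale`, the even split into chunks, and the buffer size are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Abort main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiply the elements in [begin, end) by factor.
__global__ void scale(float* data, size_t begin, size_t end, float factor)
{
    size_t i = begin + blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < end)
        data[i] *= factor;
}

int main()
{
    int n_gpus = 0;
    CHECK(cudaGetDeviceCount(&n_gpus));
    if (n_gpus == 0) {
        std::fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }

    // One managed allocation, addressable from the host and from all GPUs.
    const size_t n = 1 << 20;
    float* data = nullptr;
    CHECK(cudaMallocManaged(&data, n * sizeof(float)));
    for (size_t i = 0; i < n; ++i)
        data[i] = 1.0f;

    // Assumed work split: each GPU processes one contiguous chunk.
    const size_t chunk = (n + n_gpus - 1) / n_gpus;
    const unsigned int threads = 256;
    for (int dev = 0; dev < n_gpus; ++dev) {
        size_t begin = dev * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        if (begin >= end)
            continue;
        CHECK(cudaSetDevice(dev));
        unsigned int blocks = (unsigned int)((end - begin + threads - 1) / threads);
        scale<<<blocks, threads>>>(data, begin, end, 2.0f);
        CHECK(cudaGetLastError());
    }

    // Wait for every GPU before the host touches the buffer again.
    for (int dev = 0; dev < n_gpus; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaDeviceSynchronize());
    }

    // Verify on the host that every element was doubled exactly once.
    size_t errors = 0;
    for (size_t i = 0; i < n; ++i)
        if (data[i] != 2.0f)
            ++errors;
    std::printf("%d GPU(s), %zu mismatches\n", n_gpus, errors);

    CHECK(cudaFree(data));
    return errors == 0 ? 0 : 1;
}
```

The same pointer is passed to every device, so no explicit cudaMemcpy between GPUs is needed; the driver migrates or maps pages on demand.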

I had typed up a lot of information for this PR but accidentally closed the browser tab. I will add it back as I move forward. For now, this PR serves as a progress tracker.

ChangeLog

  • Support multi-GPU execution on dense nodes using CUDA managed memory. Run with the --gpu=0,1,..,n-1 command line option to use the first n GPUs (Pascal and above).

  • Node-local acceleration is implemented for a subset of kernels. Performance improvements may vary.

  • Improvements are only expected with NVLINK hardware. Use MPI when NVLINK is not available.

  • Combine the --gpu=.. command line option with mpirun to execute on many dense nodes.
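Before choosing the GPU list for a dense node, it can help to check what the hardware reports. The sketch below is a standalone diagnostic and not part of this PR. It prints, for each device, whether it is Pascal or newer (compute capability 6.x and above), whether it supports concurrent managed access, and which other GPUs it can reach peer-to-peer. Note that cudaDeviceCanAccessPeer reports peer-to-peer capability in general and does not by itself distinguish NVLINK from PCIe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int n_gpus = 0;
    if (cudaGetDeviceCount(&n_gpus) != cudaSuccess || n_gpus == 0) {
        std::fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }

    for (int dev = 0; dev < n_gpus; ++dev) {
        // Compute capability 6.x corresponds to Pascal.
        int major = 0;
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);

        // Non-zero if the GPU can access managed memory concurrently with the CPU.
        int concurrent = 0;
        cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);

        std::printf("GPU %d: compute capability major %d (%s), "
                    "concurrent managed access: %s\n",
                    dev, major,
                    major >= 6 ? "Pascal or newer" : "pre-Pascal",
                    concurrent ? "yes" : "no");

        // Peer-to-peer reachability to every other GPU on the node.
        for (int peer = 0; peer < n_gpus; ++peer) {
            if (peer == dev)
                continue;
            int can_access = 0;
            cudaDeviceCanAccessPeer(&can_access, dev, peer);
            std::printf("  peer access %d -> %d: %s\n",
                        dev, peer, can_access ? "yes" : "no");
        }
    }
    return 0;
}
```

To confirm that a peer link actually runs over NVLINK, `nvidia-smi topo -m` shows the interconnect type between each pair of GPUs.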
