add an NVSHMEM adapter

Issue #433 new
Max Grossman created an issue

nvshmem and UPC++ each use their own custom allocation mechanism for CUDA GPU memory. This makes composing the two (i.e. being able to call nvshmem_put and upcxx::copy on the same object in GPU memory) challenging.

UPC++ uses custom allocators constructed from CUDA device objects. Each of these allocators manages a large pool of pre-allocated device memory; users allocate chunks out of that pool and get back global pointers to CUDA device memory.

Similarly, nvshmem pre-allocates a large pool of memory (i.e. the symmetric heap) at initialization and offers nvshmem_malloc as a routine for allocating objects out of that pool.

Because nvshmem allocations are symmetric, nvshmem has to enforce special rules internally whenever nvshmem_malloc is called. Most likely it ensures that a single symmetric allocation lands at the same offset in the symmetric heap on every GPU. UPC++ does not enforce any such rules (i.e. each rank can dynamically allocate objects of different types and sizes from its custom allocators however it likes).
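To make the "same offset on every GPU" guess concrete, here is a minimal standalone C++ model. It does not call nvshmem or UPC++; SymmetricHeapModel, collective_allocate, kAllocFailed and the 256-byte alignment are all hypothetical names and values made up for illustration, and nvshmem's real allocator may work differently. The model only shows why symmetry holds when every PE makes the same requests in the same order, and why a one-sided allocation breaks it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel returned when a request does not fit (hypothetical, illustration only).
const std::size_t kAllocFailed = static_cast<std::size_t>(-1);

// Hypothetical model of one PE's symmetric heap: a bump allocator over a
// fixed-size pool. This is NOT nvshmem code.
struct SymmetricHeapModel {
    std::size_t size;      // total bytes in the pool
    std::size_t next = 0;  // next free byte offset

    explicit SymmetricHeapModel(std::size_t bytes) : size(bytes) {}

    // Returns the offset of the new block, or kAllocFailed if it does not fit.
    // align must be non-zero; 256 is an arbitrary value chosen for the model.
    std::size_t allocate(std::size_t bytes, std::size_t align = 256) {
        assert(align != 0);
        std::size_t start = (next + align - 1) / align * align;  // round up
        if (start > size || bytes > size - start) return kAllocFailed;
        next = start + bytes;
        return start;
    }
};

// Models a collective allocation: every PE performs the same request.
// Returns the offset each PE obtained, indexed by PE.
std::vector<std::size_t> collective_allocate(std::vector<SymmetricHeapModel>& pes,
                                             std::size_t bytes) {
    std::vector<std::size_t> offsets;
    for (SymmetricHeapModel& pe : pes) offsets.push_back(pe.allocate(bytes));
    return offsets;
}
```

In this model, if all PEs start from identical state and only ever allocate through collective_allocate, every PE gets the same offset for a given object. As soon as one PE allocates on its own, later collective allocations land at different offsets, which is the kind of freedom UPC++ allocators permit and a symmetric heap cannot.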

Therefore, it is much simpler to take nvshmem’s symmetric heap and wrap it in a UPC++ allocator, so that nvshmem_malloc still works as usual, than to do the inverse (i.e. start from a regular UPC++ allocator and add logic on top of it to perform symmetric allocations that adhere to whatever internal rules nvshmem requires).

There appears to be a global symbol in nvshmem called nvshmemi_state, which should be externally visible and which stores runtime data structures. The fields in it that mark the starting address and size of the symmetric heap are heap_base and heap_size. This nvshmem-to-UPC++ adapter would want to extract that information and wrap it in a UPC++ custom allocator. From @Dan Bonachea: “we could potentially write a short upcxx::extra extension that pulled those magic symbols out of NVSHMEM and constructed a private upcxx::device_allocator<cuda_device> and exposed the to_global_ptr method (only).” This would allow a programmer to pass in an address in the nvshmem symmetric heap and get back a UPC++ global_ptr for that same memory location.
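A rough sketch of the address translation such an adapter would need, again as standalone C++ that does not link against nvshmem or UPC++. HeapRegion stands in for the two fields mentioned above (heap_base, heap_size), and GlobalRef stands in for a UPC++ global_ptr; both types and the functions in_heap, to_global_ref and resolve are hypothetical names for illustration, not part of either library. The real adapter would read the values from nvshmemi_state and hand the range to a device_allocator instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the symmetric heap bounds (heap_base, heap_size).
struct HeapRegion {
    std::uintptr_t heap_base;  // first address of the symmetric heap
    std::size_t heap_size;     // size of the symmetric heap in bytes
};

// Hypothetical stand-in for a global pointer: owning rank plus byte offset
// into that rank's heap.
struct GlobalRef {
    int rank;
    std::size_t offset;
};

// True if addr lies inside [heap_base, heap_base + heap_size).
bool in_heap(const HeapRegion& h, std::uintptr_t addr) {
    return addr >= h.heap_base && addr - h.heap_base < h.heap_size;
}

// Converts a local raw address into a GlobalRef. Returns false (and leaves
// *out untouched) if the address is not inside the local symmetric heap.
bool to_global_ref(const HeapRegion& local, int my_rank,
                   std::uintptr_t addr, GlobalRef* out) {
    if (!in_heap(local, addr)) return false;
    out->rank = my_rank;
    out->offset = addr - local.heap_base;
    return true;
}

// Converts a GlobalRef back to a raw address, given the owning rank's heap.
std::uintptr_t resolve(const HeapRegion& owner_heap, const GlobalRef& ref) {
    assert(ref.offset < owner_heap.heap_size);
    return owner_heap.heap_base + ref.offset;
}
```

The bounds check matters because only addresses inside the symmetric heap can be meaningfully converted; anything else (e.g. memory from a plain cudaMalloc) should be rejected rather than turned into a bogus global pointer.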

Comments (3)

  1. Dan Bonachea
    • removed milestone

    The proposed work lives in upcxx-extras and is not coupled to UPC++ release milestones.
