Consider adding row-padding allocation to device_allocator<cuda_device>

Copied from a Slack conversation regarding spec pull request #18:

bonachea [12:13 AM] On a related note, I'm worried there may be some serious performance drawbacks to not providing the equivalents of cudaMallocPitch and cudaMalloc3D for our allocator. Specifically, if our allocator does not offer to perform the row-padding of the individual multi-d array objects it creates, then users may be stuck re-implementing that row-padding logic themselves on the objects we give them, just to replicate the performance properties of a computational kernel that was carefully tuned for optimal array alignment.

Max Grossman [7:56 PM] @bonachea I would say that's definitely true, but I'm not sure that's something that needs to be included in a minimal v1 release. I'm also not sure how you would work that into the existing spec draft without having to re-implement cudaMallocPitch/cudaMalloc3D-like functionality in UPC++ itself (since we'll just be working with a chunk of cudaMalloc-ed memory).
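
For reference, a minimal sketch of the row-padding arithmetic under discussion, i.e. what would have to be re-implemented on top of a plain cudaMalloc-ed chunk. The helper names (padded_pitch, pitched_offset, pitched_size) are hypothetical and not part of UPC++ or CUDA. The alignment is a caller-supplied assumption; cudaMallocPitch picks its own device-dependent pitch, which this sketch does not try to reproduce.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a UPC++ or CUDA API): round a row width in bytes
// up to the next multiple of `alignment`, which must be a power of two.
inline std::size_t padded_pitch(std::size_t width_bytes, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (width_bytes + alignment - 1) & ~(alignment - 1);
}

// Byte offset of element (row, col) within a pitched 2-D array. This mirrors
// the addressing scheme used with cudaMallocPitch: base + row * pitch, then
// the column index scaled by the element size.
inline std::size_t pitched_offset(std::size_t row, std::size_t col,
                                  std::size_t elem_size, std::size_t pitch) {
    return row * pitch + col * elem_size;
}

// Total bytes to carve out of the pre-allocated device segment for
// `height` rows of `pitch` bytes each.
inline std::size_t pitched_size(std::size_t height, std::size_t pitch) {
    return height * pitch;
}
```

With an assumed 256-byte alignment, a row of 250 floats (1000 bytes) would be padded to a 1024-byte pitch, so a 10-row array would occupy 10240 bytes of the segment instead of 10000.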
