Consider adding row-padding allocation to device_allocator<cuda_device>
Copied from slack conversation regarding spec pull request #18:
bonachea [12:13 AM]
On a related note, I'm worried there may be some serious performance drawbacks to not providing the equivalents of cudaMallocPitch and cudaMalloc3D for our allocator.
Specifically, if our allocator does not offer to perform the row-padding of individual multi-d array objects it creates, then the user may be stuck re-implementing that row padding logic himself on the objects we give him to replicate the performance properties of his computational kernel that is carefully tuned for optimal array alignment.
Max Grossman [7:56 PM]
@bonachea I would say that's definitely true, but I'm not sure that's something that needs to be included in a minimal v1 release. Though I'm not sure how you would work that into the existing spec draft, without having to re-implement cudaMallocPitch/cudaMalloc3D-like functionality in UPC++ itself (since we'll just be working with a chunk of cudaMalloc-ed memory).
Comments (4)
-
reporter -
reporter There is now a prototype in upcxx-extras PR #11
-
reporter The prototype has been merged to upcxx-extras at 1122022, and a use case is presented in extras PR #13
-
reporter - changed status to resolved
This was discussed in the 2021-04-17 Pagoda meeting.
We resolved that the current extension support in upcxx-extras for extras::padded_cuda_allocator is sufficient to meet all current use cases. If this extension gains popularity in the future, we might consider incorporating it into the main specification, but for now its home in extras seems sufficient.
This issue was triaged at the 2019-07-24 Pagoda issue meeting and assigned a new owner.
We'd like @Max Grossman to investigate providing this functionality.