Question: How to compute subarrays on each rank and merge them

What I want to do

Let each rank compute some subset of an array A and, upon completion, merge the subsets so that every rank has a local copy of the full A.

From what I understand, there are a couple of ways to do this.

Allocate a local vector of size A.size(). Each rank computes its subset. Then each rank broadcasts its slice with broadcast(A.data() + rank_start, rank_end - rank_start, rank_me()).

Set up a distributed object (dist_object) of arrays, where each rank's instance holds only its own subset. Compute the subset locally. Then, on each rank, fetch every other rank's array and merge them manually.

Neither of these seems terribly efficient, and surely there must be a more standard or elegant way to do this.

Comments (4)

Dan Bonachea

removed milestone
changed component to Collective Operations

I don't think I understand your use case (especially the "merge" aspect), but it sounds like you are essentially asking for a gather-to-all collective communication.

Adding more collective operations to UPC++ is on our roadmap, but they are not there yet. Until then, an overlapped series of broadcasts is probably your best option to simulate a dense gather-to-all.

However, if this is a large data structure, gather-to-all inherently has a non-scalable output size: every rank ends up storing the entire array, so total memory grows in proportion to the number of ranks. You should probably consider whether all ranks really need a local copy of the entire result, or whether you can use an algorithm with more scalable memory behavior that only sends data to the ranks that actually need it.

2019-06-09T23:28:21+00:00

Dan Bonachea

@Alexander Ding: What is the status of this? Did I answer your question? Should this issue be closed?

2019-07-21T22:23:03+00:00

Alexander Ding reporter

@Dan Bonachea Yes, thank you! Close the issue please. Sorry! I forgot to reply to this!

2019-07-21T22:24:06+00:00

Alexander Ding reporter

changed status to resolved

2019-07-21T22:24:21+00:00
