Discontiguous job layouts now require `configure --enable-discontig-ranks`
Pull request 373 introduced use of `GEX_FLAG_PEER_NEVER_NBRHD` to remove a dynamic branch and dynamically dead inlined code from the performance-critical RMA paths. That optimization is based on the observation that the UPC++ runtime performs its own shared-memory RMA without calling GASNet-EX.
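The optimization can be pictured with a small model. UPC++ serves peers in `local_team` itself with a shared-memory copy. Every call that does reach the network layer can therefore carry the flag, and the callee can skip its own neighbourhood test. This is a sketch under simplifying assumptions. `rput_model`, `net_put_model`, `Stats` and `kFlagPeerNeverNbrhd` are hypothetical stand-ins, not the real UPC++ or GASNet-EX interfaces, and `local_team` is modelled as a half-open rank range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GEX_FLAG_PEER_NEVER_NBRHD (not the real value).
constexpr std::uint32_t kFlagPeerNeverNbrhd = 1u << 0;

// Counters so the effect of the flag can be observed.
struct Stats {
  int shm_copies = 0;        // puts done by UPC++ itself via shared memory
  int net_calls = 0;         // puts handed to the (modelled) network layer
  int net_nbrhd_checks = 0;  // run-time nbrhd tests done by that layer
};

// Model of the network layer: without the flag it must test at run time
// whether the peer shares memory; with the flag it trusts the caller.
void net_put_model(Stats& s, int /*peer*/, std::uint32_t flags) {
  ++s.net_calls;
  if (!(flags & kFlagPeerNeverNbrhd)) {
    ++s.net_nbrhd_checks;  // the dynamic branch the flag removes
  }
}

// Model of the UPC++ side; local_team is the rank range [local_lo, local_hi).
void rput_model(Stats& s, int peer, int local_lo, int local_hi) {
  if (peer >= local_lo && peer < local_hi) {
    ++s.shm_copies;  // copy done without calling the network layer
  } else {
    net_put_model(s, peer, kFlagPeerNeverNbrhd);
  }
}
```

In this model `net_nbrhd_checks` stays at zero because every network-bound call carries the flag. That is only sound if `local_team` covers the caller's entire nbrhd.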
Unfortunately, the logic to construct `local_team` allows only contiguous ranges of ranks, and this behavior is not likely to ever change (see issue 438). In the presence of discontiguous ranks on the same compute node, there is a very real possibility that `local_team` is only a subset of the GASNet-EX "nbrhd". In such cases, the use of `GEX_FLAG_PEER_NEVER_NBRHD` for RMA within the nbrhd is erroneous and rightly asserts in a debug build of GASNet-EX.
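Here is a concrete case. Take a hypothetical six-rank job on two nodes where node A hosts ranks {0, 1, 4, 5} and node B hosts {2, 3}. The sketch makes two assumptions, for illustration only. First, `local_team` is the longest contiguous run of same-node ranks containing the caller; the real UPC++ construction may differ in detail, but any contiguous range must miss part of node A. Second, the nbrhd is approximated as "ranks on the same node". The helper names are hypothetical. The sketch lists the peers for which sending the flag would break its promise.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// node_of[r] is the compute node hosting rank r.
using Layout = std::vector<int>;

// Assumed local_team rule: the maximal contiguous run of ranks on the same
// node as `self`, returned as the half-open range [lo, hi).
std::pair<int, int> contiguous_local_team(const Layout& node_of, int self) {
  int lo = self, hi = self + 1;
  while (lo > 0 && node_of[lo - 1] == node_of[self]) --lo;
  while (hi < static_cast<int>(node_of.size()) && node_of[hi] == node_of[self]) ++hi;
  return {lo, hi};
}

// Peers that share a node with `self` (its nbrhd) but fall outside
// local_team. RMA to these reaches GASNet-EX carrying the "never nbrhd"
// flag even though they ARE in the nbrhd.
std::vector<int> flag_violations(const Layout& node_of, int self) {
  auto [lo, hi] = contiguous_local_team(node_of, self);
  std::vector<int> bad;
  for (int r = 0; r < static_cast<int>(node_of.size()); ++r) {
    bool in_nbrhd = node_of[r] == node_of[self];
    bool in_local_team = r >= lo && r < hi;
    if (in_nbrhd && !in_local_team) bad.push_back(r);
  }
  return bad;
}
```

With the layout `{0, 0, 1, 1, 0, 0}`, rank 0 gets `local_team` = {0, 1}, and the sketch reports peers 4 and 5 as violations. A contiguous layout such as `{0, 0, 0, 1, 1, 1}` produces none.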
I believe elimination of these assertion failures is a blocker for the upcoming release.
I will propose three distinct options in the comments.
Comments (5)
assigned issue to
I'm pursuing option 3
assigned issue to
Proposed resolution in PR 378
changed status to resolved

issue #502: Add configure --enable-discontig-ranks

By default we now prohibit discontiguous rank layouts with a hard error at startup, unless the library was configured with --enable-discontig-ranks.

Fixes issue #502.

→ <<cset 8f56d7537945>>
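Below is a sketch of the kind of startup check the commit describes. It assumes the runtime knows which node hosts each rank. `UPCXX_DISCONTIG_RANKS_ENABLED` is a hypothetical macro standing in for whatever `--enable-discontig-ranks` actually defines. `ranks_are_contiguous` and `check_layout` are illustrative, not the shipped code. To keep the sketch testable, the hard error is modelled as an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical macro: 1 when configured with --enable-discontig-ranks.
// The real name and mechanism used by UPC++ may differ.
#ifndef UPCXX_DISCONTIG_RANKS_ENABLED
#define UPCXX_DISCONTIG_RANKS_ENABLED 0
#endif

constexpr bool kAllowDiscontig = UPCXX_DISCONTIG_RANKS_ENABLED != 0;

// node_of[r] is the compute node hosting rank r. Returns true when each
// node's ranks form a single contiguous block: once the rank sequence
// leaves a node, it never returns to it.
bool ranks_are_contiguous(const std::vector<int>& node_of) {
  std::set<int> finished;  // nodes whose block of ranks has already ended
  for (std::size_t r = 1; r < node_of.size(); ++r) {
    if (node_of[r] != node_of[r - 1]) {
      finished.insert(node_of[r - 1]);
      if (finished.count(node_of[r]) != 0) return false;  // node revisited
    }
  }
  return true;
}

// Startup check. The hard error is modelled as an exception so the sketch
// can be tested; a real runtime might print a message and abort instead.
void check_layout(const std::vector<int>& node_of) {
  if (!kAllowDiscontig && !ranks_are_contiguous(node_of)) {
    throw std::runtime_error(
        "discontiguous rank layout detected; rebuild with "
        "configure --enable-discontig-ranks to allow it");
  }
}
```

The check runs once at startup, which keeps any cost off the RMA fast path.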
Options that I am aware of:

`GEX_FLAG_PEER_NEVER_NBRHD` would be disabled statically at library compile time. If such layouts are prohibited, then the flag would be used, and "option 2" behavior would prevail in the presence of a prohibited layout: an explanatory error at startup.

My current preference is for option 3, with discontiguous layouts prohibited by default.
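Option 3 makes the flag choice at library compile time. This sketch assumes a hypothetical configure macro, `UPCXX_DISCONTIG_RANKS_ENABLED`, and a stand-in constant for `GEX_FLAG_PEER_NEVER_NBRHD`. `rma_flags` and `kRmaFlags` are illustrative names. The point is that the flag value becomes a compile-time constant, so the RMA hot path carries no run-time test of the layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: 1 when configured with --enable-discontig-ranks.
#ifndef UPCXX_DISCONTIG_RANKS_ENABLED
#define UPCXX_DISCONTIG_RANKS_ENABLED 0
#endif

// Stand-in for GEX_FLAG_PEER_NEVER_NBRHD; the real value comes from gasnetex.h.
constexpr std::uint32_t kFlagPeerNeverNbrhd = 1u << 0;

// Flags for every network-bound RMA, fixed at library compile time.
// Discontiguous layouts allowed: local_team may be a strict subset of the
//   nbrhd, so the flag's promise cannot be made and the flag is dropped.
// Discontiguous layouts prohibited: a startup check rejects such layouts,
//   so the promise holds and the flag is used.
template <bool AllowDiscontig>
constexpr std::uint32_t rma_flags() {
  return AllowDiscontig ? 0u : kFlagPeerNeverNbrhd;
}

constexpr std::uint32_t kRmaFlags =
    rma_flags<UPCXX_DISCONTIG_RANKS_ENABLED != 0>();
```

With the default (prohibited) configuration, `kRmaFlags` equals the flag, matching the preference for option 3 with discontiguous layouts prohibited by default.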