local_team requires consecutive ranks

Issue #438 wontfix
Paul Hargrove created an issue

Our current implementation of upcxx::local_team() only groups together processes with consecutively numbered ranks.

In the extreme case of cyclic (aka round-robin) assignment of ranks to hosts, one may have dozens or hundreds of processes on the same host, each a member of a distinct "singleton" local_team. While GASNet-EX will still use shared-memory paths for communication among processes on the same host, there are optimizations within the UPC++ runtime which will not be applied, and the application is denied use of global_ptr::local() where it would otherwise be available.

The GASNet-level job-spawning mechanisms that live below upcxx-run request consecutive assignment of ranks on each host by default, so the default outcome is a single local_team spanning all processes on the same host. Consequently, fixing this implementation property is not a high priority.

We have briefly discussed the addition of a warning at job start if rank assignment prevents creation of a single local_team per host. The outcome was a recognition that doing so "well" will probably require adding (at least) a reduction collective in the startup code (and doing it poorly would involve a non-scalable scan of data linear in the job size).

Comments (3)

  1. Dan Bonachea

    The original description is not quite right.

    First, UPC++ only considers the GASNet neighborhood (the domain for the shared-memory-bypass transport) when deciding on boundaries for local_team, which will never cross a GASNet neighborhood boundary. The GASNet neighborhood defaults to including all processes co-located on a given host (making these two boundaries identical). However, GASNet provides non-default configure and envvar knobs that can result in the neighborhood being a subset of a host. Such cases always result in at least one local_team per neighborhood on such a host, and there is never shared-memory bypass (at the UPC++ or GASNet level) between ranks sharing a host but appearing in different GASNet neighborhoods.

    The other wrinkle involves the details of UPC++'s fallback behavior in the presence of discontiguous rank assignment across nodes. The actual behavior (up to and including version 2020.11.0) is that any GASNet neighborhood containing a discontiguous "run" of GASNet jobranks results in all members of that neighborhood reverting to a degenerate singleton local_team(). This means that even block-cyclic process layouts across nodes can result in this (correct but degenerate) singleton local_team() behavior for all processes landing in such neighborhoods.

  2. Dan Bonachea

    Users trying to debug a process-placement or local_team layout issue are strongly encouraged to spawn using upcxx-run -vv (or UPCXX_VERBOSE=1, which activates a relevant subset of this output) to get console output reporting the process layout and local_team boundaries.

    As of de53f0b, upcxx-run -vv will now report when degenerate singleton local_team()s have been activated due to discontiguous rank IDs in the process neighborhood. Sample upcxx-run -vv output for a discontiguous spawn:

    $ upcxx-run -n 8 -vv ./a.out
    ...
    //////////////////////////////////////////////////
    upcxx::init():
    > CPUs Oversubscribed: no "upcxx::progress() never yields to OS"
    > Shared heap statistics:
      max size: 0x8000000 (128 MB)
      min size: 0x8000000 (128 MB)
      P0 base:  0x7ff317b79000
    > Local team statistics:
      local teams = 5
      min rank_n = 1
      max rank_n = 2
      min discontig_rank = 2
    > WARNING: One or more processes (including rank 2) are co-located in a GASNet neighborhood with discontiguous rank IDs. As a result, these ranks will use a singleton local_team().
    This generally arises when the job spawner is directed to assign processes to nodes in a manner other than pure-blocked layout.
    For details, see issue #438
    //////////////////////////////////////////////////
    UPCXX: Process 0/8 (local_team: 0/2) on pcp-d-6 (16 processors)
    UPCXX: Process 1/8 (local_team: 1/2) on pcp-d-6 (16 processors)
    UPCXX: Process 4/8 (local_team: 1/2) on pcp-d-5 (16 processors)
    UPCXX: Process 3/8 (local_team: 0/2) on pcp-d-5 (16 processors)
    UPCXX: Process 2/8 (local_team: 0/1) on pcp-d-16 (16 processors)
    UPCXX: Process 5/8 (local_team: 0/1) on pcp-d-16 (16 processors)
    UPCXX: Process 6/8 (local_team: 0/2) on pcp-d-15 (16 processors)
    UPCXX: Process 7/8 (local_team: 1/2) on pcp-d-15 (16 processors)
    ...
    

    This warning will ONLY print when using upcxx-run -vv, which explicitly requests that job spawn information be reported to the console.

  3. Dan Bonachea

    As of GASNet 90817e7 (currently in the stable branch, to appear in the spring 2021 release), udp-conduit defaults to host-aware process rank assignment, which should remove the last source of "random" rank orderings that could generate discontiguous rank assignments and degenerate singleton local_team() in UPC++.

    System spawners such as SLURM srun, Cray aprun and jsrun can still be used to force discontiguous rank assignments and generate this behavior (e.g., assigning ranks cyclically across compute nodes), but this should not happen by default (rank assignments usually default to pure-blocked by compute node). IMO a user who explicitly requests such a layout essentially gets what they deserve with respect to local_team membership, and we should not add overhead to operation under normal/expected layouts in order to diagnose or slightly improve behavior for such corner-case layouts.
