Add `upcxx::local_team_position()`

Issue #173 resolved
Max Grossman created an issue

Global Arrays offers the GA_Cluster_nnodes API for getting the number of shared-memory nodes in a job:

https://hpc.pnl.gov/globalarrays/api/c_op_api.html#CLUSTER_NNODES

Having a similar capability in UPC++ would be helpful in general and for porting GA programs.

Official response

  • Dan Bonachea

    This simple enhancement request has been sitting around for over a year, and I'd really like to see it addressed soon. The proposed query provides useful information the runtime has readily available and that is difficult for applications to efficiently construct given current queries.

    The only "hard part" is deciding upon an interface.

    Strawman Proposal

    std::pair<intrank_t, intrank_t> upcxx::local_team_position()
    

    Semantics

    Queries information about the disjoint local teams comprising world().

    Returns a std::pair value, such that given returned pair info:

    • info.second provides the number of disjoint local teams in the set comprising world(). During a given execution, this value is equal for all callers.
    • info.first provides an integral index in [0,info.second) that identifies the local team of the calling process within that set. During a given execution, the value returned to two processes is equal if and only if they share a local team.

    The values returned to any given calling process remain stable across subsequent calls.

    Advice to Users: This function returns information about the number and identity of local teams, which delineate the boundaries of shared heap locality within the job (and may correspond to physical node boundaries). Information about a caller's position within its local team is available via local_team().rank_me() and local_team().rank_n().

    Progress Level: none
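
The semantics above can be modeled without a UPC++ installation. The following is a minimal sketch, not the runtime's implementation: `model_local_team_positions` and its `neighborhood_of` input (an arbitrary label per rank naming its local team) are hypothetical names introduced for illustration, and `intrank_t` is a plain `int` stand-in for `upcxx::intrank_t`. Indices are assigned in order of first appearance, which is only one valid choice; the proposal requires just that they lie in [0,info.second) and match exactly when two processes share a local team.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using intrank_t = int;  // stand-in for upcxx::intrank_t (assumption)

// Hypothetical model: neighborhood_of[r] is an arbitrary label naming the
// local team of world rank r. Returns, for every rank, the pair the proposed
// query would report: {index of my local team, number of local teams}.
std::vector<std::pair<intrank_t, intrank_t>>
model_local_team_positions(const std::vector<int>& neighborhood_of) {
  std::unordered_map<int, intrank_t> index_of;  // label -> dense index
  std::vector<intrank_t> team_index(neighborhood_of.size());
  for (std::size_t r = 0; r < neighborhood_of.size(); ++r) {
    // emplace leaves an existing entry untouched, so each label keeps the
    // index it received at its first appearance.
    const intrank_t next = static_cast<intrank_t>(index_of.size());
    team_index[r] = index_of.emplace(neighborhood_of[r], next).first->second;
  }
  const intrank_t count = static_cast<intrank_t>(index_of.size());

  std::vector<std::pair<intrank_t, intrank_t>> result;
  result.reserve(neighborhood_of.size());
  for (intrank_t idx : team_index) {
    assert(idx >= 0 && idx < count);  // info.first lies in [0, info.second)
    result.emplace_back(idx, count);
  }
  return result;
}
```

For the labels {10, 10, 42, 42, 42, 7}, this model reports three local teams: ranks 0 and 1 share index 0, ranks 2 through 4 share index 1, and rank 5 is alone at index 2.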

    Discussion

    The proposed name is based on the existing utility function upcxx::local_team_contains(). The proposed description is written in terms of the existing local team semantics and, as with existing sections, deliberately avoids guaranteeing local team equivalence to physical node boundaries (although that is the common/default case).

    Implementation is a trivial wrapper around a subset of the information supplied by gex_System_QueryMyPosition().

    Please provide feedback.
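
For the GA porting use case that motivated this issue, a thin shim over the returned pair may be all that is needed. The sketch below is illustrative only: `cluster_nnodes` and `cluster_nodeid` are hypothetical helper names, `intrank_t` is an `int` stand-in for `upcxx::intrank_t`, and the pair is passed in as an argument so the snippet compiles without UPC++. In a real program the argument would presumably come from the proposed `upcxx::local_team_position()`.

```cpp
#include <cassert>
#include <utility>

using intrank_t = int;  // stand-in for upcxx::intrank_t (assumption)
using team_position = std::pair<intrank_t, intrank_t>;  // {index, count}

// Hypothetical analogue of a "number of nodes" query: the count of
// disjoint local teams comprising world().
inline int cluster_nnodes(const team_position& info) {
  assert(info.second > 0);  // a running job has at least one local team
  return info.second;
}

// Hypothetical analogue of a "my node id" query: the index of the
// caller's local team within that set.
inline int cluster_nodeid(const team_position& info) {
  assert(info.first >= 0 && info.first < info.second);
  return info.first;
}
```

Because the pair describes local teams rather than hardware, a shim like this reports shared-heap locality domains, which may not match physical node boundaries.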

Comments (11)

  1. Paul Hargrove

    On the 2021.09.13 call we identified "how many distinct local_team's in the current job" as the desired semantic for this call. In general this could be >= the number of "shared memory nodes", but is a property that the UPC++ runtime maintains already and is not easily inferred from other sources such as the hostname.

  3. Amir Kamil

    The proposed interface sounds fine, but I wonder if this is something we can/should provide as a general query on all teams – what is the team’s position with respect to the other teams created by the associated split() or create() call. It’s true that the creator of the team can compute this information (though potentially requiring another collective to do so), but I can imagine a scenario where a team is passed to a library, which wouldn’t necessarily be able to compute that information.

  4. Dan Bonachea

    > something we can/should provide as a general query on all teams – what is the team’s position with respect to the other teams created by the associated split() or create() call.

    I also initially wondered whether this was worth generalizing somehow. However, the information you suggest is NOT something that we currently compute or track (for split() or create()). Computing such information after a team::create() in particular would require entirely new collective communication across the parent team, where the primary motivation for using team::create() is exactly to avoid the overhead of such communication. For this reason I'm opposed to this idea.

    In contrast, the information exposed by my proposal is readily and efficiently available from GASNet's node-mapping metadata. The name and description here are slightly misleading: my proposal is named/described in terms of the "local team", because that happens to be the UPC++-level container whose boundaries correspond to the GASNet "neighborhood" abstraction. However, the similarity to upcxx::team ends there. This is really a node topology query and has nothing to do with the local_team() object or any dynamic team.
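
To make the cost argument above concrete, here is a rough sketch of what a generalized position query would have to do for a split()-style partition, where each process supplies an integer color. It is a model under stated assumptions, not UPC++ code: `subteam_position` is a hypothetical name, and `all_colors` stands in for the result of gathering every parent-team member's color, which is the collective exchange the comment objects to.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper. all_colors models the outcome of an allgather over
// the parent team (one color per member); my_color is the caller's color.
// Returns {index of my sub-team among the distinct colors, number of sub-teams}.
std::pair<int, int> subteam_position(std::vector<int> all_colors, int my_color) {
  // Reduce the gathered colors to a sorted list of distinct values.
  std::sort(all_colors.begin(), all_colors.end());
  all_colors.erase(std::unique(all_colors.begin(), all_colors.end()),
                   all_colors.end());

  // The caller's color must be among the gathered ones.
  auto it = std::lower_bound(all_colors.begin(), all_colors.end(), my_color);
  assert(it != all_colors.end() && *it == my_color);

  return {static_cast<int>(it - all_colors.begin()),
          static_cast<int>(all_colors.size())};
}
```

The local work is cheap, but the input requires every member's color, so the query cannot be answered without communication across the parent team. The local-team query avoids this because the runtime already holds the node-mapping metadata.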
