team_id's are not "universal" as documented

Issue #371 resolved
Dan Bonachea created an issue

The spec defines team_id as "A universal name representing a team" which is EqualityComparable (and TriviallyCopyable). This in turn implies that team_id comparison should always be sufficient to determine whether any two teams anywhere in the job are in fact the same team.

Unfortunately (as discovered by CS267 students @Xinyi Qian and @Raehyun Kim ) the 2020.3.0 implementation does not appear to provide the documented universality.

This attached example demonstrates that team_id is not guaranteed to be a globally unique name for a particular team in the current implementation. Example output for 4 processes with 2 local_team()s: (on dirac/gcc10/udp/debug)

0: id_t1={3541522223009875147,6372108535818588829}      id_t2={426333498038458042,542583166138072248}   id_t3={4985218028745241708,3356046829632903843}    id_t4={16255499421184154785,6415631985977346961}        id_world={1229782938247303441,1229782938247303441}      id_local={2459565876494606882,2459565876494606882}
3: id_t1={3541522223009875147,6372108535818588829}      id_t2={426333498038458042,542583166138072248}   id_t3={4985218028745241708,3356046829632903843}    id_t4={16255499421184154785,6415631985977346961}        id_world={1229782938247303441,1229782938247303441}      id_local={2459565876494606882,2459565876494606882}
2: id_t1={3541522223009875147,6372108535818588829}      id_t2={426333498038458042,542583166138072248}   id_t3={4985218028745241708,3356046829632903843}    id_t4={16255499421184154785,6415631985977346961}        id_world={1229782938247303441,1229782938247303441}      id_local={2459565876494606882,2459565876494606882}
1: id_t1={3541522223009875147,6372108535818588829}      id_t2={426333498038458042,542583166138072248}   id_t3={4985218028745241708,3356046829632903843}    id_t4={16255499421184154785,6415631985977346961}        id_world={1229782938247303441,1229782938247303441}      id_local={2459565876494606882,2459565876494606882}
ERROR: t1.id() values are not unique across ranks!
ERROR: t2.id() values are not unique across ranks!
ERROR: t1.id() values are not unique across ranks!
ERROR: t2.id() values are not unique across ranks!
ERROR: local_team().id() for ranks 0 and 2 should be different!
ERROR: t4.id() for ranks 0 and 2 should be different!
ERROR: t1.id() values are not unique across ranks!
ERROR: t2.id() values are not unique across ranks!
ERROR: local_team().id() for ranks 0 and 3 should be different!
ERROR: t4.id() for ranks 0 and 3 should be different!
done.

In the current implementation, team_id values only uniquely name a team with respect to a given process that is ALSO a member of the team being identified. As one example, local_team().id() is always equal across all processes, even when there are multiple disjoint local_team's. If team_id was in fact a "universal name", one would expect the team_id's of disjoint local_team()s to compare as unequal.

The weaker property currently provided is sufficient to implement team_id::here() and team_id::when_here() (which both have a precondition requiring the calling process to be a member of the referenced team), but is too weak to guarantee correct EqualityComparison via team_id::operator==(team_id) when the calling process is not a member of one of teams being identified. Unfortunately, the latter is a useful idiom when team_id's are stored in the global address space, and one users expect to work as documented (as evidenced by the experience of CS267 students).

We should investigate repairing this defect to provide the documented property.

Comments (5)

  1. Dan Bonachea reporter

    Fix issue 371: team_id's are not "universal" as documented

    team.split() and local_team() construction now both stir the "color" from the split operation into the digest, ensuring that corresponding but disjoint teams are assigned distinct digests.

    → <<cset 9a87112d7857>>

  2. Dan Bonachea reporter

    Merging release upcxx-2020.3.2

    • upcxx-2020.3: ChangeLog: update release date 2020.3.2 package version bumps Make future and serialization tests "full" upc++ ChangeLog cleanups Update ChangeLog
    • Added test/regression/issue380.cpp - Fixed issue 380 where rput with source+operation cx failed to compile. Fix issue 368. Needed to add a default constructor for bcast_payload_header in runtime.hpp since it has a union containing a non-trivially constructible member (std::atomic). Add a Serialization example with an abstract base class ChangeLog.md bugfix entry Bugfix issue 369. <completion>::as_future() was typically leaking a small 56 byte heap struct per completion notification due to a refcount bungling in detail::persona_tls::fulfill_during_user_of_active(). configure: search PATH for python3 and python2 configure: implement check_tool_path() Update ChangeLog Add issue371 test Fix issue 371: team_id's are not "universal" as documented Update ChangeLog Add issue343 test fix issue 343: ensure that default-construced team_id provide a unique invalid team_id Update ChangeLog Improve bad_shared_alloc to output shared heap status Add a test to demonstrate shared heap allocation failures Tweak shared heap allocation exceptions test/memberof.cpp: silence a harmless warning with -std=c++2a docs: revise Intel/libstdc++ recommendation bench/misc_perf: avoid C++20 deprecation warnings INSTALL.md: Clarify recommendations for using parallel make ChangeLog.md: list fix #356 Fully qualified std::foo to ::std::foo in UPCXX_SERIALIZED_FOO macros. For members injected into user classes by UPCXX_SERIALIZED_* macros: This fixes issue 356 where non-public constructors were inaccessible to UCPXX_SERIALIZED_XXX macros. The fix entails smuggling a public static function into the class that then calls the constructor. Add issue356 regression test Update ChangeLog ChangeLog: document CROSS to UPCXX_CROSS rename configure: automatically cross-compile on Cray XC Start a ChangeLog for hotfix release

    → <<cset e35f012d9670>>

  3. Log in to comment