team_id's are not "universal" as documented
The spec defines team_id
as "A universal name representing a team" which is EqualityComparable (and TriviallyCopyable). This in turn implies that team_id
comparison should always be sufficient to determine whether any two teams anywhere in the job are in fact the same team.
Unfortunately (as discovered by CS267 students @Xinyi Qian and @Raehyun Kim ) the 2020.3.0 implementation does not appear to provide the documented universality.
This attached example demonstrates that team_id
is not guaranteed to be a globally unique name for a particular team in the current implementation.
Example output for 4 processes with 2 local_team()
s: (on dirac/gcc10/udp/debug)
0: id_t1={3541522223009875147,6372108535818588829} id_t2={426333498038458042,542583166138072248} id_t3={4985218028745241708,3356046829632903843} id_t4={16255499421184154785,6415631985977346961} id_world={1229782938247303441,1229782938247303441} id_local={2459565876494606882,2459565876494606882}
3: id_t1={3541522223009875147,6372108535818588829} id_t2={426333498038458042,542583166138072248} id_t3={4985218028745241708,3356046829632903843} id_t4={16255499421184154785,6415631985977346961} id_world={1229782938247303441,1229782938247303441} id_local={2459565876494606882,2459565876494606882}
2: id_t1={3541522223009875147,6372108535818588829} id_t2={426333498038458042,542583166138072248} id_t3={4985218028745241708,3356046829632903843} id_t4={16255499421184154785,6415631985977346961} id_world={1229782938247303441,1229782938247303441} id_local={2459565876494606882,2459565876494606882}
1: id_t1={3541522223009875147,6372108535818588829} id_t2={426333498038458042,542583166138072248} id_t3={4985218028745241708,3356046829632903843} id_t4={16255499421184154785,6415631985977346961} id_world={1229782938247303441,1229782938247303441} id_local={2459565876494606882,2459565876494606882}
ERROR: t1.id() values are not unique across ranks!
ERROR: t2.id() values are not unique across ranks!
ERROR: t1.id() values are not unique across ranks!
ERROR: t2.id() values are not unique across ranks!
ERROR: local_team().id() for ranks 0 and 2 should be different!
ERROR: t4.id() for ranks 0 and 2 should be different!
ERROR: t1.id() values are not unique across ranks!
ERROR: t2.id() values are not unique across ranks!
ERROR: local_team().id() for ranks 0 and 3 should be different!
ERROR: t4.id() for ranks 0 and 3 should be different!
done.
In the current implementation, team_id
values only uniquely name a team with respect to a given process that is ALSO a member of the team being identified. As one example, local_team().id()
is always equal across all processes, even when there are multiple disjoint local_team
's. If team_id
was in fact a "universal name", one would expect the team_id
's of disjoint local_team()
s to compare as unequal.
The weaker property currently provided is sufficient to implement team_id::here()
and team_id::when_here()
(which both have a precondition requiring the calling process to be a member of the referenced team), but is too weak to guarantee correct EqualityComparison via team_id::operator==(team_id)
when the calling process is not a member of one of teams being identified. Unfortunately, the latter is a useful idiom when team_id
's are stored in the global address space, and one users expect to work as documented (as evidenced by the experience of CS267 students).
We should investigate repairing this defect to provide the documented property.
Comments (5)
-
reporter -
reporter - changed status to resolved
Fix issue 371: team_id's are not "universal" as documented
team.split() and local_team() construction now both stir the "color" from the split operation into the digest, ensuring that corresponding but disjoint teams are assigned distinct digests.
→ <<cset 9a87112d7857>>
-
reporter Merge pull request #216 into develop
- team_id: Update ChangeLog Add issue371 test Fix issue 371: team_id's are not "universal" as documented Update ChangeLog Add issue343 test fix issue 343: ensure that default-construced team_id provide a unique invalid team_id
→ <<cset a5caf15abce5>>
-
reporter Fix issue 371: team_id's are not "universal" as documented
team.split() and local_team() construction now both stir the "color" from the split operation into the digest, ensuring that corresponding but disjoint teams are assigned distinct digests.
(cherry picked from commit 9a87112d78575fb610b00ddc22db9d86ddb50d07)
→ <<cset aa08bb26cd14>>
-
reporter Merging release upcxx-2020.3.2
- upcxx-2020.3: ChangeLog: update release date 2020.3.2 package version bumps Make future and serialization tests "full" upc++ ChangeLog cleanups Update ChangeLog
- Added test/regression/issue380.cpp - Fixed issue 380 where rput with source+operation cx failed to compile.
Fix issue 368. Needed to add a default constructor for bcast_payload_header in runtime.hpp since it has a union containing a non-trivially constructible member (std::atomic).
Add a Serialization example with an abstract base class
ChangeLog.md bugfix entry
Bugfix issue 369.
<completion>::as_future()
was typically leaking a small 56 byte heap struct per completion notification due to a refcount bungling indetail::persona_tls::fulfill_during_user_of_active()
. configure: search PATH for python3 and python2 configure: implement check_tool_path() Update ChangeLog Add issue371 test Fix issue 371: team_id's are not "universal" as documented Update ChangeLog Add issue343 test fix issue 343: ensure that default-construced team_id provide a unique invalid team_id Update ChangeLog Improve bad_shared_alloc to output shared heap status Add a test to demonstrate shared heap allocation failures Tweak shared heap allocation exceptions test/memberof.cpp: silence a harmless warning with -std=c++2a docs: revise Intel/libstdc++ recommendation bench/misc_perf: avoid C++20 deprecation warnings INSTALL.md: Clarify recommendations for using parallel make ChangeLog.md: list fix#356Fully qualifiedstd::foo
to::std::foo
in UPCXX_SERIALIZED_FOO macros. For members injected into user classes by UPCXX_SERIALIZED_* macros: This fixes issue 356 where non-public constructors were inaccessible to UCPXX_SERIALIZED_XXX macros. The fix entails smuggling a public static function into the class that then calls the constructor. Add issue356 regression test Update ChangeLog ChangeLog: document CROSS to UPCXX_CROSS rename configure: automatically cross-compile on Cray XC Start a ChangeLog for hotfix release
→ <<cset e35f012d9670>>
- Log in to comment
Proposed resolution in pull request #216