Invalid teams created by split() are not handled according to spec
The current spec for team::split() says:
If color == team::color_none, the return value is an invalid team that cannot be used in any UPC++ operation except ~team.
and team::destroy() says:
Precondition: This instance must not [...] be an invalid team that resulted from a call to split()
However the current implementation does not enforce these rules for invalid teams generated by split()
, and test/collectives.cpp violates them in several ways:
upcxx::team tm4 = tm3c.split(upcxx::team::color_none, 0);
UPCXX_ASSERT_ALWAYS(tm4.rank_n() == 0); // ERRONEOUS by spec
...
tm4.destroy(); // ERRONEOUS by spec
Worse still, the current implementation in verbose mode will confusingly report a "leaked" team when an invalid team generated by split()
is NOT destroy()
'ed before destruction (although the destructor does not generate a fatal error as it does for valid teams).
This same semantic issue arises with the new team::create
that can also create invalid teams for non-participating processes.
IIRC we specified these invalid team semantics because:
- It allows non-participating processes to immediately discard their invalid team object after calling
split()
team::destroy()
is collective overthis
, which is not meaningful for an invalid team
Assuming we agree these semantics remain correct, then I propose to repair the non-conformance in the test+implementation by doing the following:
- Remove the erroneous calls from collectives.cpp
- Repair the implementation's accounting to not consider invalid teams as "leaked"
- Make it an DEBUG-mode fatal error to invoke some or all team methods on invalid teams, as specified. Current status:
rank_n()
,rank_me()
,id()
are never checkeddestroy()
andsplit()
check and prohibit invalid teams generated by the move constructor, but not invalid teams from fromsplit()
(attempts tosplit()
an invalid team assert inside GASNet)from_world()
,operator[]()
have range checks that already effectively prohibit use on invalid teams
Comments (6)
-
-
reporter I’m ambivalent about whether destroy() accepts an invalid team (assuming the spec stays the same).
I don't like non-compliance. If we decide to allow this, then I think we need to change both the spec and implementation to match.
IIUC you're proposing to specify
destroy()
semantics to the effect:If
this
is an invalid team that resulted from a call to split() or create(), thenthis->destroy()
is a non-collective call with internal progress and no other semantics. Otherwise, this function is collective over this team ... <all the current semantics>That's a bit inelegant, but I don't see a strong semantic argument against it. I'm slightly worried about supporting different "classes" of invalid teams (those created by split/create vs move constructor source vs post-destroy team), but provided we're comfortable with the implementation lumping those all together wrt behavior/enforcement then it's not a major concern.
-
I agree with (what I perceive to be) Dan's viewpoint on compliance.
I would prefer to see our implementation comply as well as we can with our own specification, excepting cases where something is still "deferred implementation" or prevented by limitations elsewhere (such as issues in GASNet-EX, or in compilers).Where we don't comply, for whatever reason, I'd like us to document the deviation (as this issue tracker does). Otherwise, the end user may be left second-guessing their own code when they run into the same issue.
I don't have strong feelings regarding the "best" solution with respect to how the spec should define the behavior for the "empty" team resulting from some split or create calls, nor on the question of if/how these should be treated differently from the other two cases Dan listed of an invalid team.
If @Amir Kamil, @Colin MacLean and @Dan Bonachea cannot agree on something, then I'd be inclined to defer to "implementer's choice".
-
Well, I was expressing ambivalence, not proposing a specific change. That said, I do see a semantic motivation for allowing
destroy()
on invalid teams – it allows a user to invoke it without first checking whether the team is valid. I can see this coming up precisely whencolor_none
is passed tosplit()
.I don’t think we should treat different kinds of invalid teams differently.
destroy()
should be a noop on an invalid team, regardless of how it was created.My argument for permitting it in the implementation is that it’s the current behavior; moreover, with the implementation reporting a leaked team when
destroy()
isn’t called, it’s may be that current users who run into this have put the call in to silence that. I see permitting it as mostly harmless while avoiding breaking existing code.But I’m still largely ambivalent about this. I think that
color_none
is a rarely if ever used facility, so most users would not be affected by a semantic or implementation change. -
reporter Proposed resolution now in spec PR 76 and impl PR 374
-
reporter - changed status to resolved
issue 500: Invalid teams created by split() are not handled according to spec
Repair implementation to behave according to spec:
- Invalid teams created by
team::split()
are not valid for anything other than the~team()
destructor. - Ensure these invalid teams are not added to the runtime's registry
of collective objects, which previously led to leak warnings in
verbose execution mode, unless they were passed to
destroy()
. team::destroy()
now treats calls on invalid teams as a silent no-op
Fixes issue
#500→ <<cset fa2e34584214>>
- Log in to comment
I’m ambivalent about whether or not an invalid team should be allowed to be destroyed, but I agree that the implementation should not report it as leaked if it isn’t.
I agree with the action items, except that I’m ambivalent about whether
destroy()
accepts an invalid team (assuming the spec stays the same).