Invalid teams created by split() are not handled according to spec

Issue #500 resolved

Dan Bonachea created an issue 2021-09-11

The current spec for team::split() says:

If color == team::color_none, the return value is an invalid team that cannot be used in any UPC++ operation except ~team.

and team::destroy() says:

Precondition: This instance must not [...] be an invalid team that resulted from a call to split()

However the current implementation does not enforce these rules for invalid teams generated by split(), and test/collectives.cpp violates them in several ways:

upcxx::team tm4 = tm3c.split(upcxx::team::color_none, 0);
UPCXX_ASSERT_ALWAYS(tm4.rank_n() == 0); // ERRONEOUS by spec

...

tm4.destroy(); // ERRONEOUS by spec

Worse still, the current implementation in verbose mode will confusingly report a "leaked" team when an invalid team generated by split() is NOT destroy()'ed before destruction (although the destructor does not generate a fatal error as it does for valid teams).

This same semantic issue arises with the new team::create that can also create invalid teams for non-participating processes. IIRC we specified these invalid team semantics because:

It allows non-participating processes to immediately discard their invalid team object after calling split()
team::destroy() is collective over this, which is not meaningful for an invalid team

Assuming we agree these semantics remain correct, then I propose to repair the non-conformance in the test+implementation by doing the following:

Remove the erroneous calls from collectives.cpp
Repair the implementation's accounting to not consider invalid teams as "leaked"
Make it an DEBUG-mode fatal error to invoke some or all team methods on invalid teams, as specified. Current status:
- rank_n(), rank_me(), id() are never checked
- destroy() and split() check and prohibit invalid teams generated by the move constructor, but not invalid teams from from split() (attempts to split() an invalid team assert inside GASNet)
- from_world(), operator[]() have range checks that already effectively prohibit use on invalid teams

Comments (6)

Amir Kamil
I’m ambivalent about whether or not an invalid team should be allowed to be destroyed, but I agree that the implementation should not report it as leaked if it isn’t.

I agree with the action items, except that I’m ambivalent about whether destroy() accepts an invalid team (assuming the spec stays the same).
- 2021-09-12T00:48:49+00:00
Dan Bonachea reporter
I’m ambivalent about whether destroy() accepts an invalid team (assuming the spec stays the same).

I don't like non-compliance. If we decide to allow this, then I think we need to change both the spec and implementation to match.

IIUC you're proposing to specify destroy() semantics to the effect:

If this is an invalid team that resulted from a call to split() or create(), then this->destroy() is a non-collective call with internal progress and no other semantics. Otherwise, this function is collective over this team ... <all the current semantics>

That's a bit inelegant, but I don't see a strong semantic argument against it. I'm slightly worried about supporting different "classes" of invalid teams (those created by split/create vs move constructor source vs post-destroy team), but provided we're comfortable with the implementation lumping those all together wrt behavior/enforcement then it's not a major concern.
- 2021-09-12T02:43:26+00:00
Paul Hargrove
I agree with (what I perceive to be) Dan's viewpoint on compliance.
I would prefer to see our implementation comply as well as we can with our own specification, excepting cases where something is still "deferred implementation" or prevented by limitations elsewhere (such as issues in GASNet-EX, or in compilers).

Where we don't comply, for whatever reason, I'd like us to document the deviation (as this issue tracker does). Otherwise, the end user may be left second-guessing their own code when they run into the same issue.

I don't have strong feelings regarding the "best" solution with respect to how the spec should define the behavior for the "empty" team resulting from some split or create calls, nor on the question of if/how these should be treated differently from the other two cases Dan listed of an invalid team.

If @Amir Kamil, @Colin MacLean and @Dan Bonachea cannot agree on something, then I'd be inclined to defer to "implementer's choice".
- 2021-09-12T04:21:41+00:00
Amir Kamil
Well, I was expressing ambivalence, not proposing a specific change. That said, I do see a semantic motivation for allowing destroy() on invalid teams – it allows a user to invoke it without first checking whether the team is valid. I can see this coming up precisely when color_none is passed to split().

I don’t think we should treat different kinds of invalid teams differently. destroy() should be a noop on an invalid team, regardless of how it was created.

My argument for permitting it in the implementation is that it’s the current behavior; moreover, with the implementation reporting a leaked team when destroy() isn’t called, it’s may be that current users who run into this have put the call in to silence that. I see permitting it as mostly harmless while avoiding breaking existing code.

But I’m still largely ambivalent about this. I think that color_none is a rarely if ever used facility, so most users would not be affected by a semantic or implementation change.
- 2021-09-12T08:30:47+00:00
Dan Bonachea reporter
Proposed resolution now in spec PR 76 and impl PR 374
- 2021-09-13T00:53:40+00:00
Dan Bonachea reporter
- changed status to resolved
issue 500: Invalid teams created by split() are not handled according to spec

Repair implementation to behave according to spec:
- Invalid teams created by team::split() are not valid for anything other than the ~team() destructor.
- Ensure these invalid teams are not added to the runtime's registry of collective objects, which previously led to leak warnings in verbose execution mode, unless they were passed to destroy().
- team::destroy() now treats calls on invalid teams as a silent no-op
Fixes issue ~~#500~~

→ <<cset fa2e34584214>>
- 2021-09-15T17:32:18+00:00
Log in to comment

Assignee: Dan Bonachea

Type: bug

Priority: major

Status: resolved

Component: Teams

Milestone: 2021.9.0 release

Version: Development Branch

Votes: 0

Watchers: 1