- attached issue142.cpp
Clarify semantic restrictions on `UPCXX_THREADMODE=seq`
Currently UPCXX_THREADMODE=seq UPCXX_CODEMODE=debug
hits an assertion failure on any communication injection after calling liberate_master_persona()
, even when there are no other threads.
I don't think we've ever formally documented the restrictions on UPCXX_THREADMODE=seq
mode, but we really need to do that - I assumed any app that never used threads was safe to use the entire UPC++ API. Due to this bug seq mode apps cannot use some features of personas (although they are admittedly not strongly motivated).
This needs to either be fixed or documented as a specified restriction for seq mode.
Comments (8)
-
reporter -
reporter - marked as blocker
This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.
Fixing the assertion failure is a blocker.
Subsequently discussing what "seq" means and implementing it is major priority.
-
reporter - changed milestone to 2019.03.31 release
Mass roll-over of unresolved issues to the next milestone.
-
reporter - changed milestone to 2020.3.0 release
This issue was triaged at the 2019-07-24 Pagoda issue meeting and assigned a new milestone.
This issue was deferred pending a better specification of SEQ mode
-
In auditing the code, I have discovered/recollected the following about backend=seq:
-
The runtime's AM handlers blindly assume they own the master persona, and push to its queue in the fast thread-unsafe way. Thus, any upcxx call that calls into gasnet requires the master persona. This is why we ASSERT ownership of master in so many places, which now seems like not enough (rput/rget should assert too). I think this is a good thing.
-
The runtime guard calls to
gasnet_AMPoll()
on the condition:if(this_thread == primordial_thread) gasnet_AMPoll()
. This, when combined with the above, does not allow the primordial thread to call progress while another thread has master: silent racy corruption ensues. I am not in love with this, and would like to see the guard enhanced to also check master ownership. But that's beyond this release. -
Shared heap allocation/deallocation ASSERTS ownership of master. Seems fine to me.
So here's what I believe the documentation should say about restrictions present when backend=seq:
- That only the primordial thread may hold the master persona.
- That the following operations require the master persona (thus the primordial thread too):
- "inter-process communicating operations" (air-quotes to compensate for absence of explicitly enumerated list):
- Note:
progress()
is not included here, which is always safe from any thread any persona. Only a thread owning master drains RPCs. All calls toprogress()
drain inbound lpc's.
- Note:
- shared heap allocate/deallocate.
- "inter-process communicating operations" (air-quotes to compensate for absence of explicitly enumerated list):
I do not advocate munging with the ASSERT's before the release.
-
-
reporter @john bachan : Thanks for the analysis.
I agree we definitely should not be changing assertions for this release. Although IIUC we are basically saying that liberate_master_persona() might eventually be a runtime error in SEQ mode, because there is nothing useful that can be done with that persona aside from re-acquiring it with the same thread (so we might as well immediately signal this as a programming error).
If you have time to draft something today to clarify these SEQ-mode restrictions in the documentation, that could be helpful to end users. The spec currently does not mention SEQ mode at all - this is solely a feature of the implementation, and I think that's a Good Thing. I'm thinking documentation for these restrictions could go as a new section in docs/implementation-defined.md?
-
reporter - changed title to Clarify semantic restrictions on `UPCXX_THREADMODE=seq`
Proposed resolution in pull request #190
The new documentation classifies the original example as erroneous in SEQ mode.
-
- changed status to resolved
Resolved via merge of pull request #190
- Log in to comment
Failure on dirac using nightly build: