- changed milestone to 2020.3.0 release
PG: Add more OpenMP Interop examples
This issue moved from PG tracker (which is now retired) issue/41:
Brian posted some good ideas on how to mix OpenMP with UPC++.
We should think about fleshing out the OpenMP examples/section of the guide to demonstrate the recommended ways to keep the UPC++ master attentive to progress while doing OpenMP fork/joins.
On Tue, Apr 17, 2018 at 1:06 PM, Brian Van Straalen bvstraalen@lbl.gov wrote:
You need a master thread in the parallel region servicing the queue. When I talked with Hal Finkel about OpenMP and user-driven progress, he suggested ideas like:
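    // omp_get_num_threads() would return 1 outside a parallel region
    int thread_n = omp_get_max_threads();
    omp_set_nested(1);
    atomic<bool> spinning{true};
    #pragma omp parallel
    {
      assert(omp_get_num_threads() > 1);
      #pragma omp master
      #pragma omp task
      while (spinning.load()) {
        upcxx::progress();
        #pragma omp taskyield
      }
      // nowait: the loop's implicit barrier would otherwise wait for the
      // spinning task, which cannot finish until spinning is cleared
      #pragma omp for nowait
      for(...) {...}
      spinning.store(false);
    }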
Tom Scogland, another OpenMP developer, had a variant where the for loop is done as tasks, so the progress thread and the loop threads are all run as tasks
    #pragma omp parallel
    {
      assert(omp_get_num_threads() > 1);
      #pragma omp master
      {
        #pragma omp task
        while (spinning.load()) {
          upcxx::progress();
          #pragma omp taskyield
        }
        #pragma omp taskloop
        for(...) {...}
        spinning.store(false);
      }
    }
The for-loop construct has lower fork costs, since the schedule is static. Task loops need some meat in them to amortize the costs, but FFT is a pretty heavy operation. Still, OpenMP fork-join has non-trivial overheads even when you are calling FFT operations (as I have discovered in my thesis work).
Both should work. Hal mentioned that C++ atomics and OpenMP threading are not guaranteed to cooperate, but in practice they do.
I do not understand Hal's use of omp_set_nested here though.
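The requirement above, that one thread in the parallel region keeps servicing progress, can also be sketched without tasks. The following is neither Hal's nor Tom's variant: it dedicates thread 0 to polling and splits the loop by hand among the remaining threads. `poll_progress`, `progress_calls`, and `scale_with_attentive_master` are hypothetical names, and `poll_progress` stands in for `upcxx::progress()` so that the sketch builds without UPC++. It assumes that thread 0 of the region is the thread that must stay attentive.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for upcxx::progress(): it only counts calls, so the
// sketch builds without UPC++.
static std::atomic<long> progress_calls{0};
static void poll_progress() { progress_calls.fetch_add(1); }

// Doubles every element of `data`. In a team of two or more threads, thread 0
// does no loop work and polls until every worker has finished. In a team of
// one (or without OpenMP), the single thread polls between iterations.
void scale_with_attentive_master(std::vector<double>& data) {
  const std::size_t n = data.size();
  std::atomic<int> workers_done{0};
  #pragma omp parallel
  {
#ifdef _OPENMP
    const int tid = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();
#else
    const int tid = 0;
    const int nthreads = 1;
#endif
    assert(nthreads >= 1);
    if (nthreads == 1) {
      for (std::size_t i = 0; i < n; ++i) {
        data[i] *= 2.0;
        poll_progress();
      }
    } else if (tid == 0) {
      // Poll at least once, then keep polling until all workers report done.
      do {
        poll_progress();
      } while (workers_done.load() < nthreads - 1);
    } else {
      // Workers 0..nworkers-1 each take one contiguous chunk of the range.
      const std::size_t nworkers = static_cast<std::size_t>(nthreads - 1);
      const std::size_t w = static_cast<std::size_t>(tid - 1);
      for (std::size_t i = n * w / nworkers; i < n * (w + 1) / nworkers; ++i)
        data[i] *= 2.0;
      workers_done.fetch_add(1);
    }
  }
}
```

The trade-off is that one thread does no loop work, which is the same cost the task-based variants pay whenever a thread is occupied by the spinning task.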
Comments (8)
-
reporter -
reporter - changed milestone to 2020.9.0 release
This was discussed in the 2020-02-12 meeting and deferred to the next release milestone.
It was also noted that the original persona-example code from the old programming guide used OpenMP and could be adjusted to meet the updated requirements. The current guide shows how to do it with C++11 threads, but algorithmically the idea is the same.
-
reporter - changed milestone to 2021.3.0 release
Mass roll-over of open issues to next release milestone
-
reporter - changed milestone to 2021.9.0 release
Mass roll-over of open issues to next release milestone
-
reporter - changed milestone to 2022.3.0 release
Mass roll-over of open issues to next release milestone
-
-
assigned issue to
-
assigned issue to
-
reporter Writing this here for the record.
We have some introductory text in this document:
That section should probably be moved/promoted to the PG as part of this work, as introductory text that explains the deadlock issue. Then new PG examples can show how to avoid the problem in several ways: one approach being the original post in this issue that singles out personas that need to remain attentive (usually including master), another being an explicit thread barrier before OMP blocking synchronization as in https://bitbucket.org/berkeleylab/upcxx/src/master/test/rput_omp.cpp.
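As a rough illustration of the second approach, here is a minimal sketch of a thread barrier that keeps polling while it waits, placed just before the blocking synchronization at the end of the parallel region. It is not taken from rput_omp.cpp. `poll_progress`, `attentive_barrier`, and `sum_with_attentive_barrier` are hypothetical names, and `poll_progress` stands in for `upcxx::progress()` so that the sketch builds without UPC++.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for upcxx::progress(): it only counts calls.
static std::atomic<long> progress_calls{0};
static void poll_progress() { progress_calls.fetch_add(1); }

// One-shot barrier: each thread announces its arrival, then polls progress
// until `expected` threads have arrived. The counter is never reset.
static void attentive_barrier(std::atomic<int>& arrived, int expected) {
  assert(expected >= 1);
  arrived.fetch_add(1);
  while (arrived.load() < expected) poll_progress();
}

// Sums `data` in parallel. Threads that finish early wait in the polling
// barrier rather than idling in the region's implicit (blocking) barrier.
long sum_with_attentive_barrier(const std::vector<long>& data) {
  const long n = static_cast<long>(data.size());
  std::atomic<int> arrived{0};
  long total = 0;
  #pragma omp parallel
  {
#ifdef _OPENMP
    const int nthreads = omp_get_num_threads();
#else
    const int nthreads = 1;
#endif
    long local = 0;
    // nowait skips the loop's own blocking barrier.
    #pragma omp for nowait
    for (long i = 0; i < n; ++i) local += data[static_cast<std::size_t>(i)];
    #pragma omp atomic
    total += local;
    attentive_barrier(arrived, nthreads);
  }
  return total;
}
```

Because every thread reaches the end of the region only after all threads have arrived at the polling barrier, the time spent inside the blocking OpenMP barrier should be short. A reusable version would need to reset the counter or use a sense-reversing scheme.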
-
reporter - changed status to resolved
This was resolved by the merge of @Daniel Waters' PR at https://bitbucket.org/danielwaters/upcxx-prog-guide/pull-requests/2, at PG 01a3f61 and Impl 8bd36a6. With a few subsequent fixups, the code is now in automated CI testing.
This issue was triaged at the 2019-07-24 Pagoda issue meeting and assigned a new milestone.
This documentation issue is blocked behind spec issue #101, which will formalize the correct semantic restrictions on OpenMP interop