Improve OpenMP interoperability RPC example

Issue #535 new
Daniel Waters created an issue

In upcx-prog-guide/code/rpc-omp.cpp there is an example showing how to avoid deadlock when mixing OpenMP with multi-threaded UPC++. The OpenMP parallel region looks like this:

// `done` (a std::atomic<int>), `buddy`, and `me` are declared earlier in rpc-omp.cpp
#pragma omp parallel num_threads(4)
  {
    int threads = omp_get_num_threads();
    assert(threads > 1);
    // OpenMP guarantees the master thread has thread number 0
    if (omp_get_thread_num() == 0) {
      assert(upcxx::master_persona().active_with_caller());
      do {
        upcxx::progress();
      } while (done.load(std::memory_order_relaxed) != threads);
    } else { // worker threads send RPCs
      upcxx::future<> fut_all = upcxx::make_future();
      for (int i = 0; i < 10; i++) { // RPC with buddy rank
        upcxx::future<> fut = upcxx::rpc(buddy, [](int tid, int rank) {
          std::cout << "RPC from thread " << tid << " of rank "
                    << rank << std::endl;
        }, omp_get_thread_num(), me);
        fut_all = upcxx::when_all(fut_all, fut);
      }
      fut_all.wait(); // wait for thread quiescence
      done++; // worker threads can sleep at OpenMP barrier
    }
  } // <-- this closing brace implies an OpenMP thread barrier

It works, but each worker thread iterates over the entire for loop; it would be more realistic for the workers to share the iteration space. We're not sure how to write that while still guaranteeing that the master thread executes the progress loop. The main obstacle is that an OpenMP worksharing loop must be encountered by all threads in the team. It would be fine for the master thread to participate in the worksharing loop, but then the progress loop would have to become a task that is periodically yielded to, and although we can guarantee that the master thread creates that task, we cannot guarantee that the master thread is the one that executes it. Using OpenMP teams doesn't seem to work either, since all teams must be the same size.
