PG: Add more OpenMP Interop examples

Issue #193 resolved
Dan Bonachea created an issue

This issue was moved from the (now retired) PG tracker, issue/41:

Brian posted some good ideas on how to mix OpenMP with UPC++.

We should flesh out the OpenMP examples and section of the guide to demonstrate recommended ways to keep the UPC++ master persona attentive to progress during OpenMP fork/join regions.
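One way to do this, sketched under our own assumptions: the thread that enters the parallel region becomes OpenMP thread 0, and it is assumed to hold the UPC++ master persona. Instead of running the progress loop as an OpenMP task (which may be executed by any thread of the team, while `upcxx::progress()` only delivers user-level callbacks such as incoming RPCs for personas held by the calling thread), thread 0 polls between its own loop iterations and then keeps polling until the rest of the team is done. `parallel_for_attentive` is a hypothetical helper, and `poll` stands in for `upcxx::progress` so the sketch compiles without UPC++.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not a UPC++ or OpenMP API): runs body(i) for every i in
// [0, n) on an OpenMP team while thread 0 keeps calling poll(). Thread 0 is the
// thread that entered the parallel region, assumed here to hold the UPC++
// master persona; in a UPC++ program, poll would wrap upcxx::progress().
inline void parallel_for_attentive(long n,
                                   const std::function<void(long)>& body,
                                   const std::function<void()>& poll) {
  assert(n >= 0);
  std::atomic<int> finished{0};  // threads that have completed their iterations

  #pragma omp parallel
  {
#ifdef _OPENMP
    const int me = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();
#else
    const int me = 0;
    const int nthreads = 1;
#endif
    // dynamic: other threads absorb iterations while thread 0 is polling;
    // nowait: no barrier here, so thread 0 can keep polling afterwards
    #pragma omp for schedule(dynamic) nowait
    for (long i = 0; i < n; ++i) {
      body(i);
      if (me == 0) poll();  // stay attentive between iterations
    }
    finished.fetch_add(1);

    if (me == 0) {
      // keep servicing progress until every thread is done with the loop
      do {
        poll();
      } while (finished.load() < nthreads);
    }
  }  // implicit barrier: the other threads wait here for thread 0
}
```

In a UPC++ program this might be called as `parallel_for_attentive(n, body, [] { upcxx::progress(); });` from the thread holding the master persona. `std::function` keeps the sketch short; a template parameter would avoid the per-call indirection, and polling after every iteration is only sensible when each `body(i)` does a meaningful amount of work.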

On Tue, Apr 17, 2018 at 1:06 PM, Brian Van Straalen <bvstraalen@lbl.gov> wrote:

You need a master thread in the parallel region servicing the queue. When I talked with Hal Finkel about OpenMP and user-driven progress, he suggested ideas like:

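    #include <atomic>
    #include <cassert>
    #include <omp.h>
    #include <upcxx/upcxx.hpp>

    int thread_n = omp_get_max_threads();
    omp_set_nested(1);

    std::atomic<bool> spinning{true};
    std::atomic<int>  finished{0};   // threads done with their loop iterations

    #pragma omp parallel
    {
      assert(omp_get_num_threads() > 1);
      #pragma omp master
      #pragma omp task
      while(spinning.load()) {
          upcxx::progress();
          #pragma omp taskyield
      }

      // nowait is needed: the implicit barrier of a plain `omp for` waits for
      // all explicit tasks, including the spinning task above, so a
      // spinning.store(false) placed after that barrier would never run
      #pragma omp for nowait
      for(...)  {...}

      // the last thread to finish its iterations stops the progress task
      if (++finished == omp_get_num_threads())
          spinning.store(false);
    }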

Tom Scogland, another OpenMP developer, had a variant where the for loop is also executed as tasks, so the progress loop and the loop iterations all run as tasks:

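    #pragma omp parallel
    {
      assert(omp_get_num_threads() > 1);
      #pragma omp master
      {
        #pragma omp task
        while(spinning.load()) {
            upcxx::progress();
            #pragma omp taskyield
        }

        // taskloop has an implicit taskgroup, so all iterations are complete
        // before the flag is cleared. Caution: while waiting there, the master
        // thread may itself start the spinning task and, on some runtimes,
        // never return to clear the flag.
        #pragma omp taskloop
        for(...)  {...}

        spinning.store(false);
      }
    }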

The `omp for` construct has lower fork costs, since its schedule is static. Task loops need enough work per task to amortize their costs, but FFT is a fairly heavy operation. Still, OpenMP fork-join has non-trivial overhead even when the loop body calls FFT operations (as I discovered in my thesis work).
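To make the point about giving each task enough work concrete: OpenMP's `taskloop` accepts a `grainsize` clause, under which each generated task receives at least `min(grainsize, total iterations)` and fewer than `2 * grainsize` iterations. The sketch below is hypothetical: `scale_with_taskloop` and the value 256 are placeholders rather than tuned recommendations, and the UPC++ progress task is left out to isolate the clause.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: scale every element using an OpenMP taskloop.
// grainsize(256) asks the runtime to give each task between 256 and 511
// iterations (or all of them if there are fewer than 256), so the cost of
// creating a task is spread over a block of work instead of paid per element.
inline void scale_with_taskloop(std::vector<double>& data, double factor) {
  const long n = static_cast<long>(data.size());
  #pragma omp parallel
  #pragma omp single
  #pragma omp taskloop grainsize(256)
  for (long i = 0; i < n; ++i) {
    data[i] *= factor;
  }
}
```

For the `omp for` construct, the comparable knob is the chunk size in its `schedule` clause.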

Both should work. Hal mentioned that C++ atomics are not guaranteed to cooperate with OpenMP threading, but in practice they do.
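If one would rather not rely on C++ atomics cooperating with OpenMP threading, the spinning flag can instead be accessed only through OpenMP's own `atomic` construct with the `read`/`write` and `seq_cst` clauses (available since OpenMP 4.0). The `omp_stop_flag` type below is a hypothetical sketch, not part of either library; without OpenMP enabled the pragmas are ignored and it degrades to a plain `int`, which is only safe single-threaded.

```cpp
#include <cassert>

// Hypothetical stop flag using OpenMP atomic constructs instead of std::atomic.
// The seq_cst clause makes each access a sequentially consistent atomic
// operation with an implied flush.
class omp_stop_flag {
 public:
  bool load() {
    int v;
    #pragma omp atomic read seq_cst
    v = value_;
    return v != 0;
  }

  void store(bool running) {
    #pragma omp atomic write seq_cst
    value_ = running ? 1 : 0;
  }

 private:
  int value_ = 1;  // 1 = keep spinning, 0 = stop
};
```

It would replace `std::atomic<bool> spinning{true};` in the snippets above, with `spinning.load()` and `spinning.store(false)` left unchanged.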

I do not understand Hal's use of omp_set_nested here, though.

Comments (8)

  1. Dan Bonachea reporter

    Writing this here for the record.

    We have some introductory text in this document:

    https://bitbucket.org/berkeleylab/upcxx/src/master/docs/implementation-defined.md#markdown-header-interoperability-and-multi-threading

    That section should probably be moved or promoted into the PG as part of this work, as introductory text that explains the deadlock issue. New PG examples can then show several ways to avoid the problem: one is the approach from the original post of this issue, which singles out the personas that need to remain attentive (usually including the master persona); another is an explicit thread barrier before OpenMP blocking synchronization, as in https://bitbucket.org/berkeleylab/upcxx/src/master/test/rput_omp.cpp.
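One possible reading of the second approach in the comment above, sketched under our own assumptions (it is not taken from rput_omp.cpp): before the team reaches a blocking OpenMP barrier, every thread first meets at a hand-rolled barrier that keeps calling a `poll` callable while it waits. The thread holding the master persona therefore stays attentive until all threads have arrived, and the OpenMP barrier that follows completes almost immediately. `polling_barrier` is a hypothetical name; this version is single-use, and a reusable one would need sense reversal or a generation counter.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical single-use barrier for a team of `nthreads` threads. Each
// thread calls arrive_and_wait() once; while waiting, it keeps calling
// poll() instead of blocking, so a thread that holds a UPC++ persona can
// pass a callable that invokes upcxx::progress() and stay attentive.
class polling_barrier {
 public:
  explicit polling_barrier(int nthreads) : expected_(nthreads) {
    assert(nthreads > 0);
  }

  void arrive_and_wait(const std::function<void()>& poll) {
    arrived_.fetch_add(1);
    while (arrived_.load() < expected_) {
      poll();
    }
  }

 private:
  const int expected_;
  std::atomic<int> arrived_{0};
};
```

Inside a parallel region, each thread might call `bar.arrive_and_wait(poll)` just before `#pragma omp barrier` (or the end of the region), with the master-persona thread passing a wrapper around `upcxx::progress()` and the other threads passing a no-op.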
