Convert POSIX threads to OpenMP

Issue #41 new
Thomas Glanzman created an issue

The phoSim raytrace code currently uses the POSIX threads (pthreads) API. NERSC provides powerful profiling tools, such as VTune, that help identify "hot spots" in the code. These tools are far more capable when the code uses the OpenMP thread API. This is a request to consider converting to OpenMP.

Note that Adrian Pope has created a private working version of phoSim with OpenMP and could provide this as an example of how this conversion might be done.

Comments (6)

  1. John Peterson

    Need more information about this. Please email an example of the OpenMP version. The multithreading in phoSim has evolved quite a bit, so I will have to study this.

  2. adrianpope

    I think I attached a tarball (~18MB) for an OpenMP version of v3.7.14, which is the only version that I remember modifying for OpenMP.

    Things I can remember:

    * In order to keep the repo sizes down, my tarballs don't contain the data or validation directories, so those need to be copied from a bitbucket download of v3.7.14 for things to run.

    * I think I could produce some patch files to show what I modified, but that might be non-trivial: this branch has OpenMP modifications on top of other local changes I made to v3.7.14 for my own style of timing/testing. Alternatively, I have a Python script somewhere that compares directory trees to find files that differ, and I can share that.

    * The most important changes are in source/raytrace/sourceloop.cpp. In words, the OpenMP version starts with a loop to order the sources in decreasing brightness, then there is a separate loop to process the sources. The processing loop is (in-order) dynamically scheduled with OpenMP, so the first threads will tackle the brightest sources, and then when any thread finishes a source it just grabs the next unassigned source from the sorted list. OpenMP has much less thread launching overhead than pthreads, so I don't think there's a need to group sources. Output looked the same, and I remember run time being very slightly better with OpenMP, without having to worry about optimizing the source grouping.

    * I remember some trickiness with the Lock struct in source/raytrace/locks.h:

    lock1, lock3, lock7, cond2 were already commented out and I deleted the declarations

    lock2, lock8 seemed unused so I deleted the declarations

    lock4, lock5, lock6 were translated directly from pthread_mutex_t to omp_lock_t

    cond is a pthread_cond_t which does not have a direct equivalent in OpenMP, but it didn't seem necessary for any of the features I was using in my testing, so I commented out all instances of it everywhere in the code. If there is a necessary equivalent in current PhoSim code we'd need to find a better solution to this.

    * I think I did all of my testing with Intel compilers and mostly on KNLs, and it looks like configure and source/raytrace/Makefile are still set up for that, but it should be easy to modify those for GNU compilers on some other architecture.
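
The loop structure and lock translation described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the tarball: `Source`, `traceSource`, `OmpLock`, `sortByDecreasingBrightness`, and `processSources` are hypothetical names, and the real contents of sourceloop.cpp and locks.h are not shown in this thread. The `#ifdef _OPENMP` guards are an added convenience so the sketch also builds serially.

```cpp
#include <algorithm>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a phoSim source (assumption, not the real type).
struct Source {
    int id;
    double brightness;
    long photons;
};

// Thin wrapper around omp_lock_t, the direct counterpart of pthread_mutex_t.
// It becomes a no-op when the file is compiled without OpenMP.
class OmpLock {
public:
    OmpLock() {
#ifdef _OPENMP
        omp_init_lock(&lock_);      // replaces pthread_mutex_init
#endif
    }
    ~OmpLock() {
#ifdef _OPENMP
        omp_destroy_lock(&lock_);   // replaces pthread_mutex_destroy
#endif
    }
    OmpLock(const OmpLock&) = delete;
    OmpLock& operator=(const OmpLock&) = delete;
    void set() {
#ifdef _OPENMP
        omp_set_lock(&lock_);       // replaces pthread_mutex_lock
#endif
    }
    void unset() {
#ifdef _OPENMP
        omp_unset_lock(&lock_);     // replaces pthread_mutex_unlock
#endif
    }
private:
#ifdef _OPENMP
    omp_lock_t lock_;
#endif
};

// Hypothetical per-source work: here it just reports the photon count.
long traceSource(const Source& s) { return s.photons; }

// Loop 1: order the sources by decreasing brightness.
std::vector<Source> sortByDecreasingBrightness(std::vector<Source> sources) {
    std::sort(sources.begin(), sources.end(),
              [](const Source& a, const Source& b) {
                  return a.brightness > b.brightness;
              });
    return sources;
}

// Loop 2: process the sorted sources with dynamic scheduling, so a thread
// that finishes a source picks up the next unassigned one.
long processSources(const std::vector<Source>& unsorted) {
    const std::vector<Source> sources = sortByDecreasingBrightness(unsorted);
    const long n = static_cast<long>(sources.size());
    OmpLock totalLock;
    long total = 0;

    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        const long traced = traceSource(sources[i]);
        totalLock.set();            // protect the shared accumulator
        total += traced;
        totalLock.unset();
    }
    return total;
}
```

With GNU compilers this would be built with `-fopenmp` and with Intel compilers with `-qopenmp`. A chunk size of 1 in `schedule(dynamic, 1)` is one way to get the "grab the next source" behaviour; common runtimes hand out iterations in increasing order, though the sketch does not rely on that for correctness.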

  3. John Peterson

    Adrian-

    Can you tell me what you think the main advantage of OpenMP over pthreads would be? Tom said something about the tools?

    John

  4. adrianpope

    OpenMP(3) advantages over pthreads:

    • Overhead: Much less overhead for thread startup, encourages threading more operations due to ease of improving performance.
    • Tasking/Scheduling: More options for thread tasking and scheduling, much higher level thread management than with pthreads, encourages threading more operations due to code simplicity.
    • Runtimes: Mature OpenMP runtimes in most compilers (GNU, Clang, Intel, IBM XL) and continued support, meaning OpenMP performance is continuing to improve, whereas pthreads is a library that isn't changing much, i.e. the community and vendors are putting effort into OpenMP, which applications can leverage. OpenMP support has come a long way in the past decade and is much easier to recommend over pthreads now than in the past.
    • Affinity: OpenMP has more options to specify the affinity between logical threads and sockets/cores/hardware-threads, which can be important for algorithms where mapping of logical operations to memory/cache hierarchy makes a difference in performance. Affinity with pthreads is less clear, since the OS/kernel might be allowed to arbitrarily migrate logical threads to different cores/hardware-threads.
    • Tools: Some performance tools are more aware of what's happening in the OpenMP runtime, whereas pthreads looks more like library calls, and this can make it easier to diagnose what's happening in the application at runtime and how that affects performance.
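
As an illustration of the affinity point above, here is a small sketch that records which OpenMP place each thread is bound to. The function name `threadPlaces` is hypothetical, and the sketch assumes an OpenMP 4.5 or later runtime for `omp_get_place_num`; it is not taken from phoSim.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns one entry per potential thread: the place number the thread is
// bound to, or -1 if it is unbound, did not run, or OpenMP is disabled.
std::vector<int> threadPlaces() {
    int maxThreads = 1;
#ifdef _OPENMP
    maxThreads = omp_get_max_threads();
#endif
    std::vector<int> places(maxThreads, -1);

    // proc_bind(close) asks the runtime to keep threads near the primary thread.
    #pragma omp parallel proc_bind(close)
    {
#ifdef _OPENMP
        // Each thread writes only its own slot, so no lock is needed.
        places[omp_get_thread_num()] = omp_get_place_num();
#endif
    }
    return places;
}
```

The standard environment variables `OMP_PLACES` (for example `cores` or `threads`) and `OMP_PROC_BIND` (for example `close` or `spread`) control the same mapping without code changes, which is often the more convenient way to experiment on a given machine.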
