Commits

peller committed 07057de

Final changes related to cleaning up the code and adding more documentation.

  • Participants
  • Parent commits 130c71f
  • Branches peller/threadcomm

Comments (0)

Files changed (14)

File include/petsc-private/threadcommimpl.h

   PetscBool                ismainworker; /* Is the main thread also a work thread? */
 
   // Thread information
-  PetscThreadPool         pool;          /* Threadpool containing threads for this comm */
-  PetscInt                threadblock;   /* Number of threads in pool occupied by this comm (including threads
+  PetscThreadPool          pool;         /* Threadpool containing threads for this comm */
+  PetscInt                 threadblock;  /* Number of threads in pool occupied by this comm (including threads
                                             that are intentionally unused) */
-  PetscInt                shift;         /* Pool rank of smallest thread in threadcomm, used to shift pool ranks
+  PetscInt                 shift;        /* Pool rank of smallest thread in threadcomm, used to shift pool ranks
                                             for threads in threadcomm so that the smallest rank is 0 */
-  PetscInt                ncommthreads;  /* Number of threads in this comm */
-  PetscThread             *commthreads;  /* Threads that this comm can use */
+  PetscInt                 ncommthreads; /* Number of threads in this comm */
+  PetscThread              *commthreads; /* Threads that this comm can use */
 };
 
 #undef __FUNCT__

File src/docs/tex/manual/developers.tex

 \chapter{Threads}
 \label{sec:threads}
 
-PETSc can use a hybrid programming model with a number of different thread types and threading models. PETSc supports PThreads, OpenMP, and Intel Threaded Building Blocks thread types. More notes on all threadcomm routines are found in the function headers in the code directory in src/sys/threadcomm. Examples for each threading model can be found in src/sys/threadcomm/examples. Note that the threading code is a work in progress and parts of the code need more work to be made more reliable and effective.
+PETSc can use a hybrid programming model with a number of different thread types and
+threading models. PETSc supports the PThreads, OpenMP, and Intel Threaded Building
+Blocks thread types. Further notes on the threadcomm routines can be found in the
+function headers in src/sys/threadcomm, and examples for each threading model can be
+found in src/sys/threadcomm/examples. Note that the threading code is a work in
+progress; parts of it need further work to become more reliable and effective.
 
 \section{Thread Concepts}
 
 \subsection{Thread Communicators}
 
-Threadcomms are used to allow users to interact with threads. Each threadcomm is attached to an MPI Comm, and an MPI Comm can have at most one threadcomm attached to it. A number of routines are provided to allow users to create threadcomms. These include routines to create a single threadcomm, to create threadcomms that share threads with another threadcomm, and to create multiple threadcomms at once.
+Threadcomms allow users to interact with threads. Each threadcomm is attached to an
+MPI Comm, and an MPI Comm can have at most one threadcomm attached to it. A number of
+routines are provided for creating threadcomms: routines to create a single threadcomm,
+to create threadcomms that share threads with another threadcomm, and to create multiple
+threadcomms at once.
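+
+For example, a single threadcomm can be created and attached to an MPI Comm, and several
+comms that share its threads can then be created from it. The following sketch is adapted
+from the tutorial examples in src/sys/threadcomm/examples/tutorials; see those files for
+complete, working code:
+\begin{verbatim}
+  MPI_Comm       *splitcomms;
+  PetscInt       *affinities,i,nthreads=4,ncomms=2;
+  PetscErrorCode ierr;
+
+  /* Choose a core affinity for each thread, then create a threadcomm with
+     nthreads threads and attach it to PETSC_COMM_WORLD */
+  ierr = PetscMalloc1(nthreads,&affinities);CHKERRQ(ierr);
+  for (i=0; i<nthreads; i++) affinities[i] = i;
+  ierr = PetscThreadCommCreateAttach(PETSC_COMM_WORLD,nthreads,affinities);CHKERRQ(ierr);
+
+  /* Split the threads of PETSC_COMM_WORLD evenly among ncomms comms that share
+     the same threadpool; PetscThreadCommCreateMultiple() instead creates several
+     independent comms at once */
+  ierr = PetscThreadCommSplit(PETSC_COMM_WORLD,ncomms,PETSC_NULL,PETSC_NULL,&splitcomms);CHKERRQ(ierr);
+\end{verbatim}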
 
-When a PETSc object is created, it will be associated with a MPI Comm and the threadcomm associated with that MPI Comm. By default after a threaded kernel is executed, PETSc will call a barrier on all threads in the threadpool. The user can pass in a command line option to turn this off, but will then be required to use synchronization as needed to ensure that the code works correctly.
+When a PETSc object is created, it is associated with an MPI Comm and with the threadcomm
+attached to that MPI Comm. By default, after a threaded kernel is executed PETSc calls a
+barrier on all threads in the threadpool. The user can pass a command-line option to turn
+this off, but is then required to add synchronization as needed to ensure that the code
+works correctly.
 
 \subsection{Thread Pools}
 
-Threadpools are used to hold and manage a group of threads. When a threadcomm is created a threadpool is also created holding all threads in the threadcomm, and only this threadcomm will have access to these threads. If a threadcomm is created from a previously existing threadcomm, then this new threadcomm will share the same threadpool as the previously existing threadcomm.
+Threadpools are used to hold and manage a group of threads. When a threadcomm is
+created, a threadpool is also created holding all threads in the threadcomm, and
+only this threadcomm has access to these threads. If a threadcomm is created from a
+previously existing threadcomm, the new threadcomm shares the threadpool of the
+existing threadcomm.
 
 \subsection{Threads}
 
-When a threadpool is created, a thread is created for each thread in the threadpool. When using a threading model that allows PETSc to create the threads, the worker threads will enter a spin loop in the threadpool to wait for jobs, while the master threads will execute the users code.
-
-When using a threading model that allows users to create the threads, then PETSc will create the structs and variables for all threadcomms, threadpools, and threads, but there will not be any threads in the threadpool. Instead, the user will create the threads and give them to a specific threadcomm. At that point, the worker threads will enter a spin loop in the threadpool to wait for work, while the master threads will execute the users code. The user can later take back control of these threads, although PETSc will maintain control of the threads until they have completed all of their jobs.
-
-Each thread has its own job queue containing a list of jobs to complete. The max number of jobs in this list depends on the number of kernels setting. If a job is added to a job queue that is full of jobs that have not been completed, then the thread assigning the job will wait until a job has been completed and then add the new job to the job queue.
+When a threadpool is created, a PetscThread object is created for each thread in the
+pool. When using a threading model that allows PETSc to create the threads, the worker
+threads enter a spin loop in the threadpool to wait for jobs, while the master threads
+execute the user's code.
+
+When using a threading model that allows users to create the threads, PETSc creates
+the structs and variables for all threadcomms, threadpools, and threads, but no
+threads are placed in the threadpool. Instead, the user creates the threads and gives
+them to a specific threadcomm. At that point the worker threads enter a spin loop in
+the threadpool to wait for work, while the master threads execute the user's code and
+give jobs to the worker threads. The user can later take back control of these
+threads, although PETSc maintains control of the threads until they have completed
+all of their jobs.
+
+Each thread has its own job queue containing a list of jobs to complete. The maximum
+number of jobs in this queue is determined by the number-of-kernels setting. If a job
+is added to a job queue that is full of uncompleted jobs, the thread assigning the job
+waits until a job has been completed and then adds the new job to the queue.
 
 \subsection{Thread Safety}
 
-One of the key requirements for using multiple threads is that the code must have some mechanism to guarantee safe, consistent execution. PETSc contains many global variables and data structures used throughout the code that must be modified to be thread safe. The primary approach to thread safety so far is to make each global variable thread local. This can allow each thread to maintain information specific to that thread. Then when each thread is destroyed, merge the thread specific data with global data as needed using locks. This approach is currently used for the malloc code and the petscstack code.
-
-In some cases it is necessary to use locks to ensure that only a single thread accesses a data structure at a time. In this case a lock will have to be acquired at the beginning of a section of code and then released at the end of a section of code. Locks need to be used in a limited manner to avoid hurting performance. Currently this is used to merge data from thread specific structs for worker threads to global structs as well as to ensure that only a single thread initializes the vector package. However there may be a better approach to making the vector package initialization thread safe since this routine calls logging functions that are currently not thread safe.
+One of the key requirements for using multiple threads is that the code must have
+some mechanism to guarantee safe, consistent execution. PETSc contains many global
+variables and data structures used throughout the code that must be modified to be
+thread safe. The primary approach to thread safety so far is to make many global
+variables thread local, allowing each thread to maintain information specific to
+that thread. When a thread is destroyed, its thread-specific data is merged with the
+global data as needed using locks. This approach is currently used for the malloc
+code and the petscstack code.
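+
+Schematically, the thread-local pattern looks like the following. This is a plain-C
+illustration of the idea, not the actual PETSc declarations, which select the
+thread-local mechanism at compile time:
+\begin{verbatim}
+  #include <pthread.h>
+
+  /* Illustration only: a thread-local counter merged into a shared total */
+  static __thread long   thread_bytes = 0;   /* thread-local data */
+  static long            global_bytes = 0;   /* shared global data */
+  static pthread_mutex_t merge_lock   = PTHREAD_MUTEX_INITIALIZER;
+
+  void thread_destroy(void)
+  {
+    /* Merge the thread-specific data into the global data under a lock */
+    pthread_mutex_lock(&merge_lock);
+    global_bytes += thread_bytes;
+    pthread_mutex_unlock(&merge_lock);
+  }
+\end{verbatim}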
+
+In some cases it is necessary to use locks to ensure that only a single thread
+accesses a data structure at a time. A lock is acquired at the beginning of such a
+section of code and released at the end of it. Locks need to be used sparingly to
+avoid hurting performance. Currently locks are used to merge data from the
+thread-specific structs of worker threads into global structs, and to ensure that
+only a single thread initializes the vector package. However, there may be a better
+approach to making the vector package initialization thread safe, since this routine
+calls logging functions that are currently not thread safe. Similar changes will
+likely be needed for initializing other packages.
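+
+Schematically, a critical section protected by one of these locks looks like the
+following sketch (the lock objects shown, PetscLocks->vec\_lock and
+PetscLocks->trmalloc\_lock, are created in src/sys/threadcomm/interface/threads.c):
+\begin{verbatim}
+  /* Sketch: only one thread at a time executes the code between acquire and release */
+  ierr = (*PetscThreadLockAcquire)(PetscLocks->vec_lock);CHKERRQ(ierr);
+  /* ... code that must be executed by a single thread at a time ... */
+  ierr = (*PetscThreadLockRelease)(PetscLocks->vec_lock);CHKERRQ(ierr);
+\end{verbatim}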
 
 \section{Thread Models}
 
-There are currently three threading models implemented in PETSc that give users different levels of control of the threads.
+There are currently three threading models implemented in PETSc that give users
+different levels of control of the threads.
 
 \subsection{Loop}
 
-This is a simple threading model that allows PETSc to use shared memory parallelism within PETSc routines. Users must pass in command line arguments to set the thread type, thread model, and number of threads to use this model. PETSc will automatically create a single threadcomm and attach it to PETSC\_COMM\_WORLD. When a user calls a PETSc function, then if that function has a threaded implementation, then PETSc will call that kernel routine. The user will not have to write any threaded code.
-
-This threading model works with PThreads, OpenMP, and Intel Threaded Building Blocks.
+This is a simple threading model that allows PETSc to use shared memory parallelism
+within PETSc routines. Users must pass command-line arguments to set the thread type,
+thread model, and number of threads. PETSc automatically creates a single threadcomm
+and attaches it to PETSC\_COMM\_WORLD. A single master thread executes the user's code
+while the remaining worker threads wait in a spin loop in the threadpool for jobs.
+When the user calls a PETSc function that has a threaded implementation, PETSc runs
+the corresponding kernel on the threads; the user does not have to write any threaded
+code. This threading model works with PThreads, OpenMP, and Intel Threaded Building
+Blocks.
 
 \subsection{Auto}
 
-This threading model gives more control of the threads to the user. The user must create threadcomms, but when a threadcomm is created, PETSc will create the threads and add those threads to a threadpool. A single master thread will execute the users code and give work to each threadcomm and each thread. The threads will wait in the threadpool for work and complete jobs as the master thread assigns them. To get the best performance when using multiple threadcomms, it is best to turn off explicit synchronization. This will allow the master thread to avoid waiting in barriers and assign more jobs more quickly.
-
-This threading model only works with PThreads.
+This threading model gives the user more control of the threads. The user must create
+the threadcomms, but when a threadcomm is created PETSc creates the threads and adds
+them to a threadpool. A single master thread executes the user's code and gives work
+to each threadcomm and each thread. The threads wait in the threadpool for work and
+complete jobs as the master thread assigns them. To get the best performance when
+using multiple threadcomms, it is best to turn off explicit synchronization; this
+allows the master thread to avoid waiting in barriers and to assign jobs more quickly.
+This threading model only works with PThreads.
 
 \subsection{User}
 
-This threading model allows users to create all threads and explicitly give them to PETSc or take control back from PETSc. A user can create threadcomms prior to creating threads to set up the threadcomm structs and variables, and then later give threads to the threadcomms. A CommJoin routine can be called to give control of threads to PETSc and a CommReturn routine can be called to take back control of the threads. When PETSc is given control of the threads, a single master thread for each threadcomm will return from CommJoin and execute the users code while the worker threads will wait in the threadpool for jobs to execute. When CommReturn is called to return control of the threads to the user, the master threads will set all worker threads in a threadcomm to exit the spin loop once all jobs have been completed and cause those threads to exit the CommJoin routine.
-
-The CommJoin routine will return a comm rank, which will be a nonnegative integer. All worker threads will return with a negative integer. An if statement must be placed immediately after CommJoin to ensure that worker threads do not call the previously executed routines twice. This if statement must end immediately before the CommReturn routine to allow the worker and master threads to reach the same point in the code.
-
-After creating the threads, the user must call PetscThreadInitialize() to initialize thread specific PETSc data structures. Before destroying threads, the user must call PetscThreadFinalize() to merge any thread specific data to a global data structure and then destroy any thread specific data structures.
-
-This threading model works with PThreads and OpenMP.
+This threading model allows users to create all threads and explicitly give them to
+PETSc or take control back from PETSc. A user can create threadcomms prior to creating
+threads to set up the threadcomm structs and variables, and then later give threads to
+the threadcomms. A join routine gives control of the threads to PETSc and a return
+routine takes back control of the threads. When PETSc is given control of the threads,
+a single master thread for each threadcomm returns from the join routine and executes
+the user's code, while the worker threads wait in the threadpool for jobs to execute.
+When the return routine is called to return control of the threads to the user, the
+master threads tell all worker threads in a threadcomm to exit the spin loop once all
+jobs have been completed, causing those threads to exit the join routine.
+
+The join routine returns a comm rank, which is a nonnegative integer for master
+threads; all worker threads return with a negative integer. An if statement testing
+this rank must be placed immediately after the join routine so that worker threads do
+not also execute the code intended for the master threads. This if statement must end
+immediately before the return routine so that the worker and master threads reach the
+same point in the code.
+
+After creating the threads, the user must call PetscThreadInitialize() to initialize
+the thread-specific PETSc data structures. Before destroying the threads, the user must
+call PetscThreadFinalize() to merge any thread-specific data into the global data
+structures and then destroy the thread-specific data structures. This threading model
+works with PThreads and OpenMP.
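+
+The basic structure of the user model with OpenMP, condensed from
+src/sys/threadcomm/examples/tutorials/ex9.c, is sketched below (variable declarations
+and setup are omitted):
+\begin{verbatim}
+  #pragma omp parallel num_threads(nthreads) default(shared) private(ierr,trank,commrank)
+  {
+    trank = omp_get_thread_num();
+    /* Initialize thread-specific PETSc data structures for this thread */
+    ierr = PetscThreadInitialize();CHKERRCONTINUE(ierr);
+
+    /* Give this thread to PETSc; only the master thread of the comm gets commrank >= 0 */
+    ierr = PetscThreadCommJoinMultComms(&comm,1,trank,&commrank);CHKERRCONTINUE(ierr);
+    if (commrank >= 0) {
+      /* Only the master thread executes the PETSc calls placed here; the worker
+         threads execute the threaded kernels these calls launch */
+      ierr = VecCreateMPI(comm,PETSC_DECIDE,n,&a);CHKERRCONTINUE(ierr);
+      ierr = VecDestroy(&a);CHKERRCONTINUE(ierr);
+    }
+    /* Take the threads back from PETSc; all threads pass this point together */
+    ierr = PetscThreadCommReturnMultComms(&comm,1,trank,&commrank);CHKERRCONTINUE(ierr);
+
+    /* Merge and destroy the thread-specific PETSc data structures */
+    ierr = PetscThreadFinalize();CHKERRCONTINUE(ierr);
+  }
+\end{verbatim}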
 
 \section{Future Work}
 
-\subsection{Bugs}
+A number of changes are necessary to make the code more effective and reliable before this
+code becomes commonly used. There are also many changes that can be made to improve the
+functionality of this code. Below are some known issues and suggestions for future work.
 
-A number of changes are necessary to make the code more effective and reliable before this code becomes commonly used. There are also plenty of changes that can be made to improve the functionality of this code. Below are some known issues.
+\subsection{Known Issues}
 
-Currently PETSc is not fully threadsafe. Especially when using multiple threadcomms in parallel, the code will work sometimes, but other times the code will fail to run to completion due to a variety of issues. The logging code in particular needs to be modified to be threadsafe, although this may require some significant changes since this code is used throughout many other PETSc routines.
+Currently PETSc is not fully threadsafe. Especially when using multiple threadcomms in
+parallel, the code sometimes works but at other times fails to run to completion due to
+a variety of issues. The logging code in particular needs to be modified to be
+threadsafe, although this may require significant changes since it is used throughout
+many other PETSc routines.
 
-The ex9 threadcomm example with multiple threads and multiple comms in particular is error prone (run with 8+ threads and 8+ threadcomms), although it works at times and at other times fails for a variety of reasons. Known issues include having a selfcomm get a counter attached to it prior to calling the routine to attach a counter, which will often result in errors later in the run. Also in PetscMallocValidate(), an infinite loop can occur where the TRhead linked list ends up with a loop. Issues will occur at times where a variable will be freed multiple times during finalization code. The code will occasionally fail to run in other ways.
+The ex9 threadcomm example with multiple threads and multiple comms is particularly
+error prone (errors occur frequently with 8+ threads and 8+ threadcomms); it works at
+times and at other times fails for a variety of reasons. Known issues include a
+selfcomm getting a counter attached to it before the routine to attach a counter is
+called, which often results in errors later in the run. In PetscMallocValidate(), an
+infinite loop can occur when the TRhead linked list becomes circular. At times a
+variable is freed multiple times during the finalization code, and the code
+occasionally fails to run in other ways.
 
-While some parts of the code have been modified to improve thread safety, these sections may need additional work to be fully threadsafe or to provide the same functionality as when not using threads.
+While some parts of the code have been modified to improve thread safety, these sections may
+need additional work to be fully threadsafe or to provide the same functionality as when not
+using threads.
 
 \subsection{Additional Functionality}
 
-Currently the code is tested by creating all threadcomms at the beginning of the simulation. Testing that the code works when threadcomms are created in the middle of a simulation and adding additional functionality would be beneficial. It may potentially be useful to allow users to increase the number of threads in a threadcomm/threadpool during a simulation.
-
-Currently barriers are called by a master thread. Adding functionality that allows a barrier function to be added to thread job queue would allow a threadcomm containing only worker threads, such in the auto threading model, to call barriers after each routine without causing the master thread to wait.
-
-There may be potential to remove the requirement to set the threading model through the command line prior to the simulation and instead allow users to determine during the simulation how they want to use threads. It may be possible to remove the different thread models entirely and instead create new functions or modify current functions to allow users to create threads and use threads in a variety of ways during the simulation.
-
-The Intel Threaded Building Blocks code has a fairly simple implementation at this point. It should be possible to add new functionality for this thread type including modifying it to work for the auto or user threading models.
+Currently the code is tested by creating all threadcomms at the beginning of the simulation.
+Testing that the code works when threadcomms are created in the middle of a simulation, and
+adding any functionality needed for this, would be beneficial. It may also be useful to allow
+users to increase the number of threads in a threadcomm/threadpool during a simulation.
+
+Currently barriers are called by a master thread. Adding functionality that allows a barrier
+function to be added to a thread's job queue would allow a threadcomm containing only worker
+threads, such as in the auto threading model, to call barriers after each routine without
+causing the master thread to wait.
+
+There may be potential to remove the requirement to set the threading model through the
+command line prior to the simulation and instead allow users to determine during the
+simulation how they want to use threads. It may be possible to remove the different thread
+models entirely and instead create new functions or modify current functions to allow users
+to create threads and use threads in a variety of ways during the simulation.
+
+There are currently four different ways to use thread-local global variables, selected at
+compile time. Only the option using PETSC\_PTHREAD\_LOCAL has been tested. The other options
+for creating thread-local global variables need to be tested and may require additional
+changes. Ideally, the option to use would be determined at runtime.
+
+The Intel Threaded Building Blocks code has a fairly simple implementation at this point.
+It should be possible to add new functionality for this thread type including modifying it
+to work for the auto and/or user threading models.
+
+Some routines throughout PETSc currently do not have multithreaded kernels. The vector and
+matrix multiply routines are multithreaded, but most other linear algebra routines are not.
+These will need to be developed to allow users to fully take advantage of using PETSc with
+threads.
 
 \bibliographystyle{plain}
 \bibliography{../petsc}

File src/sys/threadcomm/examples/tutorials/ex10.c

 #define __FUNCT__ "main"
 int main(int argc,char **argv)
 {
-  Mat             A;
-  Vec             x, b;
-  PetscErrorCode  ierr;
-  PetscInt        nthreads,n=20,i,j,Ii,J;
-  PetscScalar     v, vnorm;
-  KSP             ksp;
-  PC              pc;
-  MPI_Comm        comm;
+  Mat            A;
+  Vec            x, b;
+  PetscErrorCode ierr;
+  PetscInt       nthreads,n=20,i,j,Ii,J;
+  PetscScalar    v, vnorm;
+  KSP            ksp;
+  PC             pc;
+  MPI_Comm       comm;
 
   ierr = PetscInitialize(&argc,&argv,(char*)0,help);CHKERRQ(ierr);
 

File src/sys/threadcomm/examples/tutorials/ex12.c

 
   /* Split threads evenly among comms */
   ierr = PetscThreadCommSplit(comm,ncomms,PETSC_NULL,PETSC_NULL,&splitcomms);CHKERRQ(ierr);
-  ierr = PetscPrintf(comm,"Created split comms with %d comms\n",ncomms);
+  ierr = PetscPrintf(comm,"Created split comms with %d comms\n",ncomms);CHKERRQ(ierr);
 
   for(i=0; i<ncomms; i++) {
     ierr = PetscThreadCommGetNThreads(splitcomms[i],&nthreads);CHKERRQ(ierr);
   }
 
   ierr = PetscPrintf(comm,"Main thread at barrier after distributing work\n");CHKERRQ(ierr);
-  PetscThreadCommBarrier(comm);
+  ierr = PetscThreadCommBarrier(comm);CHKERRQ(ierr);
 
   /* Output final results */
   for(i=0; i<ncomms; i++) {
     ierr = VecNorm(yvec[i],NORM_1,&vnorm[i]);CHKERRQ(ierr);
-    ierr = PetscPrintf(splitcomms[i],"Splitcomm %d computed vnorm=%lf\n",i,vnorm[i]);
+    ierr = PetscPrintf(splitcomms[i],"Splitcomm %d computed vnorm=%lf\n",i,vnorm[i]);CHKERRQ(ierr);
   }
 
-  PetscThreadCommBarrier(comm);
+  ierr = PetscThreadCommBarrier(comm);CHKERRQ(ierr);
 
   /* Destroy MPI_Comms/threadcomms */
   for(i=0; i<ncomms; i++) {

File src/sys/threadcomm/examples/tutorials/ex7.c

 
   PetscFunctionBegin;
   /* Get data for local work */
-  ierr = VecGetArray(y,&ay);CHKERRCONTINUE(ierr);
-  ierr = VecGetLocalSize(y,&lsize);CHKERRCONTINUE(ierr);
-  ierr = PetscThreadCommGetOwnershipRanges(*comm,lsize,&indices);CHKERRCONTINUE(ierr);
+  ierr = VecGetArray(y,&ay);CHKERRQ(ierr);
+  ierr = VecGetLocalSize(y,&lsize);CHKERRQ(ierr);
+  ierr = PetscThreadCommGetOwnershipRanges(*comm,lsize,&indices);CHKERRQ(ierr);
 
   /* Parallel threaded user code */
   start = indices[trank];
   for(i=start; i<end; i++) ay[i] = ay[i]*ay[i];
 
   /* Restore vector */
-  ierr = VecRestoreArray(y,&ay);CHKERRCONTINUE(ierr);
-  ierr = PetscFree(indices);CHKERRCONTINUE(ierr);
+  ierr = VecRestoreArray(y,&ay);CHKERRQ(ierr);
+  ierr = PetscFree(indices);CHKERRQ(ierr);
   PetscFunctionReturn(0);
 }
 
   ierr = PetscPrintf(comm2,"Comm1 has %d threads, created comm2 with %d threads\n",ntcthreads1,ntcthreads2);CHKERRQ(ierr);
 
   /* Set thread affinities for this threadcomm specifically */
-  PetscMalloc1(nthreads,&affinities);
+  ierr = PetscMalloc1(nthreads,&affinities);CHKERRQ(ierr);
   for(i=0; i<nthreads; i++) affinities[i] = nthreads + i;
   /* Attach threadcomm to PETSC_COMM_WORLD */
   ierr = PetscThreadCommCreateAttach(PETSC_COMM_WORLD,nthreads,affinities);CHKERRQ(ierr);
 
   /* Run single comm test 1 */
   comm = comm1;
-  ierr = PetscPrintf(comm,"\nRunning test with first single comm\n");
+  ierr = PetscPrintf(comm,"\nRunning test with first single comm\n");CHKERRQ(ierr);
   /* Create two vectors on MPIComm/ThreadComm */
   ierr = VecCreate(comm,&x);CHKERRQ(ierr);
   ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
 
   /* Run PETSc code */
   for(i=0; i<n; i++) {
-    VecSetValue(x,i,i*1.0,INSERT_VALUES);
-    VecSetValue(y,i,i*2.0,INSERT_VALUES);
+    ierr = VecSetValue(x,i,i*1.0,INSERT_VALUES);CHKERRQ(ierr);
+    ierr = VecSetValue(y,i,i*2.0,INSERT_VALUES);CHKERRQ(ierr);
   }
   ierr = VecAXPY(y,alpha,x);CHKERRQ(ierr);
   ierr = VecNorm(y,NORM_2,&vnorm);CHKERRQ(ierr);
 
   /* Run single comm test 2 */
   comm = comm2;
-  ierr = PetscPrintf(comm,"\nRunning test with second single comm\n");
+  ierr = PetscPrintf(comm,"\nRunning test with second single comm\n");CHKERRQ(ierr);
   /* Create two vectors on MPIComm/ThreadComm */
   ierr = VecCreate(comm,&x);CHKERRQ(ierr);
   ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
   comm_a = comm2;
   comm_b = comm1;
 
-  ierr = PetscPrintf(comm_a,"\nTesting computations with vectors on different comms\n");
+  ierr = PetscPrintf(comm_a,"\nTesting computations with vectors on different comms\n");CHKERRQ(ierr);
   ierr = VecCreate(comm_a,&x);CHKERRQ(ierr);
   ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
   ierr = VecSetFromOptions(x);CHKERRQ(ierr);

File src/sys/threadcomm/examples/tutorials/ex9.c

 
   /* Create multiple threadcomms */
   ierr = PetscPrintf(comm,"Creating multcomm with %d comms\n",ncomms);CHKERRQ(ierr);
-  ierr = PetscThreadCommCreateMultiple(PETSC_COMM_WORLD,ncomms,nthreads,PETSC_NULL,PETSC_NULL,&multcomms);
+  ierr = PetscThreadCommCreateMultiple(PETSC_COMM_WORLD,ncomms,nthreads,PETSC_NULL,PETSC_NULL,&multcomms);CHKERRQ(ierr);
 
   #pragma omp parallel num_threads(nthreads) default(shared) private(ierr)
   {
     #pragma omp barrier
 
     /* User gives threads to PETSc for threaded shared comm PETSc work */
-    ierr = PetscPrintf(comm,"\nRunning shared comm test\n");
+    ierr = PetscPrintf(comm,"\nRunning shared comm test\n");CHKERRCONTINUE(ierr);
     ierr = PetscThreadCommJoinMultComms(&shcomm,1,trank,&commrank);CHKERRCONTINUE(ierr);
     if(commrank>=0) {
       ierr = VecCreateMPI(shcomm,PETSC_DECIDE,n,&a);CHKERRCONTINUE(ierr);
     }
     ierr = PetscThreadCommReturnMultComms(&shcomm,1,trank,&commrank);CHKERRCONTINUE(ierr);
 
-     #pragma omp barrier
+    #pragma omp barrier
 
     /* User gives threads to PETSc for threaded single comm PETSc work */
-    ierr = PetscPrintf(comm,"\nRunning single comm test\n");
+    ierr = PetscPrintf(comm,"\nRunning single comm test\n");CHKERRCONTINUE(ierr);
     ierr = PetscThreadCommJoinMultComms(&comm,1,trank,&commrank);CHKERRCONTINUE(ierr);
     if(commrank>=0) {
       ierr = VecCreateMPI(comm,PETSC_DECIDE,n,&a);CHKERRCONTINUE(ierr);

File src/sys/threadcomm/impls/nothread/nothread.c

   PetscErrorCode ierr;
 
   PetscFunctionBegin;
-  ThreadType = THREAD_TYPE_NOTHREAD;
-  ierr = PetscNew(&PetscLocks);CHKERRQ(ierr);
+  ThreadType                = THREAD_TYPE_NOTHREAD;
+  ierr                      = PetscNew(&PetscLocks);CHKERRQ(ierr);
   PetscLocks->trmalloc_lock = PETSC_NULL;
-  PetscThreadLockAcquire = PetscThreadLockAcquire_NoThread;
-  PetscThreadLockRelease = PetscThreadLockRelease_NoThread;
-  PetscThreadLockCreate  = PetscThreadLockCreate_NoThread;
-  PetscThreadLockDestroy = PetscThreadLockDestroy_NoThread;
+  PetscThreadLockAcquire    = PetscThreadLockAcquire_NoThread;
+  PetscThreadLockRelease    = PetscThreadLockRelease_NoThread;
+  PetscThreadLockCreate     = PetscThreadLockCreate_NoThread;
+  PetscThreadLockDestroy    = PetscThreadLockDestroy_NoThread;
   PetscFunctionReturn(0);
 }
 
   PetscErrorCode ierr;
 
   PetscFunctionBegin;
-  ierr = PetscStrcpy(pool->type,NOTHREAD);CHKERRQ(ierr);
+  ierr             = PetscStrcpy(pool->type,NOTHREAD);CHKERRQ(ierr);
   pool->threadtype = THREAD_TYPE_NOTHREAD;
   PetscFunctionReturn(0);
 }

File src/sys/threadcomm/impls/openmp/tcopenmp.c

   PetscFunctionBegin;
   if (pool->model == THREAD_MODEL_AUTO) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Unable to use auto thread model with OpenMP. Use loop or user model with OpenMP");
 
-  ierr = PetscStrcpy(pool->type,OPENMP);CHKERRQ(ierr);
-  pool->threadtype = THREAD_TYPE_OPENMP;
+  ierr                     = PetscStrcpy(pool->type,OPENMP);CHKERRQ(ierr);
+  pool->threadtype         = THREAD_TYPE_OPENMP;
   pool->ops->setaffinities = PetscThreadCommSetAffinity_OpenMP;
-  pool->ops->pooldestroy = PetscThreadPoolDestroy_OpenMP;
+  pool->ops->pooldestroy   = PetscThreadPoolDestroy_OpenMP;
   if (pool->model == THREAD_MODEL_LOOP) {
     /* Initialize each thread */
     #pragma omp parallel num_threads(pool->npoolthreads)
   PetscErrorCode         ierr;
 
   PetscFunctionBegin;
-  ierr = PetscNew(&ptcomm);CHKERRQ(ierr);
+  ierr                    = PetscNew(&ptcomm);CHKERRQ(ierr);
   ptcomm->barrier_threads = 0;
-  ptcomm->wait_inc = PETSC_TRUE;
-  ptcomm->wait_dec = PETSC_TRUE;
+  ptcomm->wait_inc        = PETSC_TRUE;
+  ptcomm->wait_dec        = PETSC_TRUE;
 
   tcomm->data = (void*)ptcomm;
   if (tcomm->model == THREAD_MODEL_LOOP) {
   PetscFunctionBegin;
   if (tcomm->ismainworker) {
     job->job_status = THREAD_JOB_RECEIVED;
-    ierr = PetscRunKernel(0,job->nargs,job);CHKERRCONTINUE(ierr);
+    ierr = PetscRunKernel(0,job->nargs,job);CHKERRQ(ierr);
     job->job_status = THREAD_JOB_COMPLETED;
     jobqueue = tcomm->commthreads[tcomm->lleader]->jobqueue;
     jobqueue->current_job_index = (jobqueue->current_job_index+1)%tcomm->nkernels;
     jobqueue->completed_jobs_ctr++;
   }
   if (tcomm->syncafter) {
-    ierr = PetscThreadCommJobBarrier(tcomm);CHKERRCONTINUE(ierr);
+    ierr = PetscThreadCommJobBarrier(tcomm);CHKERRQ(ierr);
   }
   PetscFunctionReturn(0);
 }
 PetscErrorCode PetscThreadLockCreate_OpenMP(void **lock)
 {
   PetscThreadLock_OpenMP omplock;
-  PetscErrorCode ierr;
+  PetscErrorCode         ierr;
 
   PetscFunctionBegin;
   ierr = PetscNew(&omplock);CHKERRQ(ierr);
 PetscErrorCode PetscThreadLockDestroy_OpenMP(void **lock)
 {
   PetscThreadLock_OpenMP ptlock = (PetscThreadLock_OpenMP)lock;
-  PetscErrorCode ierr;
+  PetscErrorCode         ierr;
 
   PetscFunctionBegin;
   omp_destroy_lock(ptlock);

File src/sys/threadcomm/impls/pthread/tcpthread.c

   PetscErrorCode      ierr;
 
   PetscFunctionBegin;
-  ierr = PetscNew(&ptcomm);
+  ierr         = PetscNew(&ptcomm);CHKERRQ(ierr);
   thread->data = (void*)ptcomm;
   PetscFunctionReturn(0);
 }
 PetscErrorCode PetscThreadCommBarrier_PThread(PetscThreadComm tcomm)
 {
   PetscThreadComm_PThread ptcomm = (PetscThreadComm_PThread)tcomm->data;
-  PetscErrorCode ierr;
+  PetscErrorCode          ierr;
 
   PetscFunctionBegin;
   ierr = PetscLogEventBegin(ThreadComm_Barrier,0,0,0,0);CHKERRQ(ierr);
 PetscErrorCode PetscThreadLockCreate_PThread(void **lock)
 {
   PetscThreadLock_PThread ptlock;
-  PetscErrorCode ierr;
+  PetscErrorCode          ierr;
 
   PetscFunctionBegin;
-  ierr = PetscNew(&ptlock);CHKERRQ(ierr);
-  ierr = pthread_mutex_init(ptlock,PETSC_NULL);CHKERRQ(ierr);
+  ierr  = PetscNew(&ptlock);CHKERRQ(ierr);
+  ierr  = pthread_mutex_init(ptlock,PETSC_NULL);CHKERRQ(ierr);
   *lock = (void*)ptlock;
   PetscFunctionReturn(0);
 }
 PetscErrorCode PetscThreadLockDestroy_PThread(void **lock)
 {
   PetscThreadLock_PThread ptlock = (PetscThreadLock_PThread)lock;
-  PetscErrorCode ierr;
+  PetscErrorCode          ierr;
 
   PetscFunctionBegin;
   pthread_mutex_destroy(ptlock);

File src/sys/threadcomm/impls/tbb/tctbb.cxx

 PetscErrorCode PetscThreadInit_TBB()
 {
   PetscFunctionBegin;
-  ThreadType = THREAD_TYPE_TBB;
+  ThreadType             = THREAD_TYPE_TBB;
   PetscThreadLockCreate  = PetscThreadLockCreate_TBB;
   PetscThreadLockDestroy = PetscThreadLockDestroy_TBB;
   PetscThreadLockAcquire = PetscThreadLockAcquire_TBB;
   PetscFunctionBegin;
   if (pool->model == THREAD_MODEL_AUTO || pool->model == THREAD_MODEL_USER) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Unable to use auto or user thread model with TBB. Use loop model with TBB");
 
-  ierr = PetscStrcpy(pool->type,TBB);CHKERRQ(ierr);
+  ierr                     = PetscStrcpy(pool->type,TBB);CHKERRQ(ierr);
   pool->threadtype         = THREAD_TYPE_TBB;
   pool->ops->createthread  = PetscThreadCreate_TBB;
   pool->ops->startthreads  = PetscThreadCommInitialize_TBB;
 
 #undef __FUNCT__
 #define __FUNCT__ "PetscThreadCreate_TBB"
+/*
+   PetscThreadCreate_TBB
+*/
 PetscErrorCode PetscThreadCreate_TBB(PetscThread thread)
 {
   PetscFunctionBegin;
 
 #undef __FUNCT__
 #define __FUNCT__ "PetscThreadCommInitialize_TBB"
+/*
+   PetscThreadCommInitialize_TBB
+*/
 PetscErrorCode PetscThreadCommInitialize_TBB(PetscThreadPool pool)
 {
   PetscFunctionBegin;
 
 #undef __FUNCT__
 #define __FUNCT__ "PetscThreadCommSetAffinity_TBB"
+/*
+   PetscThreadCommSetAffinity_TBB
+*/
 PetscErrorCode PetscThreadCommSetAffinity_TBB(PetscThreadPool pool, PetscThread thread)
 {
   PetscFunctionBegin;
 
 #undef __FUNCT__
 #define __FUNCT__ "PetscThreadPoolDestroy_TBB"
+/*
+   PetscThreadPoolDestroy_TBB
+*/
 PetscErrorCode PetscThreadPoolDestroy_TBB(PetscThreadPool pool)
 {
   PetscFunctionBegin;
 #define __FUNCT__ "PetscThreadLockCreate_TBB"
 /*
    PetscThreadLockCreate_TBB
-
-   Not Collective
-
-   Level: developer
-
 */
 PetscErrorCode PetscThreadLockCreate_TBB(void** lock)
 {
 #define __FUNCT__ "PetscThreadLockDestroy_TBB"
 /*
    PetscThreadLockDestroy_TBB
-
-   Not Collective
-
-   Level: developer
-
 */
 PetscErrorCode PetscThreadLockDestroy_TBB(void** lock)
 {
 #define __FUNCT__ "PetscThreadLockAcquire_TBB"
 /*
    PetscThreadLockAcquire_TBB
-
-   Not Collective
-
-   Level: developer
-
 */
 PetscErrorCode PetscThreadLockAcquire_TBB(void *lock)
 {
 #define __FUNCT__ "PetscThreadLockRelease_TBB"
 /*
    PetscThreadLockRelease_TBB
-
-   Not Collective
-
-   Level: developer
-
 */
 PetscErrorCode PetscThreadLockRelease_TBB(void *lock)
 {

File src/sys/threadcomm/interface/dlregisthreadcomm.c

 #include <petsc-private/threadcommimpl.h>
 
 /* Variables to track package registration/initialization */
-static PetscBool PetscThreadCommPackageInitialized = PETSC_FALSE;
+static PetscBool       PetscThreadCommPackageInitialized = PETSC_FALSE;
 PETSC_EXTERN PetscBool PetscThreadCommRegisterAllModelsCalled;
 PETSC_EXTERN PetscBool PetscThreadCommRegisterAllTypesCalled;
 

File src/sys/threadcomm/interface/threadcomm.c

 PetscFunctionList PetscThreadCommTypeList  = PETSC_NULL;
 PetscFunctionList PetscThreadCommModelList = PETSC_NULL;
 
-PETSC_EXTERN PetscInt   N_CORES;
+PETSC_EXTERN PetscInt N_CORES;
 
 /* Logging support */
-PetscLogEvent ThreadComm_RunKernel, ThreadComm_Barrier;
+PetscLogEvent ThreadComm_RunKernel,ThreadComm_Barrier;
 
 #undef __FUNCT__
 #define __FUNCT__ "PetscThreadCommGetComm"
 @*/
 PetscErrorCode PetscThreadCommGetComm(MPI_Comm comm,PetscThreadComm *tcomm)
 {
-  PetscErrorCode  ierr;
-  PetscMPIInt     flg;
-  void            *ptr;
+  PetscErrorCode ierr;
+  PetscMPIInt    flg;
+  void           *ptr;
 
   PetscFunctionBegin;
   ierr = MPI_Attr_get(comm,Petsc_ThreadComm_keyval,(PetscThreadComm*)&ptr,&flg);CHKERRQ(ierr);
 PetscErrorCode PetscThreadCommJobBarrier(PetscThreadComm tcomm)
 {
   PetscInt                active_threads=0,i,job_status;
-  PetscBool               wait          =PETSC_TRUE;
+  PetscBool               wait=PETSC_TRUE;
   PetscThreadCommJobQueue jobqueue;
   PetscThreadCommJobCtx   job;
   PetscErrorCode          ierr;
   ierr = PetscThreadCommInitialize(nthreads,pranks,tcomm);CHKERRQ(ierr);
   /* Attach ThreadComm to MPI_Comm */
   ierr = PetscThreadCommAttach(comm,tcomm);CHKERRQ(ierr);
-  ierr = PetscFree(pranks);
+  ierr = PetscFree(pranks);CHKERRQ(ierr);
   PetscFunctionReturn(0);
 }
 
   PetscErrorCode  ierr;
   PetscInt        Q,R,*trstarts_out,nloc,i;
   PetscBool       S;
-  PetscThreadComm tcomm = PETSC_NULL;
+  PetscThreadComm tcomm=PETSC_NULL;
 
   PetscFunctionBegin;
   ierr            = PetscThreadCommGetComm(comm,&tcomm);CHKERRQ(ierr);
 
   PetscFunctionBegin;
 #if defined(PETSC_HAVE_SCHED_CPU_SET_T)
-  ierr = PetscThreadPoolSetAffinity(pool,&cpuset,thread->affinity,&set);
+  ierr = PetscThreadPoolSetAffinity(pool,&cpuset,thread->affinity,&set);CHKERRQ(ierr);
   if (set) sched_setaffinity(0,sizeof(cpu_set_t),&cpuset);
 #endif
   PetscFunctionReturn(0);
   if (trank >= 0 && trank < tcomm->ncommthreads) {
 
     /* Make sure all threads have reached this routine */
-    ierr = (*tcomm->ops->barrier)(tcomm);
+    ierr = (*tcomm->ops->barrier)(tcomm);CHKERRQ(ierr);
 
     /* Initialize thread and join threadpool if a worker thread */
     if (trank == tcomm->lleader) {
   if (comm_index >= 0) {
 
     /* Make sure all threads have reached this routine */
-    ierr = (*tcomm[comm_index]->ops->barrier)(tcomm[comm_index]);
+    ierr = (*tcomm[comm_index]->ops->barrier)(tcomm[comm_index]);CHKERRQ(ierr);
 
     /* Initialize thread and join threadpool if a worker thread */
     if (local_index == tcomm[comm_index]->lleader) {
     }
 
     /* Make sure all threads have initialized threadcomm */
-    ierr = (*tcomm[comm_index]->ops->barrier)(tcomm[comm_index]);
+    ierr = (*tcomm[comm_index]->ops->barrier)(tcomm[comm_index]);CHKERRQ(ierr);
 
     /* Set affinity */
     if(tcomm[comm_index]->threadtype == THREAD_TYPE_OPENMP) {
-      ierr = PetscThreadCommSetThreadAffinity(tcomm[comm_index]->pool,tcomm[comm_index]->commthreads[local_index]);CHKERRCONTINUE(ierr);
+      ierr = PetscThreadCommSetThreadAffinity(tcomm[comm_index]->pool,tcomm[comm_index]->commthreads[local_index]);CHKERRQ(ierr);
     }
 
     if (*commrank == -1) {
     }
 
     /* Make sure all worker threads have terminated successfully and reached this barrier */
-    ierr = (*tcomm->ops->barrier)(tcomm);
+    ierr = (*tcomm->ops->barrier)(tcomm);CHKERRQ(ierr);
   }
   *commrank = -1;
   PetscFunctionReturn(0);
 PetscErrorCode PetscThreadCommReturnMultComms(MPI_Comm *comm,PetscInt ncomms,PetscInt trank,PetscInt *commrank)
 {
   PetscThreadComm *tcomm;
-  PetscInt        i, j, comm_index=-1, startthread=0;
+  PetscInt        i,j,comm_index=-1,startthread=0;
   PetscErrorCode  ierr;
 
   PetscFunctionBegin;
     /* Master threads terminate worker threads */
     if (*commrank >= 0) {
       /* Make sure each thread has finished its work */
-      ierr = PetscThreadCommJobBarrier(tcomm[comm_index]);
+      ierr = PetscThreadCommJobBarrier(tcomm[comm_index]);CHKERRQ(ierr);
       for (i=0; i<tcomm[comm_index]->ncommthreads; i++) {
         tcomm[comm_index]->commthreads[i]->status = THREAD_TERMINATE;
       }
     }
 
     /* Make sure all worker threads have terminated successfully and reached this barrier */
-    ierr = (*tcomm[comm_index]->ops->barrier)(tcomm[comm_index]);
+    ierr = (*tcomm[comm_index]->ops->barrier)(tcomm[comm_index]);CHKERRQ(ierr);
   }
   *commrank = -1;
   ierr = PetscFree(tcomm);CHKERRQ(ierr);

File src/sys/threadcomm/interface/threadpool.c

       job = &jobqueue->jobs[jobqueue->current_job_index];
       pool->poolthreads[trank]->jobdata = job;
       /* Do own job */
-      PetscRunKernel(job->commrank,thread->jobdata->nargs,thread->jobdata);
+      ierr = PetscRunKernel(job->commrank,thread->jobdata->nargs,thread->jobdata);CHKERRCONTINUE(ierr);
       /* Post job completed status */
       job->job_status = THREAD_JOB_COMPLETED;
       jobqueue->current_job_index = (jobqueue->current_job_index+1)%pool->nkernels;

File src/sys/threadcomm/interface/threads.c

 
   /* Create thread stack */
   ierr = PetscThreadCommStackCreate();CHKERRQ(ierr);
-
   /* Setup TRMalloc */
   ierr = PetscTrMallocInitialize();CHKERRQ(ierr);
-
   PetscThreadInit = 1;
   PetscFunctionReturn(0);
 }
 
   /* Add code to destroy TRMalloc/merged with main trmalloc data */
   ierr = PetscTrMallocFinalize();CHKERRQ(ierr);
-
   /* Destroy thread stack */
   ierr = PetscThreadCommStackDestroy();CHKERRQ(ierr);
-
   PetscThreadInit = 0;
   PetscFunctionReturn(0);
 }
   PetscFunctionBegin;
   if (PetscLocks) PetscFunctionReturn(0);
   ierr = PetscNew(&PetscLocks);CHKERRQ(ierr);
-  ierr = (*PetscThreadLockCreate)(&PetscLocks->trmalloc_lock);
-  ierr = (*PetscThreadLockCreate)(&PetscLocks->vec_lock);
+  ierr = (*PetscThreadLockCreate)(&PetscLocks->trmalloc_lock);CHKERRQ(ierr);
+  ierr = (*PetscThreadLockCreate)(&PetscLocks->vec_lock);CHKERRQ(ierr);
   PetscFunctionReturn(0);
 }