OpenMP-enabled PETSc
OpenMP thread-level parallelisation has been added to the PETSc Vec and Mat classes, which contain the kernels that account for the majority of the computation for the CSR and block-CSR formats.
Task-based sparse Matrix-Vector Multiplication (spMVM)
The current version uses task-based spMVM to overlap MPI communication with local computation. In addition, a nonzero-based load-balancing scheme is available that balances the number of nonzeros assigned to each thread rather than the number of rows. This scheme is activated with the option "-matmult_nz_balance"; otherwise a row-based thread partitioning is used.
- When running a single MPI process with multiple threads, thread 0 does not act as a worker thread during MatMult. For scaling with a single MPI process this means that with OMP_NUM_THREADS=t, only t-1 threads are actively sharing the work.
- A purely vector-based version of petsc-3.3-omp is available here.