PaRSEC on Intel Xeon Phi Native Mode

Running PaRSEC on Intel Xeon Phi is almost the same as running it on other platforms. The differences are described in the following sections.

Install required software

To compile PaRSEC/DPLASMA for Intel Xeon Phi, you will need Intel Xeon Phi builds of the following software:

  • A BLAS library that can execute on Intel Xeon Phi, usually MKL. When using MKL, use the multi-threaded version, since there are 4 hyper-threads per physical core.
  • PLASMA. To build an Intel Xeon Phi version, use x86_64-k1om-linux-gcc or icc with the flag -mmic as the C compiler, and x86_64-k1om-linux-gfortran or ifort with the flag -mmic as the Fortran compiler (x86_64-k1om-linux-gcc and x86_64-k1om-linux-gfortran are usually in /usr/linux-k1om-4.7/bin on your system). The regular version of PLASMA uses sequential MKL, but the Xeon Phi version requires multi-threaded MKL. Since PLASMA requires LAPACK and LAPACKE, make sure your LAPACK and LAPACKE are also compiled with the above compilers.
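Before building PLASMA, it can help to confirm that the cross compilers are reachable. The following is a minimal sketch, assuming the GNU k1om toolchain lives in /usr/linux-k1om-4.7/bin as mentioned above; the helper name check_tools is hypothetical and not part of PaRSEC or PLASMA.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given tools are found on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed toolchain location (taken from the text above; adjust for your system).
PATH="$PATH:/usr/linux-k1om-4.7/bin"
check_tools x86_64-k1om-linux-gcc x86_64-k1om-linux-gfortran \
    || echo "some cross compilers were not found"
```

If you use the Intel compilers instead, the same helper can be called as check_tools icc ifort.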

Configuring and compiling PaRSEC for Intel Xeon Phi

PaRSEC supports cross-compilation, so compilation is always done on the host CPU; the executable is then copied to the Intel Xeon Phi and run there.

First, switch to the branch native-mic-all-about-data. This branch is specifically for Intel Xeon Phi and is derived from all-about-data. An example invocation of cmake is provided in contrib/platforms/config.mic. Start from this file and tweak it for your system. As when compiling PLASMA, set the compilers as follows:

CC=${CC:="icc -mmic"}
CXX=${CXX:="icpc -mmic"}
FC=${FC:="ifort -mmic"}

or

CC=${CC:="x86_64-k1om-linux-gcc"}
CXX=${CXX:="x86_64-k1om-linux-g++"}
FC=${FC:="x86_64-k1om-linux-gfortran"}

Once configuration is done, type ccmake . in your PaRSEC root folder. Double-check that PARSEC_GPU_WITH_CUDA is set to OFF and that -openmp is in CFLAGS, CXXFLAGS and FFLAGS. When you have set all the options you want in ccmake, type 'c' to configure again, and 'g' to generate the files. If you entered wrong values in some fields, ccmake will complain at 'c' time. If you made no changes, type 'q' to quit without configuring again.
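The two checks above can also be done from the command line by inspecting CMakeCache.txt, which stores entries as NAME:TYPE=VALUE. This is only a sketch: the helper name check_cache is hypothetical, and it assumes the compiler flags end up in the cache entries CMAKE_C_FLAGS, CMAKE_CXX_FLAGS and CMAKE_Fortran_FLAGS, which may not hold for every configuration.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a CMakeCache.txt for a Xeon Phi build.
# Usage: check_cache path/to/CMakeCache.txt
check_cache() {
    cache=$1
    status=0
    # CUDA support must be disabled for the Xeon Phi build.
    if ! grep -q '^PARSEC_GPU_WITH_CUDA:[A-Z]*=OFF$' "$cache"; then
        echo "PARSEC_GPU_WITH_CUDA is not OFF"
        status=1
    fi
    # Assumption: the flags of each language live in these cache entries.
    for var in CMAKE_C_FLAGS CMAKE_CXX_FLAGS CMAKE_Fortran_FLAGS; do
        if ! grep "^${var}:" "$cache" | grep -q -e '-openmp'; then
            echo "$var does not contain -openmp"
            status=1
        fi
    done
    return "$status"
}
```

From the build directory, check_cache CMakeCache.txt prints nothing and returns 0 when both conditions hold; otherwise it prints one line per problem.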

If the configuration succeeded, compilation should be as simple as typing 'make'.

Running DPLASMA on Intel Xeon Phi

All test programs are built in dplasma/testing. Copy the test executable to the Intel Xeon Phi, then run it with the following command:

env MKL_NUM_THREADS=4 MKL_DYNAMIC=false OMP_DYNAMIC=false ./testing_dgemm -N 12000 -t 480 -c 60

Here -c 60 means using 60 cores. Usually, each PaRSEC thread manages 1 physical core. Since there are 4 hyper-threads per core, use MKL_NUM_THREADS=4.
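The copy-and-run step can be scripted. The sketch below assumes the coprocessor is reachable over SSH under the host name mic0 and that /tmp on the card is writable; neither detail comes from this page, so adjust PHI_HOST and PHI_DIR for your system. The function names are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Assumptions (not from the original text): SSH host name and target directory.
PHI_HOST=${PHI_HOST:-mic0}
PHI_DIR=${PHI_DIR:-/tmp}

# Build the command line for one PaRSEC thread per physical core and
# four MKL threads per PaRSEC thread.
# $1 = test binary, $2 = value for -N, $3 = value for -t, $4 = value for -c
run_cmd() {
    echo "env MKL_NUM_THREADS=4 MKL_DYNAMIC=false OMP_DYNAMIC=false ./$1 -N $2 -t $3 -c $4"
}

# Copy the executable to the card, then run it there.
# With DRY_RUN=1 (the default here) the commands are only printed.
deploy_and_run() {
    cmd=$(run_cmd "$@")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "scp dplasma/testing/$1 $PHI_HOST:$PHI_DIR/"
        echo "ssh $PHI_HOST \"cd $PHI_DIR && $cmd\""
    else
        scp "dplasma/testing/$1" "$PHI_HOST:$PHI_DIR/" &&
            ssh "$PHI_HOST" "cd $PHI_DIR && $cmd"
    fi
}

deploy_and_run testing_dgemm 12000 480 60
```

Set DRY_RUN=0 in the environment to actually perform the copy and the remote run.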

Intel Xeon Phi contains 61 physical cores (1 is reserved for the OS, so 60 cores are available to users), so a problem must expose enough parallelism to keep all of them busy and get good performance. To relax this requirement, we introduce virtual cores. Each virtual core contains multiple physical cores, and each PaRSEC thread manages one virtual core. Work inside a virtual core is handled by the multi-threaded BLAS. So you can run a test like this:

env MKL_NUM_THREADS=4*k MKL_DYNAMIC=false OMP_DYNAMIC=false ./testing_dgemm -N 12000 -t 480 -c 240/(4*k)

Here k is the number of physical cores per virtual core. The expressions 4*k and 240/(4*k) stand for computed values and must be replaced by actual numbers; for example, k=2 gives MKL_NUM_THREADS=8 and -c 30. You can tune k to get good performance.
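To make the arithmetic concrete, the sketch below prints the command line for a given k. It assumes 60 usable cores with 4 hardware threads each, as described above, and additionally requires k to divide 60 so that the value passed to -c is an integer; that restriction and the helper name virtual_core_cmd are assumptions of this example, not something PaRSEC enforces.

```shell
#!/bin/sh
# Hypothetical helper: print the command line for k physical cores per
# virtual core, assuming 60 usable cores with 4 hardware threads each.
virtual_core_cmd() {
    k=$1
    usable_cores=60
    threads_per_core=4
    # Assumption of this sketch: k must divide the number of usable cores.
    if [ "$k" -lt 1 ] || [ $((usable_cores % k)) -ne 0 ]; then
        echo "k must be a positive divisor of $usable_cores" >&2
        return 1
    fi
    mkl_threads=$((threads_per_core * k))                               # 4*k
    virtual_cores=$((usable_cores * threads_per_core / mkl_threads))    # 240/(4*k)
    echo "env MKL_NUM_THREADS=$mkl_threads MKL_DYNAMIC=false OMP_DYNAMIC=false ./testing_dgemm -N 12000 -t 480 -c $virtual_cores"
}

# Print the candidate command lines for a few values of k.
for k in 1 2 3 4; do
    virtual_core_cmd "$k"
done
```

Running each printed command on the card and comparing the reported performance is one simple way to tune k.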
