PROPOSALS CrayPat-lite: OpenACC
Issue #21
MPI+OpenACC (Piz Daint)
Get the source
- ssh daint
- cd $SCRATCH
- git clone https://github.com/eth-cscs/proposals.git proposals.git
Cloning into 'proposals.git'...
remote: Counting objects: 339, done.
remote: Total 339 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (339/339), 300.16 KiB | 234 KiB/s, done.
Resolving deltas: 100% (139/139), done.
- cd proposals.git/vihps/SAXPY_OPENACC/
Setup
- module load PrgEnv-cray
- module load craype-accel-nvidia35
- module use /project/csstaff/proposals
- module load perflite/622openacc
- echo CRAYPAT_LITE=$CRAYPAT_LITE
CRAYPAT_LITE=gpu
- module list
Currently Loaded Modulefiles:
1) modules/3.2.10.2
2) nodestat/2.2-1.0502.53712.3.109.ari
3) sdb/1.0-1.0502.55976.5.27.ari
4) alps/5.2.1-2.0502.9041.11.6.ari
5) lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
6) udreg/2.3.2-1.0502.9275.1.12.ari
7) ugni/5.0-1.0502.9685.4.24.ari
8) gni-headers/3.0-1.0502.9684.5.2.ari
9) dmapp/7.0.1-1.0502.9501.5.219.ari
10) xpmem/0.1-2.0502.55507.3.2.ari
11) hss-llm/7.2.0
12) Base-opts/1.0.2-1.0502.53325.1.2.ari
13) craype-network-aries
14) craype/2.2.1
15) cce/8.3.7
16) totalview-support/1.1.4
17) totalview/8.11.0
18) cray-libsci/13.0.1
19) pmi/5.0.6-1.0000.10439.140.2.ari
20) rca/1.0.0-2.0502.53711.3.127.ari
21) atp/1.7.5
22) PrgEnv-cray/5.2.40
23) craype-sandybridge
24) slurm
25) cray-mpich/7.1.1
26) ddt/4.3rc7
27) linux/jg
28) cray-libsci_acc/3.0.2
29) cudatoolkit/5.5.22-1.0502.7944.3.1
30) craype-accel-nvidia35
31) perflite/622openacc
Compile
- make clean
- make FLAGS=-hacc
cc -hacc -c mpiacc_c.c -o CRAY_mpiacc_c.o
cc -hacc CRAY_mpiacc_c.o -o CRAY.SANTIS
INFO: creating the CrayPat-instrumented executable 'CRAY.SANTIS' (gpu) ...OK
INFO: A maximum of 17 functions from group 'aio' will be traced.
INFO: A maximum of 28 functions from group 'ffio' will be traced.
INFO: A maximum of 107 functions from group 'io' will be traced.
INFO: A maximum of 699 functions from group 'mpi' will be traced.
INFO: A maximum of 43 functions from group 'oacc' will be traced.
INFO: A maximum of 33 functions from group 'omp' will be traced.
CRAY / SANTIS / openacc executable ready
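The contents of `mpiacc_c.c` are not reproduced in this issue. As a rough idea of the kind of kernel being instrumented, a minimal OpenACC SAXPY in C might look like the sketch below. The function name `saxpy`, its signature, and the choice of data clauses are assumptions made for illustration, not the code in the repository, and the MPI part is left out entirely.

```c
/*
 * Hypothetical sketch, NOT the contents of mpiacc_c.c.
 *
 * SAXPY: y[i] = a * x[i] + y[i] for i in [0, n).
 *
 * When built with an OpenACC-enabled compiler (for example Cray cc with
 * -hacc, as in the make output above), the pragma offloads the loop:
 * x is copied to the device, y is copied in and copied back out.
 * Without OpenACC support the pragma is ignored and the loop runs on
 * the host, giving the same result.
 */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Data movement like the `copyin`/`copy` clauses here is what CrayPat-lite reports as `ACC_COPY` events, and the offloaded loop shows up as an `ACC_ASYNC_KERNEL` event, each tagged with the enclosing function name and source line.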
Run
- sbatch runme.slurm
Submitted batch job 2383
Reports
- cat o_
#################################################################
# #
# CrayPat-lite Performance Statistics #
# #
#################################################################
CrayPat/X: Version 6.2.2 Revision 13378 (xf 13240) 11/20/14 14:32:58
Experiment: lite lite/gpu
Number of PEs (MPI ranks): 2
Numbers of PEs per Node: 1 PE on each of 2 Nodes
Numbers of Threads per PE: 1
Number of Cores per Socket: 8
Execution start time: Wed Jan 28 15:53:33 2015
System name and speed: santis01 2601 MHz
Avg Process Time: 0.114 secs
High Memory: 99.582 MBytes 49.791 MBytes per PE
I/O Write Rate: 1.701 MBytes/sec
Table 1: Profile by Function Group and Function
  Time% |     Time |     Imb. |  Imb. | Calls |Group
        |          |     Time | Time% |       | Function
        |          |          |       |       |  PE=HIDE
 100.0% | 0.015212 |       -- |    -- | 456.5 |Total
|------------------------------------------------------------------------
|  93.7% | 0.014248 |       -- |    -- |   3.0 |MPI_SYNC
||-----------------------------------------------------------------------
||  93.1% | 0.014169 | 0.014149 | 99.9% |   1.0 |MPI_Init(sync)
||=======================================================================
|   5.4% | 0.000814 |       -- |    -- |   8.0 |USER
||-----------------------------------------------------------------------
||   3.1% | 0.000472 | 0.000024 |  9.5% |   2.0 |run.ACC_COPY@li.32
||   1.7% | 0.000257 | 0.000003 |  2.1% |   1.0 |run.ACC_ASYNC_KERNEL@li.32
|========================================================================
Table 2: Accelerator Table by Function (top 10 functions shown)
   Host |  Host |   Acc | Acc Copy | Acc Copy | Events |Function=[max10]
  Time% |  Time |  Time |       In |      Out |        | PE=HIDE
        |       |       | (MBytes) | (MBytes) |        |  Thread=HIDE
 100.0% | 0.001 | 0.000 |    0.002 |    0.002 |      5 |Total
|-----------------------------------------------------------------------------
|  61.9% | 0.000 | 0.000 |    0.002 |    0.002 |      2 |run.ACC_COPY@li.32
|  33.7% | 0.000 | 0.000 |       -- |       -- |      1 |run.ACC_ASYNC_KERNEL@li.32
|   3.9% | 0.000 |    -- |       -- |       -- |      1 |run.ACC_SYNC_WAIT@li.32
|=============================================================================
Program invocation: CRAY.SANTIS 256
For a complete report with expanded tables and notes, run:
pat_report /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2
For help identifying callers of particular functions:
pat_report -O callers+src /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2
To see the entire call tree:
pat_report -O calltree+src /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2
For interactive, graphical performance analysis, run:
app2 /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2
================ End of CrayPat-lite output ==========================
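Note on the Imb. columns in Table 1: the report does not define them. Assuming CrayPat's usual definition, where the Time column is the average over the N PEs and imbalance is measured against the slowest PE, the relation would be:

```latex
\text{Imb. Time} = t_{\max} - t_{\text{avg}},
\qquad
\text{Imb. Time\%} = 100 \times \frac{t_{\max} - t_{\text{avg}}}{t_{\max}} \times \frac{N}{N-1}
```

Check against the MPI_Init(sync) row with N = 2: t_avg = 0.014169 and Imb. Time = 0.014149, so t_max = 0.028318 and Imb. Time% = 100 × 0.014149 / 0.028318 × 2 ≈ 99.9%, which agrees with the table. Under this reading, one of the two ranks spent almost all of the MPI_Init synchronisation time waiting for the other.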