PROPOSALS CrayPat-lite: OpenACC

Issue #21 new
jg piccinali repo owner created an issue

MPI+OpenACC (Piz Daint)

Get the source

Cloning into 'proposals.git'...
remote: Counting objects: 339, done.
remote: Total 339 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (339/339), 300.16 KiB | 234 KiB/s, done.
Resolving deltas: 100% (139/139), done.
  • cd proposals.git/vihps/SAXPY_OPENACC/

Setup

  • module load PrgEnv-cray
  • module load craype-accel-nvidia35
  • module use /project/csstaff/proposals
  • module load perflite/622openacc
  • echo CRAYPAT_LITE=$CRAYPAT_LITE
CRAYPAT_LITE=gpu
  • module list
Currently Loaded Modulefiles:
  1) modules/3.2.10.2
  2) nodestat/2.2-1.0502.53712.3.109.ari
  3) sdb/1.0-1.0502.55976.5.27.ari
  4) alps/5.2.1-2.0502.9041.11.6.ari
  5) lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
  6) udreg/2.3.2-1.0502.9275.1.12.ari
  7) ugni/5.0-1.0502.9685.4.24.ari
  8) gni-headers/3.0-1.0502.9684.5.2.ari
  9) dmapp/7.0.1-1.0502.9501.5.219.ari
 10) xpmem/0.1-2.0502.55507.3.2.ari
 11) hss-llm/7.2.0
 12) Base-opts/1.0.2-1.0502.53325.1.2.ari
 13) craype-network-aries
 14) craype/2.2.1
 15) cce/8.3.7
 16) totalview-support/1.1.4
 17) totalview/8.11.0
 18) cray-libsci/13.0.1
 19) pmi/5.0.6-1.0000.10439.140.2.ari
 20) rca/1.0.0-2.0502.53711.3.127.ari
 21) atp/1.7.5
 22) PrgEnv-cray/5.2.40
 23) craype-sandybridge
 24) slurm
 25) cray-mpich/7.1.1
 26) ddt/4.3rc7
 27) linux/jg
 28) cray-libsci_acc/3.0.2
 29) cudatoolkit/5.5.22-1.0502.7944.3.1
 30) craype-accel-nvidia35
 31) perflite/622openacc

Compile

  • make clean
  • make FLAGS=-hacc
cc -hacc  -c mpiacc_c.c -o CRAY_mpiacc_c.o
cc -hacc CRAY_mpiacc_c.o  -o CRAY.SANTIS

INFO: creating the CrayPat-instrumented executable 'CRAY.SANTIS' (gpu) ...OK
INFO: A maximum of 17 functions from group 'aio' will be traced.
INFO: A maximum of 28 functions from group 'ffio' will be traced.
INFO: A maximum of 107 functions from group 'io' will be traced.
INFO: A maximum of 699 functions from group 'mpi' will be traced.
INFO: A maximum of 43 functions from group 'oacc' will be traced.
INFO: A maximum of 33 functions from group 'omp' will be traced.
CRAY / SANTIS / openacc executable ready
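
The source of mpiacc_c.c is not reproduced in this issue. The entries in the report (run.ACC_COPY@li.32, run.ACC_ASYNC_KERNEL@li.32) suggest a function run with a single OpenACC region that copies data and launches a kernel. The block below is only a rough sketch of what such a SAXPY kernel could look like in C. The name saxpy_acc and its signature are hypothetical, the MPI part is left out, and the real code may differ (for example by using an async clause).

```c
#include <assert.h>

/* Hypothetical SAXPY kernel: y = a*x + y for n elements.
 * With an OpenACC compiler the loop is offloaded to the accelerator;
 * x is copied in, y is copied in and back out.
 * Without OpenACC support the pragma is ignored and the loop runs on the host. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != 0 && y != 0);

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Built with cc -hacc as above, a call such as saxpy_acc(256, 2.0f, x, y) would produce one copy-in/copy-out pair and one kernel launch, which is the kind of activity CrayPat-lite attributes to a single source line.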

Run

  • sbatch runme.slurm
Submitted batch job 2383

Reports

  • cat o_
#################################################################
#                                                               #
#            CrayPat-lite Performance Statistics                #
#                                                               #
#################################################################

CrayPat/X:  Version 6.2.2 Revision 13378 (xf 13240)  11/20/14 14:32:58
Experiment:                  lite  lite/gpu     
Number of PEs (MPI ranks):      2
Numbers of PEs per Node:        1  PE on each of  2  Nodes
Numbers of Threads per PE:      1
Number of Cores per Socket:     8
Execution start time:  Wed Jan 28 15:53:33 2015
System name and speed:  santis01 2601 MHz

Avg Process Time:  0.114 secs              
High Memory:      99.582 MBytes     49.791 MBytes per PE
I/O Write Rate:    1.701 MBytes/sec        

Table 1:  Profile by Function Group and Function

  Time% |     Time |     Imb. |  Imb. | Calls |Group
        |          |     Time | Time% |       | Function
        |          |          |       |       |  PE=HIDE

 100.0% | 0.015212 |       -- |    -- | 456.5 |Total
|------------------------------------------------------------------------
|  93.7% | 0.014248 |       -- |    -- |   3.0 |MPI_SYNC
||-----------------------------------------------------------------------
||  93.1% | 0.014169 | 0.014149 | 99.9% |   1.0 |MPI_Init(sync)
||=======================================================================
|   5.4% | 0.000814 |       -- |    -- |   8.0 |USER
||-----------------------------------------------------------------------
||   3.1% | 0.000472 | 0.000024 |  9.5% |   2.0 |run.ACC_COPY@li.32
||   1.7% | 0.000257 | 0.000003 |  2.1% |   1.0 |run.ACC_ASYNC_KERNEL@li.32
|========================================================================

Table 2:  Accelerator Table by Function (top 10 functions shown)

   Host |  Host |   Acc | Acc Copy | Acc Copy | Events |Function=[max10]
  Time% |  Time |  Time |       In |      Out |        | PE=HIDE
        |       |       | (MBytes) | (MBytes) |        |  Thread=HIDE

 100.0% | 0.001 | 0.000 |    0.002 |    0.002 |      5 |Total
|-----------------------------------------------------------------------------
|  61.9% | 0.000 | 0.000 |    0.002 |    0.002 |      2 |run.ACC_COPY@li.32
|  33.7% | 0.000 | 0.000 |       -- |       -- |      1 |run.ACC_ASYNC_KERNEL@li.32
|   3.9% | 0.000 |    -- |       -- |       -- |      1 |run.ACC_SYNC_WAIT@li.32
|=============================================================================

Program invocation:  CRAY.SANTIS 256

For a complete report with expanded tables and notes, run:
  pat_report /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2

For help identifying callers of particular functions:
  pat_report -O callers+src /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2
To see the entire call tree:
  pat_report -O calltree+src /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2

For interactive, graphical performance analysis, run:
  app2 /scratch/santis/piccinal/proposals.git/vihps/SAXPY_OPENACC/CRAY.SANTIS+24123-14t.ap2

================  End of CrayPat-lite output  ==========================
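
The percentage columns of Table 1 can be cross-checked against the time columns. The sketch below assumes that Time% is the share of the Total time, and that Imb. Time% follows the pat_report definition 100 * (max - avg) / max * N / (N - 1), with Imb. Time = max - avg and N the number of PEs. The second formula is quoted from memory and should be treated as an assumption; the helper names are made up for this example.

```c
#include <assert.h>

/* Time% column: share of the total time, in percent. */
double time_percent(double time, double total)
{
    assert(total > 0.0);
    return 100.0 * time / total;
}

/* Imb. Time% column, ASSUMING the definition
 *   100 * (max - avg) / max * N / (N - 1)
 * where avg is the Time column and imb_time = max - avg. */
double imb_time_percent(double avg, double imb_time, int npes)
{
    assert(npes > 1);
    double max = avg + imb_time;   /* slowest PE */
    return 100.0 * imb_time / max * npes / (npes - 1);
}
```

With the values printed above, time_percent(0.014248, 0.015212) gives about 93.7 for MPI_SYNC, and imb_time_percent(0.014169, 0.014149, 2) gives about 99.9 for MPI_Init(sync). For the two ACC rows the printed times carry too few significant digits to reproduce the percentages exactly.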
