summer school - openacc/cce (perftools)

Issue #36 new
jg piccinali repo owner created an issue

DAINT

Get the src

CCE

Compile

  • module load craype-accel-nvidia35
  • module load perftools
Currently Loaded Modulefiles:
  1) modules/3.2.10.3
  2) nodestat/2.2-1.0502.53712.3.109.ari
  3) sdb/1.0-1.0502.55976.5.27.ari
  4) alps/5.2.1-2.0502.9041.11.6.ari
  5) lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
  6) udreg/2.3.2-1.0502.9275.1.12.ari
  7) ugni/5.0-1.0502.9685.4.24.ari
  8) gni-headers/3.0-1.0502.9684.5.2.ari
  9) dmapp/7.0.1-1.0502.9501.5.219.ari
 10) xpmem/0.1-2.0502.55507.3.2.ari
 11) hss-llm/7.2.0
 12) Base-opts/1.0.2-1.0502.53325.1.2.ari
 13) craype-network-aries
 14) craype/2.4.0
 15) cce/8.3.12
 16) totalview-support/1.1.4
 17) totalview/8.11.0
 18) cray-libsci/13.0.4
 19) pmi/5.0.7-1.0000.10678.155.25.ari
 20) rca/1.0.0-2.0502.53711.3.127.ari
 21) atp/1.8.2
 22) PrgEnv-cray/5.2.40
 23) craype-sandybridge
 24) slurm
 25) cray-mpich/7.2.2
 26) ddt/5.0
 27) cray-libsci_acc/3.1.1
 28) cudatoolkit/6.5.14-1.0502.9613.6.1
 29) craype-accel-nvidia35
 30) perftools/6.2.4
  • make clean
  • make
ftn -rmd -hacc -O3 -e Z   -c stats.f90 -o stats.o
ftn -rmd -hacc -O3 -e Z   -c data.f90 -o data.o
ftn -rmd -hacc -O3 -e Z   -c operators.f90 -o operators.o
ftn -rmd -hacc -O3 -e Z   -c linalg.f90 -o linalg.o
ftn -rmd -hacc -O3 -e Z   -c io.f90 -o io.o
ftn -rmd -hacc -O3 -e Z  \
stats.o   data.o   operators.o     linalg.o     io.o \
main.f90  -o main

ftn -rmd -hacc -O3 -e Z   -c operators_mpi.f90 -DUSE_G2G \
-o operators_mpi.o 
ftn -rmd -hacc -O3 -e Z  \
stats.o   data.o   operators_mpi.o linalg.o     io.o \
main.f90  -o main_mpi
  • ls $CRAYPAT_ROOT/share/traces/
  • pat_build -g oacc main
INFO: A maximum of 43 functions from group 'oacc' will be traced.
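
The compile and instrument steps above can be collected into one script. This is only a sketch: the module names and the `pat_build -g oacc main` call are copied from the log, while the `run` helper, the `build_and_instrument` function and the `DRY_RUN` variable are hypothetical names introduced here so the sequence can be previewed on a machine without the Cray tools.

```shell
#!/bin/bash
# Sketch: build mini-stencil with CCE and instrument it for OpenACC tracing.
# DRY_RUN, run() and build_and_instrument() are illustrative helpers,
# not part of perftools.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_and_instrument() {
    run module load craype-accel-nvidia35
    run module load perftools
    run make clean
    run make
    # Produces main+pat next to main (the name used in the Profile step).
    run pat_build -g oacc main
}

build_and_instrument
```

With the default `DRY_RUN=1` the script only echoes the five commands; setting `DRY_RUN=0` on a login node would execute them in order.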

Profile

  • sbatch.sh santis 1 main+pat 1 1 1 "512 512 50 0.0025"
aprun -n 1  main+pat 512 512 50 0.0025

CrayPat/X:  Version 6.2.4 
==============================
                       Welcome to mini-stencil!
 mesh ::  512 * 512     dx = 1.95694714784622192E-3
 time ::  50 time steps from 0 ..  2.50000000000000005E-3
=============================
-------------------------------------------------
 simulation took  2.543  seconds
 4676  conjugate gradient iterations 1838.49  per second
 246  nonlinear newton iterations
-----------------------------------------------
Experiment data file written:  main+pat+23524-12t.xf
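
To compare runs, the headline numbers can be pulled out of the job output. A minimal sketch, assuming the output keeps the exact wording shown above ("simulation took", "conjugate gradient iterations", "nonlinear newton iterations"); the function name `summarize_run` and the output field names are made up for this example.

```shell
# Sketch: extract time and iteration counts from a mini-stencil job log.
summarize_run() {
    # $1: path to the job output file
    awk '
        /simulation took/               { secs = $3 }
        /conjugate gradient iterations/ { cg = $1; rate = $5 }
        /nonlinear newton iterations/   { newton = $1 }
        END {
            printf "time_s=%s cg_iters=%s cg_per_s=%s newton_iters=%s\n",
                   secs, cg, rate, newton
        }
    ' "$1"
}
```

Called on a file holding the output above, `summarize_run job.out` prints `time_s=2.543 cg_iters=4676 cg_per_s=1838.49 newton_iters=246` (`job.out` is a placeholder file name).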

Analyze

  • pat_report *xf >xf
Processing step 5 of 5
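
Note that the glob `*xf` also matches a file named `xf`, so repeating the command above in the same directory would pass the previous report to `pat_report` as an input. A sketch of one way around this, assuming experiment files always end in `.xf`; the helper names `latest_xf` and `report_name` and the `.report.txt` suffix are hypothetical.

```shell
# Sketch: pick the most recent experiment file and name the report after it.
latest_xf() {
    # Print the newest *.xf file in directory $1 (default: current directory).
    ls -t -- "${1:-.}"/*.xf 2>/dev/null | head -n 1
}

report_name() {
    # Example: main+pat+23524-12t.xf -> main+pat+23524-12t.report.txt
    printf '%s.report.txt\n' "${1%.xf}"
}

xf_file=$(latest_xf)
# Only call pat_report when an experiment file exists and the tool is loaded.
if [ -n "$xf_file" ] && command -v pat_report >/dev/null 2>&1; then
    pat_report "$xf_file" > "$(report_name "$xf_file")"
fi
```

The report name does not end in `xf`, so it is never picked up as an experiment file on the next run.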

[Images: ef0.png, ef1.png, ef2.png]

Comments (5)

  1. jg piccinali reporter

    perftools-lite

    • module load perftools-lite/6.2.4
    • export CRAYPAT_LITE=gpu
    • make clean; make main
    INFO: creating the CrayPat-instrumented executable 'main' (gpu) ...OK
    INFO: A maximum of 335 functions from group 'cuda' will be traced.
    
    • sbatch.sh santis 5 main+ptl624 1 1 1 "512 512 500 0.0025"

    [Image: eff0.png]