OpenACC / saxpy
Issue #2
new
Get the source
ssh -Y daint01
git clone --single-branch -b openacc.saxpy https://github.com/eth-cscs/parallel-debuggers.git
cd parallel-debuggers/openacc.saxpy.git/
Compile
PGI
module swap PrgEnv-cray PrgEnv-pgi
module swap craype/2.05 craype/2.2.0
module swap pgi /apps/daint/scorep/mf/pgi/1470
module swap cray-mpich/6.2.2 cray-mpich/7.0.3
module load craype-accel-nvidia35
module rm libsci_acc
See the PGI/14.7 comment below for the full Score-P setup, build and run steps.
CCE
* module load PrgEnv-cray # cce/8.2.3
* module load perftools-lite # 6.2.0.12614
* module load craype-accel-nvidia35 # cudatoolkit/5.5.20-1.0402.7700.8.1
* make clean
* make PERFFLAGS=-O3
Run (sample_profile)
- salloc -N1
- aprun -n1 ./CRAY.TODI 12
CrayPat/X: Version 6.2.0.12614 Revision 12614 04/14/14 17:11:54
pat[WARNING][0]:
Collection of accelerator performance data
for sampling experiments is not supported.
To collect accelerator performance data perform a trace experiment.
See the intro_craypat(1) man page on how to perform a trace experiment.
Run (gpu)
- export CRAYPAT_LITE=gpu
- make clean
- make PERFFLAGS=-O3
- aprun -n1 ./CRAY.TODI 12
CrayPat/X: Version 6.2.0.12614 Revision 12614 04/14/14 17:11:54
using MPI with 1 PEs, N=12
_OPENACC version:201306
c[0]=0
c[1]=101
c[N/2]=606
c[N-1]=1111
#################################################################
# #
# CrayPat-lite Performance Statistics #
# #
#################################################################
CrayPat/X: Version 6.2.0.12614 Revision 12614 (xf 12504) 04/14/14 17:11:54
Experiment: lite lite/gpu
Number of PEs (MPI ranks): 1
Numbers of PEs per Node: 1
Numbers of Threads per PE: 1
Number of Cores per Socket: 16
Execution start time: Mon May 19 16:31:35 2014
System name and speed: todi4 2100 MHz
Wall Clock Time: 0.077983 secs
High Memory: 39.04 MBytes
Table 1: Accelerator Table by Function (top 10 functions shown)
   Host |  Host |   Acc | Acc Copy | Acc Copy | Events |Function=[max10]
  Time% |  Time |  Time |       In |      Out |        | PE=HIDE
        |       |       | (MBytes) | (MBytes) |        | Thread=HIDE
 100.0% | 0.000 | 0.000 |    0.000 |    0.000 |      5 |Total
|------------------------------------------------------------------------------------------------------------------
|  46.7% | 0.000 | 0.000 |    0.000 |       -- |      1 |saxpy(int, double, double*, double*).ACC_COPY@li.69
|  25.8% | 0.000 | 0.000 |       -- |       -- |      1 |saxpy(int, double, double*, double*).ACC_ASYNC_KERNEL@li.69
|  17.6% | 0.000 | 0.000 |       -- |    0.000 |      1 |saxpy(int, double, double*, double*).ACC_COPY@li.70
|   8.5% | 0.000 |    -- |       -- |       -- |      1 |saxpy(int, double, double*, double*).ACC_SYNC_WAIT@li.70
|   1.4% | 0.000 |    -- |       -- |       -- |      1 |saxpy(int, double, double*, double*).ACC_REGION@li.69
|==================================================================================================================
Program invocation: ./CRAY.TODI 12
For a complete report with expanded tables and notes, run:
pat_report /users/piccinal/pug.git/src/openacc.saxpy.git/CRAY.TODI+11588-3t.ap2
For help identifying callers of particular functions:
pat_report -O callers+src /users/piccinal/pug.git/src/openacc.saxpy.git/CRAY.TODI+11588-3t.ap2
To see the entire call tree:
pat_report -O calltree+src /users/piccinal/pug.git/src/openacc.saxpy.git/CRAY.TODI+11588-3t.ap2
For interactive, graphical performance analysis, run:
app2 /users/piccinal/pug.git/src/openacc.saxpy.git/CRAY.TODI+11588-3t.ap2
================ End of CrayPat-lite output ==========================
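The Host Time% column of Table 1 can be tallied per accelerator event type to see where host time goes; for instance, the two ACC_COPY rows together account for 46.7% + 17.6% = 64.3%. Below is a minimal shell/awk sketch that does this tally, assuming the table rows look like the ones above (pipe-separated, Time% in the first column, the event name of the form `ACC_*` inside the function name in the last column). The function name `acc_time_by_event` and the file name `craypat_lite.txt` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of CrayPat): sum the "Host Time%" column of a
# CrayPat-lite "Accelerator Table by Function" per ACC_* event type.
# Reads the report text from the files given as arguments, or from stdin.
acc_time_by_event() {
    awk -F'|' '
        # Data rows carry an event name such as ".ACC_COPY@li.69";
        # the header rows and the "Total" row do not match this pattern.
        /\.ACC_[A-Z_]+@/ {
            pct = $2                       # e.g. "  46.7% " (first cell after the leading "|")
            gsub(/[% ]/, "", pct)          # -> "46.7"
            if (match($0, /ACC_[A-Z_]+/)) {
                event = substr($0, RSTART, RLENGTH)
                total[event] += pct
            }
        }
        END {
            for (e in total) printf "%s %.1f\n", e, total[e]
        }
    ' "$@" | LC_ALL=C sort
}
```

After saving the report to `craypat_lite.txt`, running `acc_time_by_event craypat_lite.txt` on Table 1 above should list ACC_ASYNC_KERNEL 25.8, ACC_COPY 64.3, ACC_REGION 1.4 and ACC_SYNC_WAIT 8.5. Keep in mind that with N=12 every Host Time entry rounds to 0.000 s (wall clock 0.077983 s), so these percentages only describe very small overheads.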
Comments (11)
reporter PGI/14.7
Setup
module swap PrgEnv-cray PrgEnv-pgi
module swap craype/2.05 craype/2.2.0
module swap pgi /apps/daint/scorep/mf/pgi/1470
module swap cray-mpich/6.2.2 cray-mpich/7.0.3
module load cudatoolkit/5.5.20-1.0501.7945.8.2
module load scorep/1.3
module list
Compile
Fortran (ok)
make OBJ=mpiacc_f.o CC="scorep --mpp=mpi --cuda ftn"
C (ok)
make OBJ=mpiacc_c.o CC="scorep --mpp=mpi --cuda cc"
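C++ (issue: the run prints implausible values such as c[0]=5.02621e+180, and Score-P warns that MPI was not initialized, so it generates no output)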
make OBJ=mpiacc_cxx.o CC="scorep --mpp=mpi --cuda CC"
- scorep --mpp=mpi --cuda CC -g -acc -ta=nvidia:cc35 -mcmodel=medium -c mpiacc_c.cpp -o PGI_mpiacc_c.o
- scorep --mpp=mpi --cuda CC -g -acc -ta=nvidia:cc35 -mcmodel=medium PGI_mpiacc_c.o -o PGI.DAINT
using MPI with 1 PEs, N=12
_OPENACC version:201111
c[0]=5.02621e+180
c[1]=1.78826e+161
c[N/2]=1.07296e+162
c[N-1]=1.96709e+162
[Score-P] src/measurement/SCOREP_RuntimeManagement.c:566: Warning: If you are using MPICH1, please ignore this warning. If not, it seems that your interprocess communication library (e.g., MPI) hasn't been initialized. Score-P can't generate output.
Application 2780523 resources: utime ~0s, stime ~0s, Rss ~157008, inblocks ~2920, outblocks ~7235
Run
export SCOREP_ENABLE_PROFILING=false
export SCOREP_ENABLE_TRACING=true
export SCOREP_CUDA_ENABLE=yes,flushatexit
export SCOREP_TOTAL_MEMORY=1G
aprun -n1 -N1 -d1 PGI.DAINT 12
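The C++ build prints values like c[0]=5.02621e+180, whereas the CCE run earlier printed c[0]=0, c[1]=101, c[N/2]=606, c[N-1]=1111 for the same N=12. A simple way to spot such a regression is to compare the c[...] lines of the C++ run with those of a build marked ok, such as the C build. The sketch below assumes each run's stdout was saved to a log file; the function names `saxpy_results` and `same_results` and the log names are hypothetical, and `grep -o` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helpers for comparing saxpy result lines between two runs.

# Print every "c[<index>]=<value>" token found in a run log, one per line.
# Works whether the values are printed on separate lines or on one line.
saxpy_results() {
    grep -o 'c\[[^]]*]=[^[:space:]]*' "$1"
}

# Return 0 only if both logs contain the same, non-empty list of results.
same_results() {
    ref=$(saxpy_results "$1") || return 1
    new=$(saxpy_results "$2") || return 1
    [ -n "$ref" ] && [ "$ref" = "$new" ]
}
```

For example, run the binary built with OBJ=mpiacc_c.o and save its output with `aprun -n1 -N1 -d1 ./PGI.DAINT 12 > c.log`, rebuild with OBJ=mpiacc_cxx.o and save to `cxx.log`, then `same_results c.log cxx.log || echo "C++ results differ"`. Both builds produce the same executable name, so save the first log before rebuilding.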