perfsuite (BlueWaters)
Issue #29
new
PizDora
Profile
aprun -n 8 -N 8 -d 1 -j 1 psrun -f -p bt-mz_C.8
- For MPI programs, use the "-f" option (meaning "fork") of "psrun";
- For OpenMP programs, use the "-p" option (meaning "pthread");
- For hybrid programs (MPI+OpenMP), use both options: "-f -p".
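The option rule above can be written down as a small helper. This is only a sketch: the function names `psrun_flags` and `psrun_command` are hypothetical, and the only flags used are the `aprun` and `psrun` options already shown in this issue.

```python
def psrun_flags(uses_mpi, uses_openmp):
    """Return the psrun options for a programming model, per the rule above."""
    flags = []
    if uses_mpi:
        flags.append("-f")  # "fork": needed for MPI programs
    if uses_openmp:
        flags.append("-p")  # "pthread": needed for OpenMP programs
    return flags


def psrun_command(executable, ranks, uses_mpi, uses_openmp):
    """Build a single-node aprun + psrun command line like the one in this issue."""
    launcher = ["aprun", "-n", str(ranks), "-N", str(ranks), "-d", "1", "-j", "1"]
    parts = launcher + ["psrun"] + psrun_flags(uses_mpi, uses_openmp) + [executable]
    return " ".join(parts)


# Hybrid MPI+OpenMP run of the BT-MZ benchmark on 8 ranks.
print(psrun_command("bt-mz_C.8", 8, uses_mpi=True, uses_openmp=True))
```

For the hybrid case this reproduces the `aprun ... psrun -f -p bt-mz_C.8` line used for profiling.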
- F_INC=-g
bt-mz_C.8.0.4588.nid00034.xml
bt-mz_C.8.0.4589.nid00034.xml
bt-mz_C.8.0.4590.nid00034.xml
bt-mz_C.8.0.4591.nid00034.xml
bt-mz_C.8.0.4592.nid00034.xml
bt-mz_C.8.0.4593.nid00034.xml
bt-mz_C.8.0.4594.nid00034.xml
bt-mz_C.8.0.4595.nid00034.xml
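To group the per-process XML files by node or process, the file names can be split into fields. The pattern below is inferred from the names listed above and from the report header further down (Command, Thread, Process ID, Host); it is an assumption, not a documented PerfSuite naming rule, and `parse_psrun_name` is a hypothetical helper.

```python
import re

# Assumed layout, inferred from the file names in this issue:
#   <command>.<thread>.<pid>.<host>.xml
NAME_RE = re.compile(
    r"^(?P<command>.+)\.(?P<thread>\d+)\.(?P<pid>\d+)\.(?P<host>[^.]+)\.xml$"
)


def parse_psrun_name(filename):
    """Split a psrun output file name into command, thread, pid and host."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    return {
        "command": match.group("command"),
        "thread": int(match.group("thread")),
        "pid": int(match.group("pid")),
        "host": match.group("host"),
    }


print(parse_psrun_name("bt-mz_C.8.0.4588.nid00034.xml"))
```

Names without the thread field, such as `dgemm-naive.23265.nid00012.xml`, do not match this pattern and raise `ValueError`.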
Analyze
psprocess bt-mz_C.*.xml
Event Count Information
=======================================================
Index Description Counter Value
--------------------------------------------------------------------------------
1 Total cycles.............................................. 36,000,548,251
2 Instructions completed.................................... 71,217,539,311
Event Index
--------------------------------------------------------------------------------
1: PAPI_TOT_CYC 2: PAPI_TOT_INS
Statistics
======================================================
Counting domain................................................. user
Multiplexed..................................................... no
Graduated instructions per cycle................................ 1.978
MIPS (cycles)................................................... 5,145.389
MIPS (wall clock)............................................... 5,962.264
CPU time (seconds).............................................. 13.841
Wall clock time (seconds)....................................... 11.945
% CPU utilization............................................... 115.876
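The statistics in this short report can be recomputed from the two counters and the two times. The formulas below are inferred by matching the printed values, not taken from the PerfSuite documentation, and `derived_stats` is a hypothetical helper.

```python
def derived_stats(cycles, instructions, cpu_seconds, wall_seconds):
    """Recompute the report statistics from raw counters and times."""
    return {
        "ipc": instructions / cycles,
        "mips_cycles": instructions / cpu_seconds / 1e6,
        "mips_wall": instructions / wall_seconds / 1e6,
        "cpu_utilization_pct": 100.0 * cpu_seconds / wall_seconds,
    }


# Values copied from the report above.
stats = derived_stats(
    cycles=36_000_548_251,
    instructions=71_217_539_311,
    cpu_seconds=13.841,
    wall_seconds=11.945,
)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The results agree with the report up to the rounding of the printed CPU and wall clock times.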
PizDaint
Setup
module swap PrgEnv-cray PrgEnv-gnu
Compile
cd /apps/daint/5.2.UP02/perfsuitebw/1.1.4/
cd CSCS/proposals.git/vihps/NPB3.3-MZ-MPI/
make bt-mz CLASS=C NPROCS=8 MAIN=bt \
FLINKFLAGS="-dynamic -O3 -fopenmp" \
F_INC=-g
Run
cd bin
cp ../BT-MZ/inputbt-mz.data.sample inputbt-mz.data
aprun -n8 -N8 -d1 -j1 bt-mz_C.8
- BT-MZ Benchmark Completed.
Profile
source /apps/daint/5.2.UP02/perfsuitebw/1.1.4/gnu_482/bin/psenv.sh
aprun -n 8 -N 8 -d 1 -j 1 psrun -f -p bt-mz_C.8
psprocess bt-mz_C.8.0.19793.nid00012.xml
PerfSuite Hardware Performance Summary Report
Version : 1.0
Created : Mon Jun 01 14:05:37 CEST 2015
Generator : psprocess Java version 0.1
XML Source : bt-mz_C.8.0.19793.nid00012.xml
Execution Information
================================================================================
Collector : libpshwpc
Date : Mon Jun 01 14:05:31 CEST 2015
Host : nid00012
Process ID : 19793
Thread : 0
User : piccinal
Command : bt-mz_C.8
Processor and System Information
================================================================================
Node CPUs : 16
Vendor : Intel
Brand : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
CPUID Info : family: 6, model: 45, stepping: 7
CPU Revision : 7
Clock (MHz) : 2601.000
Memory (MB) : 32220.29
Pagesize (KB) : 4
Cache Information
================================================================================
Cache levels : 3
--------------------------------
Level 1
Type : instruction
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
Type : data
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 2
Type : unified
Size (KB) : 256
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 3
Type : unified
Size (KB) : 20480
Linesize (B) : 64
Associativity : 20
Event Count Information
================================================================================
Index Description Counter Value
--------------------------------------------------------------------------------
1 Conditional branch instructions........................... 225,129,721
2 Branch instructions....................................... 343,974,189
3 Conditional branch instructions mispredicted.............. 1,876,275
4 Conditional branch instructions not taken................. 84,619,228
5 Floating point divide instructions........................ 92,144,835
6 Floating point operations................................. 16,073,988,775
7 Level 1 data cache misses................................. 373,760,238
8 Level 1 instruction cache misses.......................... 806,819
9 Level 2 data cache accesses............................... 373,760,238
10 Level 2 instruction cache accesses........................ 973,617
11 Level 2 instruction cache misses.......................... 392,637
12 Level 2 cache misses...................................... 35,577,559
13 Level 3 data cache reads.................................. 27,565,820
14 Level 3 instruction cache accesses........................ 392,637
15 Level 3 total cache accesses.............................. 35,577,559
16 Level 3 cache misses...................................... 9,936,629
17 Level 3 total cache writes................................ 4,528,089
18 Load instructions......................................... 12,163,521,733
19 Store instructions........................................ 6,581,176,220
20 Cycles with no instruction issue.......................... 2,201,866,739
21 Instruction translation lookaside buffer misses........... 22,990
22 Total cycles.............................................. 16,531,314,158
23 Instructions completed.................................... 33,872,958,388
Event Index
--------------------------------------------------------------------------------
1: PAPI_BR_CN 2: PAPI_BR_INS 3: PAPI_BR_MSP 4: PAPI_BR_NTK
5: PAPI_FDV_INS 6: PAPI_FP_OPS 7: PAPI_L1_DCM 8: PAPI_L1_ICM
9: PAPI_L2_DCA 10: PAPI_L2_ICA 11: PAPI_L2_ICM 12: PAPI_L2_TCM
13: PAPI_L3_DCR 14: PAPI_L3_ICA 15: PAPI_L3_TCA 16: PAPI_L3_TCM
17: PAPI_L3_TCW 18: PAPI_LD_INS 19: PAPI_SR_INS 20: PAPI_STL_ICY
21: PAPI_TLB_IM 22: PAPI_TOT_CYC 23: PAPI_TOT_INS
Statistics
================================================================================
Counting domain................................................. user
Multiplexed..................................................... yes
Floating point operations per cycle............................. 0.972
Floating point operations per graduated instruction............. 0.475
Graduated instructions per cycle................................ 2.049
Graduated instructions per level 1 instruction cache miss....... 41,983.342
Percentage of cycles with no instruction issued................. 13.319
Graduated loads and stores per floating point operation......... 1.166
Level 2 cache miss ratio (data), data cache miss counts derived. 0.094
Level 2 cache miss ratio (instruction).......................... 0.403
Level 3 cache miss ratio........................................ 0.279
Bandwidth used to level 2 cache (MB/s).......................... 358.252
Bandwidth used to level 3 cache (MB/s).......................... 100.058
MFLOPS (cycles)................................................. 2,529.045
MFLOPS (wall clock)............................................. 2,886.490
MIPS (cycles)................................................... 5,329.496
MIPS (wall clock)............................................... 6,082.745
CPU time (seconds).............................................. 6.356
Wall clock time (seconds)....................................... 5.569
% CPU utilization............................................... 114.134
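As a cross-check, the ratio statistics of this report can be reproduced from the raw counters. The formulas are again inferred by matching the printed values (in particular, the L2 data miss ratio appears to use total L2 misses minus L2 instruction misses), so treat them as assumptions; `derived_ratios` is a hypothetical helper.

```python
# Counter values copied from the BT-MZ report above.
counters = {
    "PAPI_FP_OPS": 16_073_988_775,
    "PAPI_L2_DCA": 373_760_238,
    "PAPI_L2_ICA": 973_617,
    "PAPI_L2_ICM": 392_637,
    "PAPI_L2_TCM": 35_577_559,
    "PAPI_L3_TCA": 35_577_559,
    "PAPI_L3_TCM": 9_936_629,
    "PAPI_LD_INS": 12_163_521_733,
    "PAPI_SR_INS": 6_581_176_220,
    "PAPI_STL_ICY": 2_201_866_739,
    "PAPI_TOT_CYC": 16_531_314_158,
}


def derived_ratios(c):
    """Recompute the ratio statistics from the raw counters."""
    return {
        "flops_per_cycle": c["PAPI_FP_OPS"] / c["PAPI_TOT_CYC"],
        "pct_cycles_no_issue": 100.0 * c["PAPI_STL_ICY"] / c["PAPI_TOT_CYC"],
        "ldst_per_flop": (c["PAPI_LD_INS"] + c["PAPI_SR_INS"]) / c["PAPI_FP_OPS"],
        # Data misses are derived: total L2 misses minus instruction misses.
        "l2_data_miss_ratio": (c["PAPI_L2_TCM"] - c["PAPI_L2_ICM"]) / c["PAPI_L2_DCA"],
        "l2_instr_miss_ratio": c["PAPI_L2_ICM"] / c["PAPI_L2_ICA"],
        "l3_miss_ratio": c["PAPI_L3_TCM"] / c["PAPI_L3_TCA"],
    }


for name, value in derived_ratios(counters).items():
    print(f"{name}: {value:.3f}")
```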
Cupti issue
- Error 101 for CUDA Driver API function 'cuCtxCreate'. cuptiQuery failed
- => PAPI must be recompiled without CUDA support.
Currently Loaded Modulefiles:
1) modules/3.2.10.3
2) nodestat/2.2-1.0502.53712.3.109.ari
3) sdb/1.0-1.0502.55976.5.27.ari
4) alps/5.2.1-2.0502.9041.11.6.ari
5) lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
6) udreg/2.3.2-1.0502.9275.1.12.ari
7) ugni/5.0-1.0502.9685.4.24.ari
8) gni-headers/3.0-1.0502.9684.5.2.ari
9) dmapp/7.0.1-1.0502.9501.5.219.ari
10) xpmem/0.1-2.0502.55507.3.2.ari
11) hss-llm/7.2.0
12) Base-opts/1.0.2-1.0502.53325.1.2.ari
13) craype-network-aries
14) craype/2.3.0
15) craype-sandybridge
16) slurm
17) cray-mpich/7.2.0
18) ddt/5.0
19) gcc/4.8.2
20) totalview-support/1.1.4
21) totalview/8.11.0
22) cray-libsci/13.0.3
23) pmi/5.0.6-1.0000.10439.140.2.ari
24) atp/1.8.1
25) PrgEnv-gnu/5.2.40
Comments (10)
-
reporter /apps/daint/5.2.UP02/perfsuitebw/1.1.4/gnu_482/share/perfsuite/xml/pshwpc/papi_sandybridge.xml
<ps_hwpc_eventlist class="PAPI">
<!--
Configuration file for Intel Sandy Bridge systems.
$Id: papi_sandybridge.xml,v 1.1 2012/05/07 20:01:01 ruiliu Exp $
===================================================
-->
<ps_hwpc_event type="preset" name="PAPI_BR_CN" />
<ps_hwpc_event type="preset" name="PAPI_BR_INS" />
<ps_hwpc_event type="preset" name="PAPI_BR_MSP" />
<ps_hwpc_event type="preset" name="PAPI_BR_NTK" />
<ps_hwpc_event type="preset" name="PAPI_FDV_INS" />
<ps_hwpc_event type="preset" name="PAPI_L1_DCM" />
<ps_hwpc_event type="preset" name="PAPI_L1_ICM" />
<ps_hwpc_event type="preset" name="PAPI_L2_DCA" />
<ps_hwpc_event type="preset" name="PAPI_L2_ICA" />
<ps_hwpc_event type="preset" name="PAPI_L2_ICM" />
<ps_hwpc_event type="preset" name="PAPI_L2_TCM" />
<ps_hwpc_event type="preset" name="PAPI_L3_DCR" />
<ps_hwpc_event type="preset" name="PAPI_L3_ICA" />
<ps_hwpc_event type="preset" name="PAPI_L3_TCA" />
<ps_hwpc_event type="preset" name="PAPI_L3_TCM" />
<ps_hwpc_event type="preset" name="PAPI_L3_TCW" />
<ps_hwpc_event type="preset" name="PAPI_LD_INS" />
<ps_hwpc_event type="preset" name="PAPI_SR_INS" />
<ps_hwpc_event type="preset" name="PAPI_STL_ICY" />
<ps_hwpc_event type="preset" name="PAPI_TLB_IM" />
<ps_hwpc_event type="preset" name="PAPI_TOT_CYC" />
<ps_hwpc_event type="preset" name="PAPI_TOT_INS" />
</ps_hwpc_eventlist>
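To check which events a configuration file requests, the event list can be read with a standard XML parser. The sketch below uses a shortened sample with the same element and attribute names as the file above; `event_names` is a hypothetical helper, and whether PerfSuite accepts such a trimmed file has not been verified here.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same format as papi_sandybridge.xml (illustrative only).
SAMPLE = """<ps_hwpc_eventlist class="PAPI">
  <ps_hwpc_event type="preset" name="PAPI_TOT_CYC" />
  <ps_hwpc_event type="preset" name="PAPI_TOT_INS" />
</ps_hwpc_eventlist>"""


def event_names(xml_text):
    """Return the event names of a ps_hwpc_eventlist document, in file order."""
    root = ET.fromstring(xml_text)
    return [event.get("name") for event in root.iter("ps_hwpc_event")]


print(event_names(SAMPLE))
```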
-
reporter Instrumentation
Coding
ret = ps_hwpc_init();
if ( ret != PS_SUCCESS ) {
    fatal(ret, "Error in ps_hwpc_init");
}
ret = ps_hwpc_start();
if ( ret != PS_SUCCESS ) {
    fatal(ret, "Error in ps_hwpc_start (1)");
}
ret = ps_hwpc_suspend();
if ( ret != PS_SUCCESS ) {
    fatal(ret, "Error in ps_hwpc_suspend");
}

ret = ps_hwpc_stop(OUTPREFIX);
if ( ret != PS_SUCCESS ) {
    fatal(ret, "Error in ps_hwpc_stop");
}
ret = ps_hwpc_shutdown();
if ( ret != PS_SUCCESS ) {
    fatal(ret, "Error in ps_hwpc_shutdown");
}
Compile
- cd /apps/daint/5.2.UP02/perfsuitebw/1.1.4/gnu_482/share/perfsuite/examples/hl
- make clean
- make CC=cc CSCS="-dynamic -lexpat"
cc -c -g -O -I/apps/daint/5.2.UP02/perfsuitebw/1.1.4/gnu_482/include hl.c
cc -o hl hl.o -L/apps/daint/5.2.UP02/perfsuitebw/1.1.4/gnu_482/lib \
-L/apps/daint/5.2.UP02/sandbox/jgp/papi/5.4.1/gnu_482/lib \
-lpshwpc -lperfsuite -lpapi -dynamic -lexpat
Run
- aprun -n1 ./hl
Analyze
- psprocess hlout.11499.santis01.xml
PerfSuite Hardware Performance Summary Report
Version : 1.0
Created : Mon Jun 01 14:43:55 CEST 2015
Generator : psprocess Java version 0.1
XML Source : hlout.11499.santis01.xml
Execution Information
================================================================================
Collector : libpshwpc
Date : Mon Jun 01 14:40:24 CEST 2015
Host : santis01
Process ID : 11499
Thread : 0
User : piccinal
Command : hl
Processor and System Information
================================================================================
Node CPUs : 16
Vendor : Intel
Brand : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
CPUID Info : family: 6, model: 45, stepping: 7
CPU Revision : 7
Clock (MHz) : 2601.000
Memory (MB) : 32217.99
Pagesize (KB) : 4
Cache Information
================================================================================
Cache levels : 3
--------------------------------
Level 1
Type : instruction
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
Type : data
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 2
Type : unified
Size (KB) : 256
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 3
Type : unified
Size (KB) : 20480
Linesize (B) : 64
Associativity : 20
Event Count Information
================================================================================
Index Description Counter Value
--------------------------------------------------------------------------------
1 Conditional branch instructions........................... 58,878,672
2 Branch instructions....................................... 59,196,806
3 Conditional branch instructions mispredicted.............. 257
4 Conditional branch instructions not taken................. 1,471
5 Floating point divide instructions........................ 0
6 Floating point operations................................. 139
7 Level 1 data cache misses................................. 516
8 Level 1 instruction cache misses.......................... 677
9 Level 2 data cache accesses............................... 516
10 Level 2 instruction cache accesses........................ 72
11 Level 2 instruction cache misses.......................... 6
12 Level 2 cache misses...................................... -24
13 Level 3 data cache reads.................................. 167
14 Level 3 instruction cache accesses........................ 6
15 Level 3 total cache accesses.............................. -24
16 Level 3 cache misses...................................... 79
17 Level 3 total cache writes................................ 8
18 Load instructions......................................... 160,395,802
19 Store instructions........................................ 0
20 Cycles with no instruction issue.......................... 0
21 Instruction translation lookaside buffer misses........... 0
22 Total cycles.............................................. 118,530,081
23 Instructions completed.................................... 296,061,934
Event Index
--------------------------------------------------------------------------------
1: PAPI_BR_CN 2: PAPI_BR_INS 3: PAPI_BR_MSP 4: PAPI_BR_NTK
5: PAPI_FDV_INS 6: PAPI_FP_OPS 7: PAPI_L1_DCM 8: PAPI_L1_ICM
9: PAPI_L2_DCA 10: PAPI_L2_ICA 11: PAPI_L2_ICM 12: PAPI_L2_TCM
13: PAPI_L3_DCR 14: PAPI_L3_ICA 15: PAPI_L3_TCA 16: PAPI_L3_TCM
17: PAPI_L3_TCW 18: PAPI_LD_INS 19: PAPI_SR_INS 20: PAPI_STL_ICY
21: PAPI_TLB_IM 22: PAPI_TOT_CYC 23: PAPI_TOT_INS
Statistics
================================================================================
Counting domain................................................. user
Multiplexed..................................................... yes
Floating point operations per cycle............................. 0.000
Floating point operations per graduated instruction............. 0.000
Graduated instructions per cycle................................ 2.498
Graduated instructions per level 1 instruction cache miss....... 437,314.526
Percentage of cycles with no instruction issued................. 0.000
Graduated loads and stores per floating point operation......... 1,153,926.633
Level 2 cache miss ratio (data), data cache miss counts derived. -0.058
Level 2 cache miss ratio (instruction).......................... 0.083
Level 3 cache miss ratio........................................ -3.292
Bandwidth used to level 2 cache (MB/s).......................... -0.034
Bandwidth used to level 3 cache (MB/s).......................... 0.111
MFLOPS (cycles)................................................. 0.003
MFLOPS (wall clock)............................................. 0.003
MIPS (cycles)................................................... 6,496.723
MIPS (wall clock)............................................... 6,806.692
CPU time (seconds).............................................. 0.046
Wall clock time (seconds)....................................... 0.043
% CPU utilization............................................... 104.771
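The hl report contains negative counter values (-24 for both level 2 cache misses and level 3 total cache accesses), which a hardware event count cannot be, and the derived ratios inherit the problem. The run was multiplexed and lasted 0.043 s; the cause is not established in this issue. A minimal sanity check, with the hypothetical helper `suspicious_counters`, could look like this:

```python
def suspicious_counters(counts):
    """Return the names of counters with a negative (impossible) value."""
    return sorted(name for name, value in counts.items() if value < 0)


# A subset of the values from the hl report above.
hl_counts = {
    "PAPI_L2_DCA": 516,
    "PAPI_L2_ICM": 6,
    "PAPI_L2_TCM": -24,
    "PAPI_L3_TCA": -24,
    "PAPI_L3_TCM": 79,
}

bad = suspicious_counters(hl_counts)
print("negative counters:", bad)

# Ratios built on a negative counter are meaningless, e.g. the L3 miss ratio.
l3_miss_ratio = hl_counts["PAPI_L3_TCM"] / hl_counts["PAPI_L3_TCA"]
print(f"L3 miss ratio: {l3_miss_ratio:.3f}")
```

Any statistic that depends on a flagged counter should be ignored for this run.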
-
reporter MFLOPS not available on Intel Haswell:
cray-perftools: The document that specifies performance monitoring events for Intel processors does not include events that could be used to compute a count of floating point operations for Haswell processors: Intel 64 and IA-32 Architectures Software Developer's Manual, Order Number 253665-050US, February 2014.
-
reporter dgemm
intel+openblas
Compile
- module swap PrgEnv-cray PrgEnv-intel
- make dgemm-naive
- cc -c -dynamic -O2 -g0 -mavx -fopenmp dgemm-naive.c
- cc -o dgemm-naive dgemm.o dgemm-naive.o -dynamic -O2 -g0 -mavx -fopenmp -L/users/fgilles/Projects/OpenBlas/libopenblas.a
Run
- source /apps/daint/5.2.UP02/perfsuitebw/1.1.4/int_1501/bin/psenv.sh
- aprun -n1 psrun ./dgemm-naive
Size: 512 512 512 Gflop/s: 4.64157 blas Gflops: 16.4371
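For reference, a Gflop/s figure like the one printed above is usually derived from the matrix sizes and the elapsed time. The sketch below assumes the conventional count of 2*m*n*k floating point operations for a matrix product; the issue does not show how dgemm-naive actually counts, and both helper names are hypothetical.

```python
def dgemm_flops(m, n, k):
    """Conventional operation count for an (m x k) by (k x n) matrix product."""
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Rate in Gflop/s for one product that took `seconds`."""
    return dgemm_flops(m, n, k) / seconds / 1e9


flops = dgemm_flops(512, 512, 512)
# Elapsed time implied by the 4.64157 Gflop/s printed for the naive kernel.
implied_seconds = flops / 4.64157e9
print(f"{flops} flops, about {implied_seconds * 1e3:.1f} ms per product")
```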
Analyze
- psprocess dgemm-naive.23265.nid00012.xml
PerfSuite Hardware Performance Summary Report
Version : 1.0
Created : Mon Jun 01 16:35:09 CEST 2015
Generator : psprocess Java version 0.1
XML Source : dgemm-naive.23265.nid00012.xml
Execution Information
================================================================================
Collector : libpshwpc
Date : Mon Jun 01 16:34:51 CEST 2015
Host : nid00012
Process ID : 23265
Thread : 0
User : piccinal
Command : dgemm-naive
Processor and System Information
================================================================================
Node CPUs : 16
Vendor : Intel
Brand : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
CPUID Info : family: 6, model: 45, stepping: 7
CPU Revision : 7
Clock (MHz) : 2601.000
Memory (MB) : 32220.29
Pagesize (KB) : 4
Cache Information
================================================================================
Cache levels : 3
--------------------------------
Level 1
Type : instruction
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
Type : data
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 2
Type : unified
Size (KB) : 256
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 3
Type : unified
Size (KB) : 20480
Linesize (B) : 64
Associativity : 20
Event Count Information
================================================================================
Index Description Counter Value
--------------------------------------------------------------------------------
1 Conditional branch instructions........................... 11,144,896
2 Branch instructions....................................... 14,867,440
3 Conditional branch instructions mispredicted.............. 14,728
4 Conditional branch instructions not taken................. 2,261,883
5 Floating point divide instructions........................ 86
6 Floating point operations................................. 0
7 Level 1 data cache misses................................. 23,463,527
8 Level 1 instruction cache misses.......................... 90
9 Level 2 data cache accesses............................... 23,463,527
10 Level 2 instruction cache accesses........................ 111
11 Level 2 instruction cache misses.......................... 75
12 Level 2 cache misses...................................... 12,186,803
13 Level 3 data cache reads.................................. 9,817,206
14 Level 3 instruction cache accesses........................ 75
15 Level 3 total cache accesses.............................. 12,186,803
16 Level 3 cache misses...................................... 10
17 Level 3 total cache writes................................ 66
18 Load instructions......................................... 116,334,341
19 Store instructions........................................ 27,544,422
20 Cycles with no instruction issue.......................... 517,295
21 Instruction translation lookaside buffer misses........... 4,415
22 Total cycles.............................................. 187,479,084
23 Instructions completed.................................... 457,393,487
Event Index
--------------------------------------------------------------------------------
1: PAPI_BR_CN 2: PAPI_BR_INS 3: PAPI_BR_MSP 4: PAPI_BR_NTK
5: PAPI_FDV_INS 6: PAPI_FP_OPS 7: PAPI_L1_DCM 8: PAPI_L1_ICM
9: PAPI_L2_DCA 10: PAPI_L2_ICA 11: PAPI_L2_ICM 12: PAPI_L2_TCM
13: PAPI_L3_DCR 14: PAPI_L3_ICA 15: PAPI_L3_TCA 16: PAPI_L3_TCM
17: PAPI_L3_TCW 18: PAPI_LD_INS 19: PAPI_SR_INS 20: PAPI_STL_ICY
21: PAPI_TLB_IM 22: PAPI_TOT_CYC 23: PAPI_TOT_INS
Statistics
================================================================================
Counting domain................................................. user
Multiplexed..................................................... yes
Floating point operations per cycle............................. 0.000
Floating point operations per graduated instruction............. 0.000
Graduated instructions per cycle................................ 2.440
Graduated instructions per level 1 instruction cache miss....... 5,082,149.856
Percentage of cycles with no instruction issued................. 0.276
Level 2 cache miss ratio (data), data cache miss counts derived. 0.519
Level 2 cache miss ratio (instruction).......................... 0.676
Level 3 cache miss ratio........................................ 0.000
Bandwidth used to level 2 cache (MB/s).......................... 10,820.748
Bandwidth used to level 3 cache (MB/s).......................... 0.009
MFLOPS (cycles)................................................. 0.000
MFLOPS (wall clock)............................................. 0.000
MIPS (cycles)................................................... 6,345.670
MIPS (wall clock)............................................... 5,386.716
CPU time (seconds).............................................. 0.072
Wall clock time (seconds)....................................... 0.085
% CPU utilization............................................... 84.888
intel+mkl
Run
- aprun -n1 psrun ./dgemm-naive
Size: 512 512 512 Gflop/s: 4.61794 blas Gflops: 9.23317
Analyze
- psprocess dgemm-naive.12019.nid00013.xml
PerfSuite Hardware Performance Summary Report
Version : 1.0
Created : Mon Jun 01 16:40:51 CEST 2015
Generator : psprocess Java version 0.1
XML Source : dgemm-naive.12019.nid00013.xml
Execution Information
================================================================================
Collector : libpshwpc
Date : Mon Jun 01 16:40:31 CEST 2015
Host : nid00013
Process ID : 12019
Thread : 0
User : piccinal
Command : dgemm-naive
Processor and System Information
================================================================================
Node CPUs : 16
Vendor : Intel
Brand : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
CPUID Info : family: 6, model: 45, stepping: 7
CPU Revision : 7
Clock (MHz) : 2601.000
Memory (MB) : 32220.29
Pagesize (KB) : 4
Cache Information
================================================================================
Cache levels : 3
--------------------------------
Level 1
Type : instruction
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
Type : data
Size (KB) : 32
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 2
Type : unified
Size (KB) : 256
Linesize (B) : 64
Associativity : 8
--------------------------------
Level 3
Type : unified
Size (KB) : 20480
Linesize (B) : 64
Associativity : 20
Event Count Information
================================================================================
Index Description Counter Value
--------------------------------------------------------------------------------
1 Conditional branch instructions........................... 10,631,132
2 Branch instructions....................................... 15,031,701
3 Conditional branch instructions mispredicted.............. 15,715
4 Conditional branch instructions not taken................. 2,146,559
5 Floating point divide instructions........................ 156
6 Floating point operations................................. 0
7 Level 1 data cache misses................................. 24,995,469
8 Level 1 instruction cache misses.......................... 172
9 Level 2 data cache accesses............................... 24,995,469
10 Level 2 instruction cache accesses........................ 252
11 Level 2 instruction cache misses.......................... 166
12 Level 2 cache misses...................................... 12,983,919
13 Level 3 data cache reads.................................. 10,410,725
14 Level 3 instruction cache accesses........................ 166
15 Level 3 total cache accesses.............................. 12,983,919
16 Level 3 cache misses...................................... 5
17 Level 3 total cache writes................................ 102
18 Load instructions......................................... 108,701,545
19 Store instructions........................................ 26,668,418
20 Cycles with no instruction issue.......................... 1,003,490
21 Instruction translation lookaside buffer misses........... 4,975
22 Total cycles.............................................. 178,273,106
23 Instructions completed.................................... 409,256,479
Event Index
--------------------------------------------------------------------------------
1: PAPI_BR_CN 2: PAPI_BR_INS 3: PAPI_BR_MSP 4: PAPI_BR_NTK
5: PAPI_FDV_INS 6: PAPI_FP_OPS 7: PAPI_L1_DCM 8: PAPI_L1_ICM
9: PAPI_L2_DCA 10: PAPI_L2_ICA 11: PAPI_L2_ICM 12: PAPI_L2_TCM
13: PAPI_L3_DCR 14: PAPI_L3_ICA 15: PAPI_L3_TCA 16: PAPI_L3_TCM
17: PAPI_L3_TCW 18: PAPI_LD_INS 19: PAPI_SR_INS 20: PAPI_STL_ICY
21: PAPI_TLB_IM 22: PAPI_TOT_CYC 23: PAPI_TOT_INS
Statistics
================================================================================
Counting domain................................................. user
Multiplexed..................................................... yes
Floating point operations per cycle............................. 0.000
Floating point operations per graduated instruction............. 0.000
Graduated instructions per cycle................................ 2.296
Graduated instructions per level 1 instruction cache miss....... 2,379,398.134
Percentage of cycles with no instruction issued................. 0.563
Level 2 cache miss ratio (data), data cache miss counts derived. 0.519
Level 2 cache miss ratio (instruction).......................... 0.659
Level 3 cache miss ratio........................................ 0.000
Bandwidth used to level 2 cache (MB/s).......................... 12,123.843
Bandwidth used to level 3 cache (MB/s).......................... 0.005
MFLOPS (cycles)................................................. 0.000
MFLOPS (wall clock)............................................. 0.000
MIPS (cycles)................................................... 5,971.041
MIPS (wall clock)............................................... 4,160.875
CPU time (seconds).............................................. 0.069
Wall clock time (seconds)....................................... 0.098
% CPU utilization............................................... 69.684
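The "Bandwidth used to level 2 cache" figures of the two dgemm runs can be reproduced from the counters. The formula below (level 2 misses times the 64-byte line size, divided by CPU time, with CPU time taken as total cycles over the 2601 MHz clock) is inferred by matching the reported values, not taken from the PerfSuite documentation; the helper names are hypothetical.

```python
LINE_BYTES = 64       # "Linesize (B) : 64" in the reports
CLOCK_HZ = 2601.0e6   # "Clock (MHz) : 2601.000" in the reports


def cpu_seconds(total_cycles):
    """CPU time implied by the cycle count at the reported clock rate."""
    return total_cycles / CLOCK_HZ


def bandwidth_mb_s(cache_misses, total_cycles):
    """Bytes moved by cache misses per second of CPU time, in MB/s (10^6 bytes)."""
    return cache_misses * LINE_BYTES / cpu_seconds(total_cycles) / 1e6


# PAPI_L2_TCM and PAPI_TOT_CYC from the two dgemm-naive reports.
openblas_l2 = bandwidth_mb_s(12_186_803, 187_479_084)
mkl_l2 = bandwidth_mb_s(12_983_919, 178_273_106)
print(f"intel+openblas: {openblas_l2:.1f} MB/s")
print(f"intel+mkl:      {mkl_l2:.1f} MB/s")
```

Computing CPU time from the cycle count avoids the rounding error of the three-digit times printed in the reports.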