cache metrics
Issue #77
new
regression.git/src/9211/C
- module load perftools-cscs/630nogpu
- make clean; make CC=cc
CRAY / DAINT / MPI+OPENMP executable ready
- pat_help counters sandybridge groups
There are 16 predefined hardware performance counter event groups
that can be specified by setting PAT_RT_PERFCTR to the group id.
0: D1 with instruction counts
1: Summary -- FP and cache metrics
2: D1, D2, L3 Metrics
- export PAT_RT_PERFCTR=1
- srun -n2 ./CRAY.exe
- pat_report *.ap2 >xf
- cat xf
Table 3: Program HW Performance Counter Data
=====================================================================
Total
---------------------------------------------------------
DTLB_LOAD_MISSES:CAUSES_A_WALK                61,681
DTLB_STORE_MISSES:CAUSES_A_WALK               68,648
FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE        153,262,713
FP_COMP_OPS_EXE:X87                           305.50
L1D:REPLACEMENT                            8,834,990
L2_RQSTS:ALL_DEMAND_DATA_RD                6,735,720
L2_RQSTS:ALL_DEMAND_RD_HIT                 5,918,472
MEM_UOPS_RETIRED:ALL_LOADS               591,586,449
CPU_CLK_UNHALTED:THREAD_P                708,708,347
CPU_CLK_UNHALTED:REF_P                    21,671,940
User time (approx)                        0.234 secs   609,421,240 cycles
CPU_CLK                                      3.27GHz
HW FP Ops / User time                   654.124M/sec      153,263,018 ops   3.1%peak(DP)
Total DP ops                            654.124M/sec      153,263,018 ops
Computational intensity               0.25 ops/cycle         0.26 ops/ref
TLB utilization                   4,539.18 refs/miss        8.87 avg uses
D1 cache hit,miss ratios                  98.5% hits          1.5% misses
D1 cache utilization (misses)        66.96 refs/miss        8.37 avg hits
D2 cache hit,miss ratio                   90.7% hits          9.3% misses
D1+D2 cache hit,miss ratio                99.9% hits          0.1% misses
D1+D2 cache utilization             723.88 refs/miss       90.48 avg hits
D2 to D1 bandwidth                  1,754.635MiB/sec    431,086,112 bytes
=====================================================================
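
The derived cache and TLB lines in Table 3 can be cross-checked against the raw counters. The sketch below is not taken from the perftools documentation: the formulas are inferred from the numbers in this report, and the 64-byte cache line, 8-byte word and 4 KiB page sizes are assumptions. The function name `derived_cache_metrics` is made up for this note.

```python
# Raw counters copied from Table 3 above (values as printed by pat_report).
counters = {
    "DTLB_LOAD_MISSES:CAUSES_A_WALK": 61_681,
    "DTLB_STORE_MISSES:CAUSES_A_WALK": 68_648,
    "L1D:REPLACEMENT": 8_834_990,
    "L2_RQSTS:ALL_DEMAND_DATA_RD": 6_735_720,
    "L2_RQSTS:ALL_DEMAND_RD_HIT": 5_918_472,
    "MEM_UOPS_RETIRED:ALL_LOADS": 591_586_449,
}

# Assumed sizes (not stated in the report).
LINE_BYTES = 64    # cache line
WORD_BYTES = 8     # one double
PAGE_BYTES = 4096  # small page


def derived_cache_metrics(c):
    """Recompute the derived lines of Table 3 from raw counters.

    The formulas are inferred from the reported values, not from
    the tool's documentation.
    """
    loads = c["MEM_UOPS_RETIRED:ALL_LOADS"]
    d1_misses = c["L1D:REPLACEMENT"]
    d2_reads = c["L2_RQSTS:ALL_DEMAND_DATA_RD"]
    d2_misses = d2_reads - c["L2_RQSTS:ALL_DEMAND_RD_HIT"]
    tlb_misses = (c["DTLB_LOAD_MISSES:CAUSES_A_WALK"]
                  + c["DTLB_STORE_MISSES:CAUSES_A_WALK"])

    words_per_line = LINE_BYTES // WORD_BYTES
    words_per_page = PAGE_BYTES // WORD_BYTES

    return {
        "d1_miss_pct": 100.0 * d1_misses / loads,
        "d1_refs_per_miss": loads / d1_misses,
        "d1_avg_hits": loads / d1_misses / words_per_line,
        # D2 misses are taken relative to D1 misses, not to D2 reads.
        "d2_miss_pct": 100.0 * d2_misses / d1_misses,
        "d1d2_miss_pct": 100.0 * d2_misses / loads,
        "d1d2_refs_per_miss": loads / d2_misses,
        "d1d2_avg_hits": loads / d2_misses / words_per_line,
        "tlb_refs_per_miss": loads / tlb_misses,
        "tlb_avg_uses": loads / tlb_misses / words_per_page,
        "d2_to_d1_bytes": d2_reads * LINE_BYTES,
    }


if __name__ == "__main__":
    for name, value in derived_cache_metrics(counters).items():
        print(f"{name:20s} {value:18,.2f}")
```

With these assumptions the ratios agree with the report to the printed precision. The byte count comes out as 431,086,080, which is 32 bytes below the reported 431,086,112. One possible explanation is that the printed counters are rounded averages over the two ranks, but that is a guess.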