cache metrics

Issue #77 new
jg piccinali repo owner created an issue

regression.git/src/9211/C

  • module load perftools-cscs/630nogpu
  • make clean; make CC=cc

CRAY / DAINT / MPI+OPENMP executable ready

  • pat_help counters sandybridge groups
    There are 16 predefined hardware performance counter event groups
    that can be specified by setting PAT_RT_PERFCTR to the group id.
     0: D1 with instruction counts
     1: Summary -- FP and cache metrics
     2: D1, D2, L3 Metrics
  • export PAT_RT_PERFCTR=1
  • srun -n2 ./CRAY.exe
  • pat_report *.ap2 >xf
  • cat xf
Table 3:  Program HW Performance Counter Data
=====================================================================
  Total
---------------------------------------------------------
  DTLB_LOAD_MISSES:CAUSES_A_WALK                    61,681 
  DTLB_STORE_MISSES:CAUSES_A_WALK                   68,648 
  FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE            153,262,713 
  FP_COMP_OPS_EXE:X87                               305.50 
  L1D:REPLACEMENT                                8,834,990 
  L2_RQSTS:ALL_DEMAND_DATA_RD                    6,735,720 
  L2_RQSTS:ALL_DEMAND_RD_HIT                     5,918,472 
  MEM_UOPS_RETIRED:ALL_LOADS                   591,586,449 
  CPU_CLK_UNHALTED:THREAD_P                    708,708,347 
  CPU_CLK_UNHALTED:REF_P                        21,671,940 
  User time (approx)             0.234 secs    609,421,240 cycles
  CPU_CLK                         3.27GHz                  
  HW FP Ops / User time        654.124M/sec    153,263,018 ops  
                           3.1%peak(DP)
  Total DP ops                 654.124M/sec    153,263,018 ops
  Computational intensity         0.25 ops/cycle      0.26 ops/ref
  TLB utilization             4,539.18 refs/miss      8.87 avg uses
  D1 cache hit,miss ratios       98.5% hits           1.5% misses
  D1 cache utilization (misses)  66.96 refs/miss      8.37 avg hits
  D2 cache hit,miss ratio        90.7% hits           9.3% misses
  D1+D2 cache hit,miss ratio     99.9% hits           0.1% misses
  D1+D2 cache utilization       723.88 refs/miss     90.48 avg hits
  D2 to D1 bandwidth         1,754.635MiB/sec  431,086,112 bytes
=====================================================================
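The derived rows in Table 3 can be cross-checked against the raw counters. The sketch below is a sanity check only: the formulas are inferred from the numbers in this report, not taken from the CrayPat documentation, and the names `counters`, `USER_CYCLES` and `derive` are made up for illustration. It assumes D1 misses are `L1D:REPLACEMENT`, D2 misses are demand data reads minus demand read hits, and memory references are `MEM_UOPS_RETIRED:ALL_LOADS`.

```python
# Raw counter values copied from Table 3 (totals as reported by pat_report).
counters = {
    "DTLB_LOAD_MISSES:CAUSES_A_WALK": 61_681,
    "DTLB_STORE_MISSES:CAUSES_A_WALK": 68_648,
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE": 153_262_713,
    "FP_COMP_OPS_EXE:X87": 305.5,
    "L1D:REPLACEMENT": 8_834_990,
    "L2_RQSTS:ALL_DEMAND_DATA_RD": 6_735_720,
    "L2_RQSTS:ALL_DEMAND_RD_HIT": 5_918_472,
    "MEM_UOPS_RETIRED:ALL_LOADS": 591_586_449,
}
USER_CYCLES = 609_421_240  # "User time (approx)" row, cycles column


def derive(c, cycles):
    """Recompute the derived metrics. The formulas are assumptions inferred from the report."""
    refs = c["MEM_UOPS_RETIRED:ALL_LOADS"]
    d1_miss = c["L1D:REPLACEMENT"]
    # Assumed: D2 misses = demand data reads that did not hit in L2.
    d2_miss = c["L2_RQSTS:ALL_DEMAND_DATA_RD"] - c["L2_RQSTS:ALL_DEMAND_RD_HIT"]
    tlb_miss = (c["DTLB_LOAD_MISSES:CAUSES_A_WALK"]
                + c["DTLB_STORE_MISSES:CAUSES_A_WALK"])
    fp_ops = c["FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE"] + c["FP_COMP_OPS_EXE:X87"]
    return {
        "d1_miss_pct": 100.0 * d1_miss / refs,
        "d1_refs_per_miss": refs / d1_miss,
        "d2_miss_pct": 100.0 * d2_miss / d1_miss,
        "d1d2_miss_pct": 100.0 * d2_miss / refs,
        "d1d2_refs_per_miss": refs / d2_miss,
        "tlb_refs_per_miss": refs / tlb_miss,
        "ops_per_cycle": fp_ops / cycles,
        "ops_per_ref": fp_ops / refs,
    }


metrics = derive(counters, USER_CYCLES)
for name, value in metrics.items():
    print(f"{name:20s} {value:10.2f}")
```

Under these assumptions the recomputed values agree with the table to within rounding. Note that the D2 miss ratio only matches when misses are divided by `L1D:REPLACEMENT` rather than by `L2_RQSTS:ALL_DEMAND_DATA_RD`.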
