High Level API

Note: The legacy high-level API (Application Programming Interface), last available in release 5.7.0, has been redesigned; the new API first shipped in release 6.0.0. Detailed information can be found in the White Paper.

The high-level API records performance events inside instrumented regions of serial, multi-process (MPI, SHMEM), and multi-threaded (OpenMP, Pthreads) applications. It is intended for users who want simple event measurements with minimal effort: they only have to mark the code sections of interest.

Events to be recorded are selected via the environment variable PAPI_EVENTS, which holds a comma-separated list of events from any component (see example below). This lets users perform different measurements without recompiling. Users also do not need to print performance events themselves, since output is generated at the end of each measurement.

Compared with the low-level API, the high-level API is easier to use and requires less setup. In particular, performance events are set dynamically via the environment variable, and components are detected automatically.

The high-level API can be used in conjunction with the low-level API; internally, it calls the low-level API.


High-Level Functions

The four functions of the high-level API are listed in the table below. They allow users to record and print specific performance events from both C and Fortran.

| Function Name | Description |
|---|---|
| PAPI_hl_region_begin (const char *region) | Read performance events at the beginning of a region (the first call also starts counting the events) |
| PAPI_hl_read (const char *region) | Read performance events inside of a region and store the difference to the corresponding beginning of the region |
| PAPI_hl_region_end (const char *region) | Read performance events at the end of a region and store the difference to the corresponding beginning of the region |
| PAPI_hl_stop () | Stop a running high-level event set (optional, only necessary if the programmer wants to use the low-level API in addition) |



Recording Performance Events

The following code example shows the use of the high-level API by marking a code section.

C:

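#include <stdio.h>
#include <stdlib.h>
#include "papi.h"

/* Report a failed PAPI call and exit with the given status. */
static void handle_error(int status)
{
    fprintf(stderr, "PAPI high-level call failed\n");
    exit(status);
}

int main(void)
{
    int retval;

    /* Start of the marked region; the first call also starts counting. */
    retval = PAPI_hl_region_begin("computation");
    if ( retval != PAPI_OK )
        handle_error(1);

    /* Do some computation here */

    /* End of the marked region; the difference to the begin call is stored. */
    retval = PAPI_hl_region_end("computation");
    if ( retval != PAPI_OK )
        handle_error(1);

    return 0;
}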

Fortran:

#include "fpapi.h"

      program main
      integer retval

      call PAPIf_hl_region_begin("computation", retval)
      if ( retval .NE. PAPI_OK ) then
         write (*,*) "PAPIf_hl_region_begin failed!"
      end if

      !do some computation here

      call PAPIf_hl_region_end("computation", retval)
      if ( retval .NE. PAPI_OK ) then
         write (*,*) "PAPIf_hl_region_end failed!"
      end if

      end program main

Note: For a more detailed evaluation, PAPI_hl_read can be called several times inside a region; its name argument must match the name of the enclosing region. A marked region is thread-local, so its begin, read, and end calls must be made from the same thread. To mix high-level and low-level API calls, call PAPI_hl_stop() before using low-level calls after a marked region.

Measurement Run:

If no events are specified via the environment variable PAPI_EVENTS, output with default events is generated after the run. If supported by the machine, the following default events are recorded:

  • perf::TASK-CLOCK
  • PAPI_TOT_INS
  • PAPI_TOT_CYC
  • PAPI_FP_INS (if not available, PAPI tries to use PAPI_VEC_SP or PAPI_VEC_DP)
  • PAPI_FP_OPS (if not available, PAPI tries to use PAPI_SP_OPS or PAPI_DP_OPS)

Note: Default events that are not available on the current machine, e.g. PAPI_FP_OPS, are skipped automatically. If PAPI_EVENTS is set, the default events are not recorded unless they are listed in PAPI_EVENTS. If some of the specified events cannot be interpreted, only the valid ones are measured.
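
The selection rules in the note above can be mimicked in a few lines of Python. This is only an illustrative model, not PAPI's implementation: `select_events` and the `available` set are hypothetical names, real availability is determined by PAPI on the target machine, and the fallback events for PAPI_FP_INS and PAPI_FP_OPS as well as modifiers such as `=instant` are left out for brevity.

```python
# Illustrative model of the event selection rules; not PAPI code.
DEFAULT_EVENTS = [
    "perf::TASK-CLOCK",
    "PAPI_TOT_INS",
    "PAPI_TOT_CYC",
    "PAPI_FP_INS",
    "PAPI_FP_OPS",
]


def select_events(papi_events, available):
    """Return the events that would be measured.

    papi_events: value of PAPI_EVENTS, or None if the variable is unset.
    available:   set of event names usable on this machine (assumed input).
    """
    if papi_events is None:
        requested = DEFAULT_EVENTS
    else:
        requested = [name.strip() for name in papi_events.split(",") if name.strip()]
    # Events that are unavailable or cannot be interpreted are skipped.
    return [name for name in requested if name in available]


# Hypothetical machine without floating-point counters.
available = {"perf::TASK-CLOCK", "PAPI_TOT_INS", "PAPI_TOT_CYC"}
print(select_events(None, available))
print(select_events("PAPI_TOT_INS,PAPI_BOGUS", available))
```

In the second call, the made-up name PAPI_BOGUS is dropped and only PAPI_TOT_INS remains; the default events are not added because PAPI_EVENTS is set.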

The output is generated in the current directory by default. For larger measurements, especially for MPI applications, it is recommended to specify an output directory via the environment variable PAPI_OUTPUT_DIRECTORY.

Example for setting performance events and output directory:

export PAPI_EVENTS="PAPI_TOT_INS,PAPI_TOT_CYC"
export PAPI_OUTPUT_DIRECTORY="scratch/measurement"

This will generate a directory called "papi_hl_output" in "scratch/measurement" that contains one output file, or several output files in the case of an MPI application.

Note: Performance events are stored as delta values, that is, the value read at the region end call minus the value read at the region begin call. Some events, like temperature or power, can be specified as instantaneous values (see example below). In this case, only the value read at the region end call is stored.

Example for setting instantaneous events:

export PAPI_EVENTS="coretemp:::hwmon0:temp2_input=instant"
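
The difference between the two kinds of events comes down to simple arithmetic, sketched below in Python. The function `stored_value` and the sample readings are hypothetical and serve only as illustration; the type name "instant" is borrowed from the `=instant` suffix above, and whether the output file uses the same string is an assumption.

```python
def stored_value(begin_value, end_value, event_type):
    """Value recorded for a region, given the readings at its begin and end calls."""
    if event_type == "delta":
        # Counters: store how much the value grew inside the region.
        return end_value - begin_value
    if event_type == "instant":
        # Instantaneous events: store only the reading at the region end.
        return end_value
    raise ValueError(f"unknown event type: {event_type}")


# Made-up readings: an instruction counter and a temperature sensor.
print(stored_value(1000, 4500, "delta"))
print(stored_value(41000, 43000, "instant"))
```

A delta makes sense for monotonically increasing counters, whereas subtracting two temperature readings would discard the quantity of interest.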

Possible Output File:

Example of an output file for a serial application:

cat papi_hl_output/rank_720050.json
{
  "papi_version":"6.0.0.1",
  "cpu_info":"Intel(R) Xeon(R) CPU X7550 @ 2.00GHz",
  "max_cpu_rate_mhz":"1995",
  "min_cpu_rate_mhz":"1995",
  "event_definitions":{
    "perf::TASK-CLOCK":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_TOT_INS":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_TOT_CYC":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_FP_INS":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_FP_OPS":{
      "component":"perf_event",
      "type":"delta"
    }
  },
  "threads":{
    "0":{
      "regions":{
        "0":{
          "name":"computation",
          "parent_region_id":"-1",
          "cycles":"17729530032",
          "real_time_nsec":"8887417521",
          "perf::TASK-CLOCK":"8886388468",
          "PAPI_TOT_INS":"33007026164",
          "PAPI_TOT_CYC":"17624197693",
          "PAPI_FP_INS":"2003166805",
          "PAPI_FP_OPS":"2003166841"
        }
      }
    }
  }
}

Note: The output example above shows performance events for the region "computation" in JSON format. As it is a serial application, there is only one thread containing performance events. For a thread-parallel application, there would be one JSON object per thread. For an MPI application, the output is saved in multiple files, one per MPI rank. If further measurements are performed, the high-level library does not overwrite or delete old measurement directories. Instead, a timestamp is added to the name of the old directory. For convenience, the output can also be printed to stdout by setting PAPI_REPORT=1. This is not recommended for MPI applications, as each MPI rank tries to print its output concurrently.
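
Because the output is plain JSON, it can be post-processed with standard tools. The following Python sketch walks the threads and regions of a shortened copy of the file above. The helper `region_values` is a hypothetical name, and the sketch assumes that every field other than "name" and "parent_region_id" holds an integer encoded as a string, as in the example; other event types may need different handling.

```python
import json

# Shortened copy of the output file shown above.
OUTPUT = """
{
  "threads":{
    "0":{
      "regions":{
        "0":{
          "name":"computation",
          "parent_region_id":"-1",
          "real_time_nsec":"8887417521",
          "PAPI_TOT_INS":"33007026164",
          "PAPI_TOT_CYC":"17624197693"
        }
      }
    }
  }
}
"""


def region_values(report):
    """Yield (thread_id, region_name, values) with numeric values as int."""
    for thread_id, thread in report["threads"].items():
        for region in thread["regions"].values():
            values = {
                key: int(value)
                for key, value in region.items()
                if key not in ("name", "parent_region_id")
            }
            yield thread_id, region["name"], values


report = json.loads(OUTPUT)
for thread_id, name, values in region_values(report):
    print(thread_id, name, values["PAPI_TOT_INS"])
```

For a real measurement, the string would be replaced by the contents of a file in the "papi_hl_output" directory, read with `json.load`.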

Enhanced Output:

The generated measurement output (see example above) can be converted into a more readable form. The Python script papi_hl_output_writer.py enhances the output by creating derived metrics such as IPC, MFlops/s, and MFlips/s, as well as real and processor time, provided that the corresponding PAPI events have been recorded.

Example to generate an enhanced output:

papi_hl_output_writer.py --notation=derived --type=summary
{
    "computation": {
        "Region count": 1,
        "Real time in s": 7.62,
        "CPU time in s": 7.62,
        "IPC": 2.18,
        "MFLIPS/s": 263.0,
        "MFLOPS/s": 263.0
    }
}

Note: The output example above has been generated with the type option "summary", which summarizes performance events over all threads and MPI ranks in the case of a parallel application. Use "papi_hl_output_writer.py --help" to see all available options.
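
To make the derived metrics less opaque, here is a small Python sketch of the simplest one. IPC is instructions per cycle, i.e. PAPI_TOT_INS divided by PAPI_TOT_CYC. The sketch does not reproduce papi_hl_output_writer.py, whose internals are not described on this page; the function names are hypothetical, and the counter values are taken from the raw JSON example earlier on this page, which need not stem from the same run as the enhanced output above.

```python
def ipc(total_instructions, total_cycles):
    """Instructions per cycle from PAPI_TOT_INS and PAPI_TOT_CYC."""
    if total_cycles == 0:
        raise ValueError("cycle count is zero")
    return total_instructions / total_cycles


def seconds(nanoseconds):
    """Convert a nanosecond reading such as real_time_nsec to seconds."""
    return nanoseconds / 1e9


# Counter values from the raw output example on this page.
print(round(ipc(33007026164, 17624197693), 2))
print(round(seconds(8887417521), 2))
```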

Multiplexing Support:

The high-level API also supports multiplexing of CPU core events via the environment variable PAPI_MULTIPLEX.

Enable multiplexing support:

export PAPI_MULTIPLEX=1

Overview of Environment Variables

The following environment variables are only used by the high-level API:

| Environment Variable | Description | Type |
|---|---|---|
| PAPI_EVENTS | PAPI events to measure | String |
| PAPI_MULTIPLEX | Enable multiplexing | - |
| PAPI_REPORT | Print output to stdout | - |
| PAPI_OUTPUT_DIRECTORY | Path of the measurement directory | Path |
| PAPI_HL_VERBOSE | Enable warnings and info | - |
| PAPI_DEBUG=HIGHLEVEL | Enable debugging of high-level routines | String |
| PAPI_HL_THREAD_MULTIPLE | Set to "0" to disable multi-thread monitoring | String |

Note: Environment variables without a type are enabled when they are set to any value; the value itself is not interpreted. To disable such a variable, use the command "unset".
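
A consequence of this rule is that setting such a variable to "0" still enables it. The Python sketch below models the check as described in the note; `flag_enabled` is a hypothetical helper and not part of PAPI, and the environment is passed in as a dictionary so the example does not depend on the current shell.

```python
import os


def flag_enabled(name, env=None):
    """True if a type-less flag variable such as PAPI_REPORT is set at all."""
    env = os.environ if env is None else env
    # Only presence matters; the value is never inspected.
    return name in env


print(flag_enabled("PAPI_REPORT", {"PAPI_REPORT": "1"}))
print(flag_enabled("PAPI_REPORT", {"PAPI_REPORT": "0"}))
print(flag_enabled("PAPI_REPORT", {}))
```

Only the last call, which corresponds to the state after "unset PAPI_REPORT", reports the flag as disabled.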
