High-Level API
Note: The legacy high-level API (Application Programming Interface), whose final release was 5.7.0, has been redesigned; the new high-level API was first released with 6.0.0. Detailed information can be found in the White Paper.
The high-level API provides the ability to record performance events inside instrumented regions of serial applications as well as process-parallel (MPI, SHMEM) and thread-parallel (OpenMP, Pthreads) applications. It is intended for users who want to take simple event measurements conveniently, since they only have to mark the code sections of interest.
Events to be recorded are selected via an environment variable (PAPI_EVENTS) that contains a comma-separated list of events from any component (see example below). This enables users to perform different measurements without recompiling. In addition, users do not need to take care of printing performance events, since output is generated automatically at the end of each measurement.
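To illustrate the format of PAPI_EVENTS, the following C sketch splits a comma-separated event list into individual event names. The function split_event_list and the buffer size MAX_EVENT_NAME are hypothetical helpers for illustration, not part of PAPI, and PAPI's own parser may treat empty entries or whitespace differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size for this sketch; not a PAPI constant. */
#define MAX_EVENT_NAME 128

/* Split a list such as "PAPI_TOT_INS,PAPI_TOT_CYC" into separate names.
 * Empty entries (e.g. from ",,") are skipped and overly long names are
 * truncated. Returns the number of names written (at most 'max'). */
size_t split_event_list(const char *list, char names[][MAX_EVENT_NAME], size_t max)
{
    size_t count = 0;
    const char *start = list;

    while (start != NULL && count < max) {
        const char *comma = strchr(start, ',');
        size_t len = comma ? (size_t)(comma - start) : strlen(start);

        if (len > 0) {
            if (len >= MAX_EVENT_NAME)
                len = MAX_EVENT_NAME - 1;
            memcpy(names[count], start, len);
            names[count][len] = '\0';
            count++;
        }
        /* Continue after the comma, or stop at the end of the string. */
        start = comma ? comma + 1 : NULL;
    }
    return count;
}
```

For instance, passing getenv("PAPI_EVENTS") with the value "PAPI_TOT_INS,PAPI_TOT_CYC" would yield the two names PAPI_TOT_INS and PAPI_TOT_CYC.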
Compared with the low-level API, the high-level API is easier to use and requires less setup. For instance, selecting performance events at run time via the environment variable and the automatic detection of components make the high-level API very simple to use.
The high-level API can also be used in conjunction with the low-level API; in fact, it calls the low-level API internally.
High-Level Functions
The four functions of the high-level API are listed in the table below. They allow users to record and print specific performance events from both C and Fortran.
Function Name | Description |
---|---|
PAPI_hl_region_begin (const char *region) | Read performance events at the beginning of a region (the first call also starts counting the events) |
PAPI_hl_read (const char *region) | Read performance events inside a region and store the difference relative to the corresponding beginning of the region |
PAPI_hl_region_end (const char *region) | Read performance events at the end of a region and store the difference to the corresponding beginning of the region |
PAPI_hl_stop () | Stop a running high-level event set (optional, only necessary if the programmer wants to use the low-level API in addition) |
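The distinction between reading at the beginning, inside, and at the end of a region can be modeled in a few lines. The following C sketch is a toy model, not PAPI's implementation: a single simulated counter value is passed in explicitly, and struct region_model as well as model_begin, model_read, and model_end are hypothetical. It shows that intermediate reads and the final read are all stored as differences relative to the value recorded at the beginning of the region.

```c
#include <stddef.h>

#define MAX_READS 16  /* hypothetical limit for this toy model */

/* Toy bookkeeping for one region and one event (hypothetical). */
struct region_model {
    long long begin_value;        /* counter value at region begin */
    long long deltas[MAX_READS];  /* stored differences */
    size_t num_deltas;
};

/* Mirrors PAPI_hl_region_begin: remember the current counter value. */
void model_begin(struct region_model *r, long long counter)
{
    r->begin_value = counter;
    r->num_deltas = 0;
}

/* Mirrors PAPI_hl_read: store the difference to the region's beginning. */
void model_read(struct region_model *r, long long counter)
{
    if (r->num_deltas < MAX_READS)
        r->deltas[r->num_deltas++] = counter - r->begin_value;
}

/* Mirrors PAPI_hl_region_end: a final read taken at the region end. */
void model_end(struct region_model *r, long long counter)
{
    model_read(r, counter);
}
```

If the simulated counter goes from 1000 at the beginning to 1500 at an intermediate read and 2200 at the end, the stored values are 500 and 1200: the end value is measured from the beginning of the region, not from the preceding read.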
Recording Performance Events
The following code example shows the use of the high-level API by marking a code section.
C:
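```c
#include <stdio.h>
#include <stdlib.h>
#include "papi.h"

/* User-defined error handler; handle_error is not part of PAPI */
static void handle_error(int code)
{
    fprintf(stderr, "PAPI high-level call failed\n");
    exit(code);
}

int main(void)
{
    int retval;

    retval = PAPI_hl_region_begin("computation");
    if ( retval != PAPI_OK )
        handle_error(1);

    /* Do some computation here */

    retval = PAPI_hl_region_end("computation");
    if ( retval != PAPI_OK )
        handle_error(1);

    return 0;
}
```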
Fortran:
```fortran
#include "fpapi.h"

program main
integer retval

call PAPIf_hl_region_begin("computation", retval)
if ( retval .NE. PAPI_OK ) then
   write (*,*) "PAPIf_hl_region_begin failed!"
end if

!do some computation here

call PAPIf_hl_region_end("computation", retval)
if ( retval .NE. PAPI_OK ) then
   write (*,*) "PAPIf_hl_region_end failed!"
end if

end program main
```
Measurement Run:
If no events are specified via the environment variable PAPI_EVENTS, output for a set of default events is generated after the run. If supported by the machine, the following default events are recorded:
- perf::TASK-CLOCK
- PAPI_TOT_INS
- PAPI_TOT_CYC
- PAPI_FP_INS (if not available, PAPI tries to use PAPI_VEC_SP or PAPI_VEC_DP)
- PAPI_FP_OPS (if not available, PAPI tries to use PAPI_SP_OPS or PAPI_DP_OPS)
Note: Default events that are not available on the current machine (e.g., PAPI_FP_OPS) are skipped automatically. If PAPI_EVENTS is set, the default events are not recorded unless they are also listed in PAPI_EVENTS. If some of the specified events cannot be interpreted, only the valid ones are used for the measurement.
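The fallback rule for the default events and the skipping of unavailable events can be sketched as picking the first available name from a list of candidates. In the C sketch below, pick_default_event and the availability callback are hypothetical; in a real tool the check might be backed by a PAPI query such as PAPI_query_named_event, but that wiring is an assumption and is not shown. The order in which the candidates are tried is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Callback that reports whether an event can be measured (hypothetical). */
typedef int (*event_available_fn)(const char *name);

/* Return the first available candidate, or NULL if none is available,
 * in which case the event would be skipped. */
const char *pick_default_event(const char *const *candidates, size_t n,
                               event_available_fn available)
{
    for (size_t i = 0; i < n; i++) {
        if (available(candidates[i]))
            return candidates[i];
    }
    return NULL;
}

/* Illustrative availability check for a machine without PAPI_FP_INS. */
int example_available(const char *name)
{
    return strcmp(name, "PAPI_FP_INS") != 0;
}
```

With the candidate list { "PAPI_FP_INS", "PAPI_VEC_SP", "PAPI_VEC_DP" } and example_available, the sketch falls back to PAPI_VEC_SP.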
By default, output is written to the current directory. For larger measurements, especially of MPI applications, it is recommended to specify an output directory via the environment variable PAPI_OUTPUT_DIRECTORY.
Example for setting performance events and output directory:
```bash
export PAPI_EVENTS="PAPI_TOT_INS,PAPI_TOT_CYC"
export PAPI_OUTPUT_DIRECTORY="scratch/measurement"
```
This creates a directory named "papi_hl_output" in "scratch/measurement" that contains the output; for an MPI application, it may contain more than one output file.
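The resulting output location can be expressed as a small path computation. The following C sketch assumes that an unset or empty PAPI_OUTPUT_DIRECTORY means the current directory "." and that the output is placed in a papi_hl_output subdirectory, as described above. The function hl_output_path is hypothetical and not part of PAPI, and PAPI's actual handling (for example, of an already existing output directory) may differ.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "<dir>/papi_hl_output", where dir is the value of
 * PAPI_OUTPUT_DIRECTORY, or "." if the variable is unset (NULL) or empty.
 * Returns 0 on success, -1 if the buffer is too small. */
int hl_output_path(const char *output_dir, char *buf, size_t size)
{
    const char *dir = (output_dir != NULL && strlen(output_dir) > 0) ? output_dir : ".";
    int n = snprintf(buf, size, "%s/papi_hl_output", dir);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would typically pass getenv("PAPI_OUTPUT_DIRECTORY") as the first argument; with the example setting above, the result is "scratch/measurement/papi_hl_output".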
Note: By default, performance events are stored as delta values, i.e., the difference between the value read at the end of a region and the value read at its beginning. Some events, such as temperature or power, should instead be specified as instantaneous values by appending "=instant" to the event name (see example below). For such events, only the value read at the end of the region is stored.
Example for setting instantaneous events:
```bash
export PAPI_EVENTS="coretemp:::hwmon0:temp2_input=instant"
```
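The difference between the two event types comes down to which value is stored. The C sketch below parses an optional "=instant" suffix from an event specification and computes the stored value from the readings at the beginning and end of a region. The names event_spec, parse_event_spec, and stored_value are hypothetical, and the sketch assumes "=instant" is the only suffix to recognize.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SPEC_NAME 128  /* hypothetical buffer size */

struct event_spec {
    char name[MAX_SPEC_NAME];
    bool instant;  /* true: store end value; false: store end - begin */
};

/* Split e.g. "coretemp:::hwmon0:temp2_input=instant" into name and type. */
void parse_event_spec(const char *spec, struct event_spec *out)
{
    static const char suffix[] = "=instant";
    size_t len = strlen(spec);
    size_t slen = sizeof suffix - 1;

    out->instant = len > slen && strcmp(spec + len - slen, suffix) == 0;
    if (out->instant)
        len -= slen;  /* drop the suffix from the event name */
    if (len >= MAX_SPEC_NAME)
        len = MAX_SPEC_NAME - 1;
    memcpy(out->name, spec, len);
    out->name[len] = '\0';
}

/* Value stored for a region: delta by default, end value for instant events. */
long long stored_value(const struct event_spec *ev, long long begin, long long end)
{
    return ev->instant ? end : end - begin;
}
```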
Possible Output File:
Example of an output file for a serial application:
```
cat papi_hl_output/rank_720050.json
{
  "papi_version":"6.0.0.1",
  "cpu_info":"Intel(R) Xeon(R) CPU X7550 @ 2.00GHz",
  "max_cpu_rate_mhz":"1995",
  "min_cpu_rate_mhz":"1995",
  "event_definitions":{
    "perf::TASK-CLOCK":{ "component":"perf_event", "type":"delta" },
    "PAPI_TOT_INS":{ "component":"perf_event", "type":"delta" },
    "PAPI_TOT_CYC":{ "component":"perf_event", "type":"delta" },
    "PAPI_FP_INS":{ "component":"perf_event", "type":"delta" },
    "PAPI_FP_OPS":{ "component":"perf_event", "type":"delta" }
  },
  "threads":{
    "0":{
      "regions":{
        "0":{
          "name":"computation",
          "parent_region_id":"-1",
          "cycles":"17729530032",
          "real_time_nsec":"8887417521",
          "perf::TASK-CLOCK":"8886388468",
          "PAPI_TOT_INS":"33007026164",
          "PAPI_TOT_CYC":"17624197693",
          "PAPI_FP_INS":"2003166805",
          "PAPI_FP_OPS":"2003166841"
        }
      }
    }
  }
}
```
Enhanced Output:
The generated measurement output (see example above) can be converted into a more readable form. The Python script papi_hl_output_writer.py enhances the output by computing derived metrics such as IPC, MFLOPS/s, and MFLIPS/s, as well as real time and CPU time, provided the corresponding PAPI events have been recorded.
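The derived metrics are simple ratios of the recorded values. The following C sketch shows one plausible way to compute them from the raw fields of an output file such as the one above. It assumes that IPC is PAPI_TOT_INS divided by PAPI_TOT_CYC, that rates such as MFLOPS/s (from PAPI_FP_OPS) and MFLIPS/s (from PAPI_FP_INS) are taken over the region's real time (real_time_nsec), and that CPU time comes from perf::TASK-CLOCK in nanoseconds. These formulas and the function names are assumptions for illustration and are not taken from papi_hl_output_writer.py.

```c
/* Instructions per cycle. */
double ipc(long long tot_ins, long long tot_cyc)
{
    return tot_cyc > 0 ? (double)tot_ins / (double)tot_cyc : 0.0;
}

/* Millions of events per second over the region's real time (assumption). */
double mega_rate(long long count, long long real_time_nsec)
{
    if (real_time_nsec <= 0)
        return 0.0;
    double seconds = (double)real_time_nsec / 1e9;
    return (double)count / seconds / 1e6;
}

/* Convert a nanosecond value (real_time_nsec or perf::TASK-CLOCK) to seconds. */
double nsec_to_sec(long long nsec)
{
    return (double)nsec / 1e9;
}
```

Applied to the raw values of the example output file above, these formulas give an IPC of about 1.87 and roughly 225 MFLOPS/s.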
Example to generate an enhanced output:
```
papi_hl_output_writer.py --notation=derived --type=summary
{
  "computation": {
    "Region count": 1,
    "Real time in s": 7.62,
    "CPU time in s": 7.62,
    "IPC": 2.18,
    "MFLIPS/s": 263.0,
    "MFLOPS/s": 263.0
  }
}
```
Multiplexing Support:
The high-level API also supports multiplexing of CPU core events via the environment variable PAPI_MULTIPLEX.
Enable multiplexing support:
```bash
export PAPI_MULTIPLEX=1
```
Overview of Environment Variables
The following environment variables are only used by the high-level API:
Environment Variable | Description | Type |
---|---|---|
PAPI_EVENTS | PAPI events to measure | String |
PAPI_MULTIPLEX | Enable multiplexing | - |
PAPI_REPORT | Print output to stdout | - |
PAPI_OUTPUT_DIRECTORY | Path of the measurement directory | Path |
PAPI_HL_VERBOSE | Enable warnings and info messages | - |
PAPI_DEBUG=HIGHLEVEL | Enable debugging of high-level routines | String |
PAPI_HL_THREAD_MULTIPLE | Set to "0" to disable multi-thread monitoring | String |
Note: Environment variables without a type are enabled as soon as they are set, regardless of their value; the value is not interpreted. To disable such a variable, use the "unset" command.
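The two kinds of variables in the table differ in how their values are interpreted. The C sketch below contrasts a presence flag such as PAPI_MULTIPLEX or PAPI_REPORT, which counts as enabled whenever it is set, with PAPI_HL_THREAD_MULTIPLE, which is disabled only by the value "0". The helper names are hypothetical, and treating an unset PAPI_HL_THREAD_MULTIPLE as enabled is an assumption based on the table's wording.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Presence flag: any value, even "0" or "", enables the feature. */
bool flag_enabled(const char *value)
{
    return value != NULL;
}

/* PAPI_HL_THREAD_MULTIPLE: disabled only by "0"; unset is assumed to mean enabled. */
bool thread_multiple_enabled(const char *value)
{
    return value == NULL || strcmp(value, "0") != 0;
}

/* Convenience wrapper reading the actual environment. */
bool multiplex_requested(void)
{
    return flag_enabled(getenv("PAPI_MULTIPLEX"));
}
```

This is why "export PAPI_MULTIPLEX=0" would still enable multiplexing, while "unset PAPI_MULTIPLEX" disables it.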