_papi_hwi_native_name_to_code sensitivity with delay init components

Issue #115 new
john.rodgers created an issue

The PAPI function src/papi_internal.c::_papi_hwi_native_name_to_code has shown sensitivities when interacting with components that leverage the 'delayed initialization' scheme.

This sensitivity was discovered when configuring the infiniband (non-delay init) component after the rocm_smi (delay init) component. With this configuration, fully resolved infiniband event names (e.g. infiniband:::mlx5_0_1_ext:lifespan) could be correctly processed by _papi_hwi_native_name_to_code, but shorter forms without the component (e.g. mlx5_0_1_ext:lifespan) could not be processed.

Sample papi_command_line behavior showing the issue:

./papi_command_line infiniband:::mlx5_0_1_ext:lifespan

This utility lets you add events from the command line interface to see if they work.

Successfully added: infiniband:::mlx5_0_1_ext:lifespan

infiniband:::mlx5_0_1_ext:lifespan :    0

----------------------------------
./papi_command_line mlx5_0_1_ext:lifespan

This utility lets you add events from the command line interface to see if they work.

Failed adding: mlx5_0_1_ext:lifespan
because: Not supported
No events specified!
Try running something like: ./papi_command_line PAPI_TOT_CYC

Triage findings:

  • When searching for the shorter form of the event, _papi_hwi_native_name_to_code will prematurely return after checking the rocm_smi component, prior to getting to the infiniband component.

    • Early return caused by _papi_hwd[cidx]->ntv_enum_events( &i, PAPI_ENUM_FIRST ) returning PAPI_ENOSUPP when checking for the event in rocm_smi.

      • Note: PAPI_ENOSUPP was returned in this trial because _rocm_smi_init_private failed at the _rocm_smi_linkRocmLibraries() stage (run on a system without ROCm installed).
  • When searching for the longer form of the event, _papi_hwi_native_name_to_code avoids the error, because the is_supported_by_component(cidx, full_event_name) check prevents the search from reaching _papi_hwd[cidx]->ntv_enum_events for the rocm_smi component.

  • Checking the environment/system as part of the standard component initialization stage may help prevent this issue, by disabling such components with something other than PAPI_EDELAY_INIT.

    • rocm does this in the call to rocp_init_environment
    • cuda does this in calls to _cuda_count_dev_{proc,sys}
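
The triage findings above can be illustrated with a small, self-contained toy model. This is only a sketch and not PAPI source code: every name in it (toy_component, toy_prefix_allows, toy_name_to_component, the TOY_* codes and their values) is hypothetical. It assumes a search loop that visits components in configuration order, skips components excluded by a "component:::" prefix, and returns as soon as a component's first-event enumeration reports an error. The stop_on_error flag switches between that reported behavior and one conceivable alternative, which is to skip the failing component and keep searching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; values are illustrative, not PAPI's. */
enum { TOY_OK = 0, TOY_ENOSUPP = -1, TOY_ENOEVNT = -2 };

/* Hypothetical stand-in for a component entry. */
typedef struct {
    const char *name;       /* component name, e.g. "infiniband" */
    int enum_first_status;  /* result of enumerating the first event */
    const char *event;      /* the single event this toy component exposes */
} toy_component;

/* Plays the role described for is_supported_by_component: a "name:::"
 * prefix restricts the search to that component, while a short event
 * name (no prefix) is allowed to match any component. */
static int toy_prefix_allows(const toy_component *c, const char *full_name)
{
    const char *sep = strstr(full_name, ":::");
    if (sep == NULL)
        return 1;
    size_t len = (size_t)(sep - full_name);
    return strlen(c->name) == len && strncmp(c->name, full_name, len) == 0;
}

/* Returns the event name with any "component:::" prefix removed. */
static const char *toy_strip_prefix(const char *full_name)
{
    const char *sep = strstr(full_name, ":::");
    return sep ? sep + 3 : full_name;
}

/* Returns the index of the component that owns the event, or a negative
 * TOY_* code. stop_on_error = 1 models the reported early return;
 * stop_on_error = 0 skips a failing component and continues. */
int toy_name_to_component(const toy_component *comps, size_t n,
                          const char *full_name, int stop_on_error)
{
    assert(comps != NULL && full_name != NULL);
    const char *base = toy_strip_prefix(full_name);

    for (size_t cidx = 0; cidx < n; cidx++) {
        if (!toy_prefix_allows(&comps[cidx], full_name))
            continue;  /* prefix names a different component */
        if (comps[cidx].enum_first_status != TOY_OK) {
            if (stop_on_error)
                return comps[cidx].enum_first_status;  /* early return */
            continue;  /* alternative: move on to the next component */
        }
        if (strcmp(comps[cidx].event, base) == 0)
            return (int)cidx;
    }
    return TOY_ENOEVNT;
}
```

With a two-entry array holding a failing "rocm_smi" entry (enum_first_status set to TOY_ENOSUPP) followed by an "infiniband" entry exposing "mlx5_0_1_ext:lifespan", the fully qualified name resolves to index 1 in either mode. The short name yields TOY_ENOSUPP when stop_on_error is 1 and index 1 when it is 0, mirroring the two papi_command_line runs shown earlier.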
