CUDA component restart w/ new event failure

Issue #101 on hold
john.rodgers created an issue

Using the latest version of PAPI, I am encountering a failure in the _cuda11_build_profiling_structures function with the following workflow:

  1. Create Eventset 1
  2. PAPI Start
  3. PAPI Stop
  4. Cleanup/Destroy Eventset 1
  5. Create Eventset 2
  6. PAPI Start
  7. PAPI Stop

The failure appears to occur in step 6: _cuda11_start fails in _cuda11_build_profiling_structures because the buffer for cuda11_CounterAvailabilityImage was already freed in step 4 (_cuda11_cleanup_eventset).

In steps 5 & 6, two main factors prevent the component from being properly re-configured:

  1. The component remains in an initialized state.
  2. The _cuda_vector is in a state where it can’t re-initialize the component (the lazy initialization mechanics are no longer exposed because _cuda11_cuda_vector has already been called).
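The interaction of these two factors can be illustrated with a toy model. This is only a sketch, not the component's code: every name below (toy_start, toy_cleanup_eventset, component_initialized, availability_image) is a hypothetical stand-in, and the model sets the freed pointer to NULL so that the failure is deterministic, which the real component may or may not do.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the component's global state. */
static int component_initialized = 0;            /* set once, never cleared */
static unsigned char *availability_image = NULL; /* models cuda11_CounterAvailabilityImage */

/* Lazy initialization: does real work only on the first call. */
static void toy_lazy_init(void)
{
    if (component_initialized)
        return;                     /* steps 5-6 take this early exit */
    availability_image = malloc(64);
    if (availability_image != NULL)
        component_initialized = 1;
}

/* Models _cuda11_start: it needs the availability image to proceed. */
int toy_start(void)
{
    toy_lazy_init();
    if (availability_image == NULL) {
        fprintf(stderr, "toy_start: availability image is gone\n");
        return -1;
    }
    return 0;
}

/* Models _cuda11_cleanup_eventset: frees the buffer but keeps the flag. */
void toy_cleanup_eventset(void)
{
    free(availability_image);
    availability_image = NULL;
}

/* Runs steps 1-6 of the workflow; returns the status of the second start. */
int toy_run_workflow(int *first_status)
{
    *first_status = toy_start();    /* steps 1-3: first EventSet works */
    toy_cleanup_eventset();         /* step 4: buffer freed, flag still set */
    return toy_start();             /* steps 5-6: init is skipped, start fails */
}
```

In this model the first start returns 0 and the second returns -1, because the initialized flag outlives the buffer it is supposed to guard.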

Preliminary tests show that 'un-initializing' the component during the _cuda11_cleanup_eventset function call might fix the issue.
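A toy model of that idea is sketched below, assuming the component guards a lazily allocated buffer with a single initialized flag. All names (fix_start, fix_cleanup_eventset, fix_initialized, fix_image) are hypothetical and do not come from the PAPI sources; the only point is that clearing the flag at cleanup lets the next start run the lazy initialization again.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the component's global state. */
static int fix_initialized = 0;
static unsigned char *fix_image = NULL;

/* Lazy initialization: allocates the buffer only when the flag is clear. */
static int fix_lazy_init(void)
{
    if (fix_initialized)
        return 0;
    fix_image = malloc(64);
    if (fix_image == NULL)
        return -1;
    fix_initialized = 1;
    return 0;
}

/* Models the start path: succeeds only if the buffer is available. */
int fix_start(void)
{
    if (fix_lazy_init() != 0 || fix_image == NULL)
        return -1;
    return 0;
}

/* Models cleanup with the 'un-initialize' step added. */
void fix_cleanup_eventset(void)
{
    free(fix_image);
    fix_image = NULL;
    fix_initialized = 0;    /* lets the next start re-run lazy init */
}

/* Start, clean up, start again; returns the status of the second start. */
int fix_restart_status(void)
{
    if (fix_start() != 0)
        return -1;
    fix_cleanup_eventset();
    int status = fix_start();
    fix_cleanup_eventset();
    return status;
}
```

The cost of this approach in the toy model is that every EventSet pays for a full re-initialization, including any state that could have been shared.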

Comments (4)

  1. john.rodgers reporter

    Hi Giuseppe, confirming issue is still present using the latest version of the cuda component.

  2. Giuseppe Congiu

    Hi John, thanks for confirming. This looks like a design bug in the component. The way I see it, when the component is initialized it should maintain only the minimal state that is needed across multiple EventSet measurements (e.g. the events table). Everything else should be owned by the EventSet (through the control state) and live only for the lifespan of the EventSet itself (i.e. between PAPI_start and PAPI_stop). Un-initializing the component at _cuda11_cleanup_eventset might work but sounds “hacky”.
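A minimal sketch of the ownership split described in this comment, under stated assumptions: the type and function names (toy_component_t, toy_control_state_t, toy_eventset_start, toy_eventset_stop) are invented for illustration and are not the component's API, and an integer count stands in for the events table.

```c
#include <stdlib.h>

/* Component-wide state: only what must survive across EventSets. */
typedef struct {
    int initialized;
    int num_events;                     /* stands in for the events table */
} toy_component_t;

/* Per-EventSet control state: owns every other resource. */
typedef struct {
    unsigned char *availability_image;
    size_t image_size;
} toy_control_state_t;

/* One-time component initialization; never undone by an EventSet. */
int toy_component_init(toy_component_t *c)
{
    c->initialized = 1;
    c->num_events = 4;                  /* arbitrary illustrative value */
    return 0;
}

/* Start: the EventSet allocates the resources it needs for this run. */
int toy_eventset_start(const toy_component_t *c, toy_control_state_t *ctl)
{
    if (!c->initialized)
        return -1;
    ctl->image_size = 64;
    ctl->availability_image = calloc(ctl->image_size, 1);
    return ctl->availability_image ? 0 : -1;
}

/* Stop: the EventSet releases what it owns; the component is untouched. */
void toy_eventset_stop(toy_control_state_t *ctl)
{
    free(ctl->availability_image);
    ctl->availability_image = NULL;
    ctl->image_size = 0;
}

/* Runs two consecutive EventSets; returns 0 if both starts succeed. */
int toy_two_eventsets(void)
{
    toy_component_t comp = {0, 0};
    toy_control_state_t es1 = {NULL, 0}, es2 = {NULL, 0};

    toy_component_init(&comp);
    if (toy_eventset_start(&comp, &es1) != 0)
        return -1;
    toy_eventset_stop(&es1);
    if (toy_eventset_start(&comp, &es2) != 0)
        return -1;
    toy_eventset_stop(&es2);
    return 0;
}
```

Because the second EventSet allocates its own buffer, nothing freed by the first one can be reused by mistake, and the component never needs to be un-initialized.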

  3. Giuseppe Congiu

    This problem requires a substantial restructuring of the cuda component and cannot be addressed easily. There is currently an effort to rewrite the component and remove all the intrinsic limitations of the old one (including the one pointed out by this ticket).
