PAPI component infiniband is incredibly inefficient in collecting counter values

Issue #118 new
john.rodgers created an issue

When configured in PAPI the infiniband component injects an incredible amount of overhead when reading values for the requested counter events.

Basically, the function read_ib_counter_value creates the filepath for the counter event in which the value resides. The file is opened with opendir and a pscanf is done to retrieve the value. A closedir is done before returning.

This is done for every​ individual event requested.

The most obvious way to improve the efficiency of reading event values is to perform the opendir at the time the events are collected and save the FILE pointer in the infiniband_native_event_entry_t data structure. This avoids having to recreate and open the filepath each time for the same event followed by a closedir. Even the pscanf might be able to be replaced with a faster read.

A plain open could be done to get a file descriptor. Then a system call read could acquire the value, reading the value into a char buffer, followed by an atoi to get the integer value. 

I have not attempted these changes so there may be circumstances I am not aware of that might impede trying to improve this components efficiency. 

There may be other solutions, but some consideration should be made to improve the efficiency of acquiring the count for the event.

If updating this component is not a high priority then please advise and we may take on the effort to reimplement, either as a customized component or an improved upstream component of which we would share the changes.

Comments (11)

  1. Giuseppe Congiu

    @john.rodgers @Steve Kaufmann thank you for reporting this. We will look into it and let you know what we think.

  2. Giuseppe Congiu

    @john.rodgers @Steve Kaufmann after looking at the code I agree the easiest thing to do would be to move opendir/closedir to update_control_state. (This should work fine as the update_control_state is not in the critical path.) However, I am not sure what opendir is useful for in the read_ib_counter_value function. If it was used as test it should fail with an error code, instead of doing nothing, if the path does not exist. Maybe it could be removed altogether (?). Let me know if I have missed anything here.

    Concerning pscanf, this could be split into a fopen/flose part, that would be moved to start/stop and a vfscanf, that would stay in pscanf.

  3. Steve Kaufmann

    Yeah, I guess the bottom line is:

    o open each counter file once acquiring a file descriptor/pointer for the duration of execution

    o consider reading/interpreting the read value with a non-formatted read, eg, read system call followed by atoi (the assumption being there’s a lot less non-relevant code executed compared to a scanf) - but the overhead may all just be in the noise

    I think validating that the expected directory is prudent. If one knows at that point it does not exist why continue, and you disable reason message can convey something more accurate (the check should be done at init time) as oppose to saying a particular counter file “does not exist”. An access system call works just fine instead of an opendir unless you are using a DIR ptr to go through all the counter files.

    Thanks, Steve

  4. Steve Kaufmann

    Thanks Giuseppe - I’ll take a look. I can also try to build your version and test it here if you want. Just make sure I understand what I should pull so I build with your changes. Is there some way I could just get the changed files and replace the ones in my local repository copy?

  5. Steve Kaufmann

    I’ve rewritten the component using system call open and pread calls and strtoul to interpret the value which is read into a small buffer. Although it doesn’t run any faster (I was surprised) I think the implementation is simpler. Let me know if you’d be interested in acquiring what I’ve done, if anything as a reference.

  6. Log in to comment