PAPI component infiniband is incredibly inefficient in collecting counter values
When configured in PAPI the infiniband component injects an incredible amount of overhead when reading values for the requested counter events.
Basically, the function read_ib_counter_value
creates the filepath for the counter event in which the value resides. The file is opened with opendir
and a pscanf
is done to retrieve the value. A closedir
is done before returning.
This is done for every individual event requested.
The most obvious way to improve the efficiency of reading event values is to perform the opendir
at the time the events are collected and save the FILE pointer in the infiniband_native_event_entry_t
data structure. This avoids having to recreate and open the filepath each time for the same event followed by a closedir
. Even the pscanf
might be able to be replaced with a faster read.
A plain open
could be done to get a file descriptor. Then a system call read
could acquire the value, reading the value into a char
buffer, followed by an atoi
to get the integer value.
I have not attempted these changes so there may be circumstances I am not aware of that might impede trying to improve this components efficiency.
There may be other solutions, but some consideration should be made to improve the efficiency of acquiring the count for the event.
If updating this component is not a high priority then please advise and we may take on the effort to reimplement, either as a customized component or an improved upstream component of which we would share the changes.
Comments (11)
-
reporter -
In addition, the following link provides descriptions for the extended event files in
hw_counters
directory. This could be used for better descriptions of the 64-bit events.https://support.mellanox.com/s/article/understanding-mlx5-linux-counters-and-status-parameters
-
@john.rodgers @Steve Kaufmann thank you for reporting this. We will look into it and let you know what we think.
-
- marked as enhancement
-
@john.rodgers @Steve Kaufmann after looking at the code I agree the easiest thing to do would be to move
opendir/closedir
toupdate_control_state
. (This should work fine as theupdate_control_state
is not in the critical path.) However, I am not sure whatopendir
is useful for in theread_ib_counter_value
function. If it was used as test it should fail with an error code, instead of doing nothing, if the path does not exist. Maybe it could be removed altogether (?). Let me know if I have missed anything here.Concerning
pscanf
, this could be split into afopen/flose
part, that would be moved tostart/stop
and avfscanf
, that would stay inpscanf
. -
Yeah, I guess the bottom line is:
o open each counter file once acquiring a file descriptor/pointer for the duration of execution
o consider reading/interpreting the read value with a non-formatted read, eg, read system call followed by atoi (the assumption being there’s a lot less non-relevant code executed compared to a scanf) - but the overhead may all just be in the noise
I think validating that the expected directory is prudent. If one knows at that point it does not exist why continue, and you disable reason message can convey something more accurate (the check should be done at init time) as oppose to saying a particular counter file “does not exist”. An
access
system call works just fine instead of anopendir
unless you are using aDIR
ptr to go through all the counter files.Thanks, Steve
-
Steve, this is an attempt patch: https://bitbucket.org/icl/papi/pull-requests/366/20221007-infiniband-perf-improv. I tested it and it seems to work. Can you review it and let me know if I have missed anything?
-
Thanks Giuseppe - I’ll take a look. I can also try to build your version and test it here if you want. Just make sure I understand what I should pull so I build with your changes. Is there some way I could just get the changed files and replace the ones in my local repository copy?
-
I also wanted to note the additional counter descriptions, et.al., that could be added to this component’s source (see above). It needn’t be included in this commit but it would be a nice enhancement for the next release of PAPI if there’s time, or at least in the first update to the next release (7.0.1.0?)
https://support.mellanox.com/s/article/understanding-mlx5-linux-counters-and-status-parameters
-
I’ve rewritten the component using system call
open
andpread
calls andstrtoul
to interpret the value which is read into a small buffer. Although it doesn’t run any faster (I was surprised) I think the implementation is simpler. Let me know if you’d be interested in acquiring what I’ve done, if anything as a reference. -
Sure that would be helpful can you point me to the branch?
- Log in to comment
Note: ticket filed on behalf of Steve Kaufman (steven.kaufmann@hpe.com)