The hwloc application, developed by the Runtime Inria team in Bordeaux, provides a Graphical User Interface that shows the architecture of a computer: its topology, with the cache sizes, the number of logical and physical cores, and so on. This GUI is named lstopo.
I, David Guyon, updated this tool during my internship in the ALF Inria team in Rennes. With this update, the originally static lstopo window becomes dynamic. Every second (by default), the interface is refreshed to display performance information: the cache load and the ratio of cache misses to cache accesses in every cache box, the CPU load in the PU boxes, and the number of Instructions Per Cycle (IPC). These values come from performance counters, read through system calls, and from files in the /proc folder.
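As an illustration of the /proc side only, here is a minimal POSIX shell sketch of the usual way to turn /proc/stat into a CPU load: take two samples of a cpu line one refresh period apart and compare the busy time with the total time. It is not the code of the update (which may read other /proc files), and the function names are made up for the example.

```shell
#!/bin/sh
# Illustrative sketch (not the update's code): CPU load from /proc/stat.

# read_cpu_fields LABEL: print the numeric fields of the /proc/stat line
# whose first word is LABEL ("cpu" = whole machine, "cpu0", "cpu1"... = one PU).
read_cpu_fields() {
    awk -v want="${1:-cpu}" '$1 == want { $1 = ""; print; exit }' /proc/stat
}

# cpu_load PREV CUR: busy percentage between two samples of those fields.
# Fields 1-8 are user nice system idle iowait irq softirq steal; guest time
# is already counted in user/nice, so later fields are ignored.
cpu_load() {
    awk -v prev="$1" -v cur="$2" 'BEGIN {
        split(prev, a, " "); split(cur, b, " ")
        for (i = 1; i <= 8; i++) { ta += a[i]; tb += b[i] }
        ia = a[4] + a[5]; ib = b[4] + b[5]   # idle + iowait count as idle
        dt = tb - ta; di = ib - ia
        if (dt <= 0) { print 0; exit }
        printf "%d\n", 100 * (dt - di) / dt
    }'
}

# sample_cpu_load [SECONDS] [LABEL]: load over one interval, like one refresh.
sample_cpu_load() {
    prev=$(read_cpu_fields "${2:-cpu}")
    [ -n "$prev" ] || { echo "no '${2:-cpu}' line in /proc/stat" >&2; return 1; }
    sleep "${1:-1}"
    cur=$(read_cpu_fields "${2:-cpu}")
    cpu_load "$prev" "$cur"
}
```

For example, sample_cpu_load 1 cpu0 prints the load of the first PU over one second, and sample_cpu_load alone gives the load of the whole machine.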
How to install
Before you start
Make sure you have the autoreconf tool (provided by the autoconf package) and a C/C++ compiler such as gcc installed on your system.
The GUI relies on the Cairo graphics library. The configuration step will tell you whether it is installed.
You will also need root permissions if you want a system-wide installation.
In the root folder, run the autoreconf command to regenerate the configure script and the Makefile templates for your system.
Then create a build directory with mkdir build, because we like clean compilations.
Move into this new directory and run ../configure. The output is quite verbose, and at the end it summarizes which optional features your system supports. Make sure you get yes on the Graphical output (Cairo) line, as in my output below. If not, install Cairo and run the configuration again.
Hwloc optional build support status (more details can be found above):
Probe / display I/O devices: PCI(pciaccess+linux)
Graphical output (Cairo): yes
XML input / output: full
libnuma memory support: yes
Plugin support: no
Finally, run the make command to compile.
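The build steps above can be chained in a small shell script. This is a sketch, not part of hwloc: the has_cairo and build_lstopo function names and the configure.log file are made up for the illustration, and the Cairo check simply searches the configure summary for the Graphical output (Cairo) line shown above.

```shell
#!/bin/sh
# Sketch of the build steps above (not part of hwloc). Run from the
# hwloc source root; function names and configure.log are made up here.

# has_cairo: succeed if the configure summary read on stdin reports
# "Graphical output (Cairo): yes", which the lstopo GUI needs.
has_cairo() {
    grep -Eq 'Graphical output \(Cairo\):[[:space:]]*yes'
}

build_lstopo() {
    autoreconf || return 1                   # regenerate configure
    mkdir -p build && cd build || return 1   # clean, separate build tree
    ../configure > configure.log 2>&1 || { cat configure.log; return 1; }
    # Show the optional-support summary printed at the end of configure.
    sed -n '/optional build support status/,$p' configure.log
    if ! has_cairo < configure.log; then
        echo "Cairo support missing: install Cairo, then re-run ../configure" >&2
        return 1
    fi
    make                                     # compile; lstopo ends up in utils/
}
```

Sourcing this file (for instance . ./build.sh, a hypothetical file name) and calling build_lstopo from the source root leaves you in the build directory, ready to run ./utils/lstopo.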
From the build directory, you can now run the application with ./utils/lstopo. See the How to use section for more information.
You probably want to install the application on your system so that you can run it from anywhere. To do so, run sudo make install.
Once it is installed, you can run the application with the lstopo command. See the How to use section for more information.
How to use
There are two ways to use this update: system-wide or per process (PID).
The system-wide performance analysis is, in my opinion, the easiest way to use the application. However, it requires root permissions. Use the first command below if lstopo is installed, or the second one from the build directory:
sudo lstopo --no-io --no-legend
sudo ./utils/lstopo --no-io --no-legend
The --no-io and --no-legend flags come from the original lstopo. They are optional, but I like to use them because they hide the I/O devices and the legend, which are not needed for the analysis.
If you want to analyse a specific process, or if you do not have root permissions on your system, use the --pid option. sudo is not needed with this option, but the process with the given PID must exist, and the application closes as soon as that process terminates.
lstopo --pid <pid>
./utils/lstopo --pid <pid>
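To get the PID quickly, start your program in the background: the shell variable $! holds the PID of the last background job.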
./my-awesome-pgr & echo $!
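Combining the two steps, a small wrapper can start a program in the background and attach lstopo to it right away. This is a hypothetical sketch: monitor_program and the LSTOPO variable are names made up for the illustration; only the --pid option comes from the update.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the update): start a program in the
# background and attach the dynamic lstopo to it with --pid.
# LSTOPO is a variable made up for this sketch; set LSTOPO=./utils/lstopo
# to use the build directory instead of an installed lstopo.
monitor_program() {
    lstopo_cmd=${LSTOPO:-lstopo}
    "$@" &                          # launch the program in the background
    pid=$!                          # $! = PID of the last background job
    "$lstopo_cmd" --pid "$pid"      # lstopo closes when the program ends
    wait "$pid"                     # return the program's exit status
}
```

For example, monitor_program ./my-awesome-pgr starts the program and opens lstopo on its PID in one step.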
Do not hesitate to use the brand new --refresh option to control the refresh period. The default value is 1000 ms, and the minimum allowed value is 50 ms.
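If you script your launches, a tiny helper can keep a requested period within these bounds before passing it on. This is a hypothetical sketch: refresh_period_ms is not part of lstopo, and it only encodes the 1000 ms default and 50 ms minimum stated above.

```shell
#!/bin/sh
# Hypothetical helper (not part of lstopo): keep a requested refresh
# period, in milliseconds, at or above the 50 ms minimum; no argument
# gives the 1000 ms default.
refresh_period_ms() {
    requested=${1:-1000}
    case $requested in
        *[!0-9]*) echo "refresh period must be a whole number of ms" >&2
                  return 1 ;;
    esac
    if [ "$requested" -lt 50 ]; then
        echo 50
    else
        echo "$requested"
    fi
}
```

A launch line would then look like lstopo --pid <pid> --refresh "$(refresh_period_ms 20)", assuming --refresh takes the period in milliseconds as its next argument; the exact syntax of the option is not spelled out above.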
The following screenshots show the analysis result for a program that uses OpenMP for parallelization. The left one runs with 2 threads (one on each physical core) and the right one with 4 threads (one on each logical core). Note the higher IPC value (in yellow) with only 2 threads. Indeed, running a program with 4 threads is not twice as efficient as running it with 2 threads, as one might expect, because the two logical cores of a physical core share its execution resources.
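The comparison can be written down with a simplified model (an illustration, not a computation done by lstopo), assuming every thread stays busy and every core runs at the same clock frequency f. IPC_i is the IPC of logical core i, R_T the number of instructions per second with T threads, and IPC^(2), IPC^(4) the per-thread IPC in the 2-thread and 4-thread runs:

```latex
\mathrm{IPC}_i = \frac{N^{\mathrm{instr}}_i}{N^{\mathrm{cycles}}_i},
\qquad
R_T = f \sum_{i=1}^{T} \mathrm{IPC}_i,
\qquad
S = \frac{R_4}{R_2}
  = \frac{4 f\,\mathrm{IPC}^{(4)}}{2 f\,\mathrm{IPC}^{(2)}}
  = 2\,\frac{\mathrm{IPC}^{(4)}}{\mathrm{IPC}^{(2)}}
```

The speedup S stays below 2 as soon as the per-thread IPC drops when two threads share a physical core, which is what the lower IPC with 4 threads indicates.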
The two screenshots below show another problem, caused by denormal numbers. The program I used for the test is ramsurf. When it is compiled with the -funsafe-math-optimizations flag, the IPC value is high (around 2.5 Instructions Per Cycle on my machine). Without this flag, the compiled program runs with a very low IPC (around 0.05 Instructions Per Cycle). As you can see, this is not caused by cache misses: the low IPC comes from floating-point calculations on denormal numbers, which the processor handles much more slowly than normal ones. When used at link time, -funsafe-math-optimizations may make GCC link startup code that enables flush-to-zero mode, which avoids these slow computations.
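For reference, these are the standard IEEE 754 bounds of the denormal (subnormal) range, where a nonzero value is too small to be stored with full precision. They are general facts about the formats, not measurements from ramsurf:

```latex
\text{double: } 2^{-1074} \le |x| < 2^{-1022}
\quad (\approx 4.9\times10^{-324} \text{ to } 2.2\times10^{-308}),
\qquad
\text{float: } 2^{-149} \le |x| < 2^{-126}
\quad (\approx 1.4\times10^{-45} \text{ to } 1.2\times10^{-38})
```

A program whose values decay into this range (for instance, values repeatedly multiplied by a factor below 1) spends many cycles in slow denormal arithmetic, which lowers the IPC without any cache misses; in flush-to-zero mode, such results are replaced by 0 instead.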
This work is released under the 3-clause BSD licence. The list of contributors is in hwloc-1.9/COPYING, whose content is reproduced here:
Copyright © 2009 CNRS
Copyright © 2009 inria. All rights reserved.
Copyright © 2009 Université Bordeaux 1
Copyright © 2009 Cisco Systems, Inc. All rights reserved.
Copyright © 2012 Blue Brain Project, EPFL. All rights reserved.
Copyright © 2014 David Guyon. All rights reserved.
See COPYING in top-level directory.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.