
Dynamic lstopo

Introduction

The hwloc package, developed by the Runtime Inria team in Bordeaux, provides a graphical user interface to view the architecture of a computer. It shows the topology, including the cache sizes and the number of physical and logical cores. This GUI is named lstopo.

I, David Guyon, updated this tool during my internship in the ALF Inria team in Rennes. With this update, the originally static lstopo window becomes dynamic. Every second (by default), the interface is refreshed to display performance information: the cache load and the ratio of cache misses to cache accesses in every cache box, the CPU load in the PU boxes, and the number of Instructions Per Cycle (IPC). These values come from hardware performance counters, accessed through system calls, and from files in the /proc directory.

lstopo - Before/After
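The displayed values are simple ratios computed over each refresh period. The following is a minimal C sketch, assuming the counter deltas for one period have already been read; the struct and function names are hypothetical and not taken from the lstopo source. The CPU-load formula (busy jiffies over total jiffies between two samples of a /proc/stat "cpu" line) is a common convention and an assumption about what the update does.

```c
#include <stdio.h>
#include <string.h>

/* Counter deltas over one refresh period (hypothetical struct). */
struct counter_delta {
    unsigned long long instructions;
    unsigned long long cycles;
    unsigned long long cache_accesses;
    unsigned long long cache_misses;
};

/* Instructions Per Cycle: instructions retired / cycles elapsed. */
double ipc(const struct counter_delta *d)
{
    return d->cycles ? (double)d->instructions / (double)d->cycles : 0.0;
}

/* Fraction of cache accesses that missed, between 0 and 1. */
double miss_ratio(const struct counter_delta *d)
{
    return d->cache_accesses
        ? (double)d->cache_misses / (double)d->cache_accesses : 0.0;
}

/* Busy and total jiffies taken from one "cpu" or "cpuN" line of /proc/stat. */
struct cpu_times {
    unsigned long long busy;
    unsigned long long total;
};

/* Returns 0 on success, -1 if the line is not a parsable cpu line. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
    unsigned long long user, nice, system, idle;
    unsigned long long iowait = 0, irq = 0, softirq = 0, steal = 0;

    if (strncmp(line, "cpu", 3) != 0)
        return -1;
    /* %*s skips the "cpu"/"cpuN" label; older kernels print fewer fields. */
    int n = sscanf(line, "%*s %llu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &system, &idle,
                   &iowait, &irq, &softirq, &steal);
    if (n < 4)
        return -1;

    unsigned long long idle_all = idle + iowait;
    t->total = user + nice + system + idle_all + irq + softirq + steal;
    t->busy = t->total - idle_all;
    return 0;
}

/* CPU load between two samples: share of elapsed jiffies spent busy. */
double cpu_load(const struct cpu_times *prev, const struct cpu_times *cur)
{
    unsigned long long dt = cur->total - prev->total;
    return dt ? (double)(cur->busy - prev->busy) / (double)dt : 0.0;
}
```

For example, 2500 instructions over 1000 cycles give an IPC of 2.5, and 100 misses out of 400 accesses give a miss ratio of 0.25.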

How to install

Before you start

Make sure the autoreconf command (provided by the autoconf package, used here together with automake and libtool) and a C/C++ compiler such as gcc are installed on your system. The GUI uses the Cairo graphics library; the configure step reports whether it was found. Root permissions are only needed for a system-wide installation.

Compilation

Quick steps

  • autoreconf
  • mkdir build and cd build
  • ../configure (check that the Graphical output line says yes)
  • make

In detail

In the root folder, run the autoreconf command to generate the configure script and the Makefile templates.

Then create a build directory with mkdir build and move into it with cd build: building outside the source tree keeps it clean.

In this new directory, run ../configure. Its output is verbose; at the end, it summarizes which optional features are supported on your system. The output below shows that Cairo is supported. If it is not, install Cairo and run the configuration again.

Hwloc optional build support status (more details can be found above):

Probe / display I/O devices: PCI(pciaccess+linux)
Graphical output (Cairo):    yes
XML input / output:          full
libnuma memory support:      yes
Plugin support:              no

Finally, run the make command to do the compilation.

You can now run the application with ./utils/lstopo. See the How to use section for more information.

System-wide installation

You probably want to install this application on your system so that you can run it with the lstopo command.

Simply run sudo make install.

You can now run the application with the lstopo command. See the How to use section for more information.

How to use

There are two ways to use this update: system-wide or per PID.

System-wide

System-wide performance analysis is, in my opinion, the easiest way to use the application. However, it requires root permissions.

sudo lstopo --no-io --no-legend

or

sudo ./utils/lstopo --no-io --no-legend

The --no-io and --no-legend flags are options of the original lstopo. They are optional, but I like to use them because they hide the I/O devices and the legend, which are not needed for the analysis.

Per pid

If you want to analyse a specific process, or if you do not have root permissions on your system, use the --pid option. With this option, sudo is not needed, but a process with the given PID must exist, and lstopo closes as soon as that process terminates.

lstopo --pid <pid>

or

./utils/lstopo --pid <pid>
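On Linux, the two modes correspond to the two ways of calling the perf_event_open(2) system call: pid = -1 with a given cpu counts everything running on that PU (system-wide, which usually requires root or a low /proc/sys/kernel/perf_event_paranoid value), while a given pid with cpu = -1 follows one process on whichever PU it runs. The sketch below is an assumption about how such counters can be opened, not the update's actual code, and the helper names are hypothetical. It also shows kill(pid, 0), a standard way to check whether a PID still exists.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper, so invoke the raw system call. */
static int perf_event_open_raw(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Describe one hardware counter, e.g. PERF_COUNT_HW_INSTRUCTIONS or
   PERF_COUNT_HW_CPU_CYCLES, counting user-space events only. */
static void init_hw_attr(struct perf_event_attr *attr, unsigned long long config)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = config;
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/* System-wide mode: pid = -1, cpu = pu counts every task running on that PU.
   Usually needs root. Returns a file descriptor, or -1 on error. */
int open_pu_counter(unsigned long long config, int pu)
{
    struct perf_event_attr attr;
    init_hw_attr(&attr, config);
    return perf_event_open_raw(&attr, -1, pu, -1, 0);
}

/* Per-pid mode: pid = target, cpu = -1 follows the process on any PU.
   inherit = 1 also counts threads the process creates afterwards. */
int open_pid_counter(unsigned long long config, pid_t pid)
{
    struct perf_event_attr attr;
    init_hw_attr(&attr, config);
    attr.inherit = 1;
    return perf_event_open_raw(&attr, pid, -1, -1, 0);
}

/* With the default read_format, read() yields the 64-bit count.
   Returns 0 on success, -1 on error. */
int read_counter(int fd, unsigned long long *value)
{
    return read(fd, value, sizeof *value) == (ssize_t)sizeof *value ? 0 : -1;
}

/* kill() with signal 0 sends nothing; it only checks that the pid exists.
   EPERM means the process exists but belongs to another user. */
int process_alive(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}
```

Sampling the instruction and cycle counters once per refresh period and taking the difference between consecutive reads gives the deltas needed for the IPC.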

Tips

To get the PID quickly, start your program in the background and print its PID: ./my-awesome-pgr & echo $!

Last note

Use the new --refresh option to control the refresh period. The default period is 1000 ms, and it can be lowered to a minimum of 50 ms.
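How the option is parsed internally is not described here. The following is a plausible C sketch of the default and minimum periods stated above, assuming that invalid input falls back to the default and that values below 50 ms are clamped (both are assumptions; parse_refresh_ms and sleep_ms are hypothetical names). The wait between refreshes uses nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <time.h>

#define REFRESH_DEFAULT_MS 1000L  /* default period */
#define REFRESH_MIN_MS       50L  /* minimum period */

/* Parse the --refresh argument. NULL or invalid input falls back to the
   default; values below the minimum are clamped (assumed behaviour). */
long parse_refresh_ms(const char *arg)
{
    if (arg == NULL)
        return REFRESH_DEFAULT_MS;

    char *end;
    errno = 0;
    long ms = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return REFRESH_DEFAULT_MS;
    return ms < REFRESH_MIN_MS ? REFRESH_MIN_MS : ms;
}

/* Wait one refresh period, resuming the sleep if a signal interrupts it. */
void sleep_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
}
```

A lower bound of this kind keeps the sampling and redrawing overhead from dominating the measurements.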

Tests

The following screenshots show the analysis of a program that uses OpenMP for parallelization. On the left, it runs with 2 threads (one on each physical core); on the right, it runs with 4 threads (one on each logical core). Note the higher IPC value (in yellow) with only 2 threads: the two logical cores of a physical core share its execution resources, so running 4 threads is not twice as efficient as running 2, contrary to what one might expect.

OpenMP with 2 logical cores and 4 logical cores
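The program used for these screenshots is not given. As a hypothetical stand-in, the C kernel below splits a loop across OpenMP threads; compiled without -fopenmp, the pragma is ignored and it runs on a single thread. The thread placement described above can be reproduced with the standard OpenMP environment variables, for example OMP_NUM_THREADS=2 OMP_PLACES=cores OMP_PROC_BIND=close for one thread per physical core, and OMP_NUM_THREADS=4 OMP_PLACES=threads OMP_PROC_BIND=close for one thread per logical core.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of squares, with iterations shared among OpenMP threads and the
   partial sums combined by the reduction clause. */
double sum_of_squares(const double *a, long n)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; i++)
        s += a[i] * a[i];
    return s;
}

/* Number of threads a parallel region would use (1 without OpenMP). */
int thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Running the same binary under both settings while lstopo is open lets you compare the per-PU IPC of the two configurations.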

The two screenshots below show another problem, caused by denormal numbers. The program used for this test is ramsurf. When it is compiled with the -funsafe-math-optimizations flag, the IPC is high (around 2.5 instructions per cycle on my machine). Without this flag, the program runs with a very low IPC (around 0.05 instructions per cycle). As the screenshots show, this slowdown is not caused by cache misses: it comes from floating-point calculations on denormal numbers, which processors typically handle much more slowly than normal values.

Ramsurf with optimization and without optimization
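Denormal (subnormal) numbers are nonzero values smaller in magnitude than DBL_MIN. Values that decay toward zero over many iterations, such as repeated multiplications by a factor below 1, eventually become denormal. The C sketch below is illustrative and not taken from ramsurf: it counts denormals in an array and reports after how many halvings a value becomes denormal. According to the GCC manual, -funsafe-math-optimizations may, at link time, pull in startup code that changes the floating-point control settings; on x86 this typically enables flush-to-zero, which replaces denormal results with zero. That is a likely, but here unverified, explanation of the difference above.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Count subnormal (denormal) values in an array. */
size_t count_subnormals(const double *v, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (fpclassify(v[i]) == FP_SUBNORMAL)
            count++;
    return count;
}

/* Multiply x by factor until the result becomes subnormal and return the
   step number, or -1 if it never does within max_steps (for example when
   flush-to-zero is enabled and the value jumps straight to zero). */
int steps_until_subnormal(double x, double factor, int max_steps)
{
    for (int i = 1; i <= max_steps; i++) {
        x *= factor;
        if (fpclassify(x) == FP_SUBNORMAL)
            return i;
    }
    return -1;
}
```

With standard IEEE 754 behaviour, halving 1.0 gives 2^-1022 (equal to DBL_MIN, still normal) after 1022 steps and a denormal at step 1023.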

Licence

This work is released under a BSD licence (3-clause). The list of contributors is in hwloc-1.9/COPYING, whose content is reproduced below:

Copyright © 2009 CNRS
Copyright © 2009 inria.  All rights reserved.
Copyright © 2009 Université Bordeaux 1
Copyright © 2009 Cisco Systems, Inc.  All rights reserved.
Copyright © 2012 Blue Brain Project, EPFL. All rights reserved.
Copyright © 2014 David Guyon. All rights reserved.
See COPYING in top-level directory.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.