We have added support for simple, lightweight, timing and performance
measurements. This allows one to examine which functions are using the majority
of the simulation runtime, and how this varies across multiple processors. We
have built in a number of default timers, such as EvolveLevel for each level,
RebuildHierarchy, SolveHydroEquations, and Group_WriteAllData. Below, we will
outline how to add additional timers and how to generate plots of the data.
At each cycle, information is printed out to a file named performance.out.
By default, for each timing section the times spent on each processor are
collected, and the mean, standard deviation, minimum, and maximum values
are presented. This is meant to give the user a sense of how well
load-balanced their simulation is across processors, as well as pinpoint
where the majority of the time is being spent. To explain the output, we show
an example cycle:
| --------------------------example performance.out-----------------------------
| Cycle_Number 2
| Level_0 6.520748e-05 8.344650e-07 6.389618e-05 6.604195e-05 100 4 3.833916e+05
| Level_1 3.254414e-05 2.804866e-05 1.406670e-05 8.106232e-05 10 1 7.681875e+04
| Level_2 1.159906e-04 2.678922e-05 9.965897e-05 1.623631e-04 14 1 3.017485e+04
| Level_3 2.477765e-04 7.348677e-05 2.028942e-04 3.750324e-04 16 1 1.614358e+04
| Level_4 5.816817e-04 1.630557e-04 4.820824e-04 8.640289e-04 24 1 1.031492e+04
| Level_5 1.266718e-03 3.594168e-04 1.056910e-03 1.889229e-03 26 1 5.131371e+03
| Level_6 2.686501e-03 7.197988e-04 2.262831e-03 3.933191e-03 40 1 3.722315e+03
| RebuildHierarchy 5.715549e-03 1.371242e-04 5.478144e-03 5.801201e-03
| SolveHydroEquations 1.436710e-03 2.407243e-03 4.386902e-05 5.606174e-03
| Total 1.499003e-02 3.440975e-05 1.494408e-02 1.503992e-02 230 10 3.835882e+03
| Each of the Level_N and Total lines have:
| Level_N, mean time, stddev time, min time, max time, number of cells,
| number of grids, mean cells/s/processor
| Each non-level line (RebuildHierarchy, SolveHydroEquations, etc.) have:
| Section Name, mean time, stddev time, min time, max time.
Time is measured in seconds of wall time for each of the processors.
In the example above, we see that more time is being spent in RebuildHierarchy
than in SolveHydroEquations, and that the load balance is quite poor for the
SolveHydroEquations where the mean is 1.4 ms, with a standard deviation of
At the beginning of each simulation (on Cycle 1), we print out a header to the
| # This file contains timing information
| # For instructions on how to decipher this information,
| # see http://enzo-project.org/docs/somewhere
| # Times are collected across MPI processes and presented as:
| # Level_N/Total, Mean Time, Stddev Time, Min Time, Max Time, Num Cells,
| # Num Grids, Cells/processor/sec
| # Routine, Mean Time, Stddev Time, Min Time, Max Time
Then, at the start of each simulation (whether the beginning or a restart), we
print out the MPI processor count:
# Starting performance log. MPI processes: 4
This is done in case the number of processors changes over time.
Adding New Timers
While there are a number of default timers, it is easy to add new timers to any
section of code in Enzo.
The built-in timers include: EvolveHierarchy (Total), EvolveLevel (for each
level), SolveHydroEquations, RebuildHierarchy, and Group_WriteAllData. Adding
new times should be as simple as doing two things:
1) Add #include "EnzoTiming.h" to the top of the file you want to profile,
making sure it is before macros_and_parameters.
| around the code you want to time.
The string that you pass in gets collected in a map which is then iterated over
at the end of each evolve hierarchy. At that time it prints into a file named
performance_tools.py is a python module for plotting the performance
information stored in performance.out. The easiest way to generate plots
from performance.out is to call performance_tools.py from the command line:
| $ python performance_tools.py performance.out
| $ python performance_tools.py -s 11 performance.out
to do the same while applying a smoothing kernel to your data 11 cycles in
By default, performance_tools.py will output 6 plots:
Plot the mean time taken per processor on each level and on the
simulation as a whole (Total) versus cycle number. Overplot in
lighter tones are the minimum and maximum time taken on a processor
for each of these quantities.
Same as p1.png except scale everything to be as a fraction of the
total time taken.
Plot the mean time taken per processor on each level versus cycle number.
Stack each level on the previous layer cumulatively.
Plot the mean time taken per processor performing any Non-Level fields versus
cycle number (e.g. the RebuildHiearchy, SolveHydroEquations, and
Group_WriteAllData tasks). Stack each level on the previous layer
cumulatively. Scale everything to be as a fraction of the total time taken.
Plot the number of cells generated at each level versus cycle number and
stack them cumulatively.
Plot the efficiency (cells/processor/sec) for each level and for
the simulation as a whole versus cycle number.
Generating Additional Plots
If you want to create additional plots of your data beyond the defaults,
simply add new plot_quantities() and plot_stack() calls to the bottom of
This can be as simple as adding one of these lines:
| $ p.plot_quantity("Level 0", "Mean Time")
| $ p.plot_stack("Level 0", "Mean Time")
| $ p.plot_quantity(, "Mean Time", repeated_field="All")
| $ p.plot_stack(, "Num Cells", repeated_field="Level")
| $ p.plot_quantity("Total", "Mean Time", repeated_field="Non-Level")
Full documentation for the plot_quantity and plot_stack functions can
be found in the docstrings for the performance_tools.py module. You can
view it either by looking at the source code, or by loading it in python:
| $ python
| >>> import performance_tools as pt
| >>> help(pt.perform)