Wiki

Profiling code

Introduction

Wondering why your code is slow? Use a profiler! You can spend a lot of time optimizing the wrong area otherwise. Profilers run your program (or a subset of it) while frequently checking what exactly the program is doing, and when they're done, they spit out a report that tells you where all your time is going.

In Python, the main profiler is the cProfile module.

Quick-start guide

Let's say we want to figure out why level generation is taking so long in Pyrel. We therefore want to profile the GameMap.makeLevel function. This function is called from two places: when the player takes a staircase, and when the wizard-mode "jump to level" command is used. For simplicity we'll modify the latter.

We open up commands/wizard.py, and add import cProfile at the top. Then we find this line:

self.gameMap.makeLevel(self.newLevel)

And we change it to say this instead:

cProfile.runctx("self.gameMap.makeLevel(self.newLevel)", locals(), globals(), "profiling.pro")

Then we run Pyrel, use the jump-to-level command once, and exit.

Now there's a new file named "profiling.pro" in Pyrel's root directory. This is cProfile's profile log; a big glob of binary that you can't easily read yourself. However, the pstats module can read it, and Pyrel includes a module, meta/profile.py to do just that. So you run this:

python meta/profile.py profiling.pro | less

And you get a big dump of the profiling information that cProfile collected. Here's the top few lines from one such run:

Fri Dec 21 21:56:33 2012    profiling.pro

         2515817 function calls (2468548 primitive calls) in 2.909 seconds

   Ordered by: internal time
   List reduced from 152 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   185698    0.231    0.000    0.231    0.000 container.py:55(__init__)
    43699    0.218    0.000    0.402    0.000 gameMap.py:165(addSubscriber)
95168/47911    0.213    0.000    0.549    0.000 gameMap.py:209(getContainer)
        1    0.184    0.184    0.357    0.357 gameMap.py:80(makeCellArray)
   431469    0.169    0.000    0.169    0.000 generationMap.py:257(isInBounds)
    47257    0.133    0.000    0.770    0.000 genUtility.py:25(clearCell)
       15    0.133    0.009    0.313    0.021 connection.py:387(getAStarTunnel)
    96584    0.114    0.000    0.114    0.000 {method 'add' of 'set' objects}
        1    0.106    0.106    0.127    0.127 generationMap.py:12(__init__)
       17    0.095    0.006    1.544    0.091 genUtility.py:110(makeRectangleFilled)

This tells us that overall, it took us about 3 seconds to make the level. Then it spits out a big list of functions, ordered by "internal time" (the "tottime" column). This ordering tells us how long we spent in each function, while not in functions that that function called. For example, when we called GameMap.makeCellArray, we spent a lot of time making new Cells, one for each tile in the new map. That time is not reflected in the internal time, which is .184s in this case. It is, however, reflected in the cumulative time ("cumtime") column, which is .357s.

Usually cumulative time will tell you where you want to focus, but it can throw a lot of wrong functions at you (e.g. the function is simply high on the stack, so a lot happens underneath it), and total time can help you narrow things down. Here's the cumulative-time printout (which is in the report generated by meta/profile.py if you scroll down a long way):

   Ordered by: cumulative time
   List reduced from 152 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.909    2.909 <string>:1(<module>)
        1    0.019    0.019    2.909    2.909 gameMap.py:102(makeLevel)
        1    0.001    0.001    2.335    2.335 generator.py:108(makeAngbandLevel)
       17    0.095    0.006    1.544    0.091 genUtility.py:110(makeRectangleFilled)
    47257    0.133    0.000    0.770    0.000 genUtility.py:25(clearCell)
        1    0.005    0.005    0.576    0.576 connection.py:47(connectRooms)
95168/47911    0.213    0.000    0.549    0.000 gameMap.py:209(getContainer)
    43699    0.218    0.000    0.402    0.000 gameMap.py:165(addSubscriber)
    46418    0.083    0.000    0.361    0.000 generationMap.py:248(setGridInfo)

Of course, the top few lines are just going down the function-call stack to the function that starts doing all the real work, makeAngbandLevel. That's not so informative. What is informative, however, is that we spent almost half our time in makeRectangleFilled, even though it was only called 17 times. That's an excellent spot to start optimizing; if we can cut that down by an order of magnitude, then we'll have halved mapgen time. clearCell might be another good spot; it depends on if that function is called by makeRectangleFilled or not (and thus whether or not it's represented in that function's cumulative time). And so on.

Miscellaneous Notes

The reason we only ran the wizard-mode command once is because each time you do cProfile.runctx, it overwrites the old log file. There's no way (that I know of) to append multiple profiling logs together. If you want to do many iterations and profile them together to get more statistical significance, then you'll have to write a bit of Python code that does those iterations, and then profile that.

There's no point in profiling code that spends any time waiting on user input; you'll see all of the "top offenders" be the code that waits for an event to happen. Try to set up your profiling run so it starts profiling immediately before the bit that you want to diagnose, and stops immediately afterwards. That way you don't get distracted by going down wrong paths.