A JIT-aware profiler

Goal: have a profiler which is aware of the PyPy JIT and which shows what percentage of the time has been spent in which loops.

Long term goal: integrate the data collected by the profiler with the jitviewer.

The idea is to record an event in the PYPYLOG every time we enter and exit a loop or a bridge.

Expected output

    [100] {jit-profile-enter
    loop1      # e.g. an entry bridge
    [101] jit-profile-enter}
    ...
    [200] {jit-profile-enter
    loop0      # JUMP from loop1 to loop0
    [201] jit-profile-enter}
    ...
    [500] {jit-profile-exit
    loop0      # e.g. because of a failing guard
    [501] jit-profile-exit}

In this example, the exit from loop1 is implicit because we are entering loop0. So we spent 200 - 100 = 100 ticks in the entry bridge, and 500 - 200 = 300 ticks in the actual loop.
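The accounting above can be sketched as a small log post-processor. This is only an illustration, not an existing PyPy tool: the function name `ticks_per_loop` and the `base` parameter are hypothetical, and the sketch assumes the log looks exactly like the example (one loop name per section, timestamps written as in the example, so they are parsed as decimal by default).

```python
import re
from collections import defaultdict

# Opening line of a section, e.g. "[100] {jit-profile-enter".
# The closing line "[101] jit-profile-enter}" has no "{" and does not match.
SECTION_START = re.compile(r"^\[([0-9a-fA-F]+)\] \{(jit-profile-(?:enter|exit))$")


def ticks_per_loop(lines, base=10):
    """Return {loop name: ticks spent} from jit-profile enter/exit events.

    `base` is the numeric base of the timestamps (an assumption: the
    example in the text reads naturally as decimal).
    """
    totals = defaultdict(int)
    running = None   # (loop name, entry timestamp) of the loop being executed
    pending = None   # (timestamp, event kind) of a section whose body comes next
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        m = SECTION_START.match(line)
        if m:
            pending = (int(m.group(1), base), m.group(2))
        elif pending and line and not line.startswith("["):
            stamp, kind = pending
            pending = None
            # Any event ends the loop that was running: an exit does so
            # explicitly, an enter does so implicitly.
            if running is not None:
                name, start = running
                totals[name] += stamp - start
            running = (line, stamp) if kind == "jit-profile-enter" else None
    return dict(totals)


EXAMPLE_LOG = """\
[100] {jit-profile-enter
loop1      # e.g. an entry bridge
[101] jit-profile-enter}
...
[200] {jit-profile-enter
loop0      # JUMP from loop1 to loop0
[201] jit-profile-enter}
...
[500] {jit-profile-exit
loop0      # e.g. because of a failing guard
[501] jit-profile-exit}
"""

print(ticks_per_loop(EXAMPLE_LOG.splitlines()))
# → {'loop1': 100, 'loop0': 300}
```

The time is measured between the opening timestamps of consecutive sections, which matches the 200 - 100 and 500 - 200 computation in the text.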

What to do about "inner" bridges?

"Inner bridges" are those bridges which jump back to the loop they originate from. There are two possible ways of dealing with them:

  1. we ignore them: we record when we enter the loop, but not when we jump to a compiled inner bridge. The exit event will be recorded only in case of a non-compiled guard failure or a JUMP to another loop
  2. we record the enter/exit of each inner bridge

The disadvantage of solution (2) is that there are certain loops which take a bridge at every single iteration. In this case we would record a huge number of events, possibly adding a lot of overhead and thus making the profiled data useless.
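To get a feeling for the difference in event volume, here is a rough back-of-the-envelope model. It is a sketch under stated assumptions, not a measurement: `count_events` is a hypothetical helper, and it assumes that under solution (2) each inner bridge taken produces two events (one enter for the bridge, one enter for the loop it jumps back to, with the exits implicit as in the expected output above).

```python
def count_events(iterations, bridge_every, record_inner_bridges):
    """Estimate the number of log events for a single run of one loop.

    iterations: number of loop iterations executed
    bridge_every: an inner bridge is taken on every n-th iteration
                  (1 means every single iteration)
    record_inner_bridges: False models solution (1), True solution (2)
    """
    # One enter when the loop starts and one exit when it is finally left.
    events = 2
    if record_inner_bridges:
        bridges_taken = iterations // bridge_every
        # Assumption: entering the bridge and re-entering the loop
        # are each logged once.
        events += 2 * bridges_taken
    return events


# A loop that takes an inner bridge at every iteration:
print(count_events(1_000_000, 1, record_inner_bridges=False))  # → 2
print(count_events(1_000_000, 1, record_inner_bridges=True))   # → 2000002
```

Under this model the cost of solution (2) grows linearly with the number of iterations, while solution (1) stays constant per loop run.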

Detecting the entry to/exit from a loop

Ways to enter:

  • just after the tracing/compilation
  • from the interpreter, if the loop has already been compiled
  • from another loop, via a JUMP operation
  • from a hot guard failure (which we ignore, in case we choose solution (1) above)
  • XXX: am I missing anything?

Ways to exit:

  • guard failure (entering blackhole)
  • guard failure (jumping to a bridge) (ignored in case of solution (1))
  • jump to another loop
  • XXX: am I missing anything?

About call_assembler: I think that at the beginning we should just ignore call_assembler, so that the time spent inside the call is attributed to the loop calling it.