The n_top_timers feature of TimerReport prints to stdout only the timings measured on the root process, which greatly confuses people. It should instead output average (or min/max) timings over all processes.
One common problem is a routine that contains communication and thus has to wait until all processes arrive there. If the previous routines show a load imbalance, then outputting timings only from the root process makes this routine look very slow, although it really only has to wait for the other processes to finish their previous calculations. Since this is a common case, it makes the current timing output useless for all routines involving communication.
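
To make the effect concrete, here is a minimal sketch in plain Python that simulates the situation without MPI. The function names (`simulate_wait_times`, `summarize`) and the per-rank numbers are hypothetical and chosen only for illustration; this is not the TimerReport API. It assumes that the root process (rank 0) finishes its computation early and then waits inside the communicating routine until the slowest rank arrives.

```python
from statistics import mean


def simulate_wait_times(compute_times):
    """Time each rank spends waiting in a communicating routine.

    Assumption: the routine acts like a barrier, so every rank waits
    until the slowest rank has finished its preceding computation.
    """
    slowest = max(compute_times)
    return [slowest - t for t in compute_times]


def summarize(per_rank_times):
    """Min/avg/max over all ranks, as the issue proposes to report."""
    return {
        "min": min(per_rank_times),
        "avg": mean(per_rank_times),
        "max": max(per_rank_times),
    }


# Hypothetical per-rank compute times in seconds; rank 0 is the root.
compute = [1.0, 4.0, 4.0, 4.0]
wait = simulate_wait_times(compute)

# What a root-only report would show: the communicating routine
# appears three times more expensive than the computation.
root_only = {"compute": compute[0], "communicate": wait[0]}

# What an all-process report would show: the cost sits in the
# computation, and the communicating routine is cheap on average.
all_ranks = {"compute": summarize(compute), "communicate": summarize(wait)}

print("root only:", root_only)
print("all ranks:", all_ranks)
```

In this made-up example the root-only view attributes 3.0 s to the communicating routine, while the average over all ranks is 0.75 s and the maximum compute time of 4.0 s exposes the real imbalance. In an actual MPI code the same statistics could be gathered with reductions such as `MPI_Reduce` using `MPI_MIN`, `MPI_MAX`, and `MPI_SUM`.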