Make Carpet output the global load imbalance, i.e. the total amount of time each process spends waiting in MPI_Waitall and similar calls. Make this easy to display or graph, and produce one-line output that is easy to understand.
The rationale is: determining load imbalance per routine produces a lot of data. If most of the time is spent in a few routines, then it is very likely that this global load imbalance information alone will be useful.
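A minimal sketch of what the one-line output could look like, assuming each process accumulates its wait time by timing every blocking wait (such as MPI_Waitall), and the per-process totals are then collected on one process. The names `WaitTimer` and `imbalance_summary`, and the min/avg/max format, are hypothetical and not part of Carpet. To keep the example standalone, the collection step is represented by a plain `std::vector<double>`. In a real run it would be an MPI reduction or gather.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical accumulator: call start() just before a blocking wait
// (e.g. MPI_Waitall) and stop() right after; total() is the summed wait time.
class WaitTimer {
 public:
  void start() { t0_ = Clock::now(); }
  void stop() {
    total_ += std::chrono::duration<double>(Clock::now() - t0_).count();
  }
  double total() const { return total_; }

 private:
  using Clock = std::chrono::steady_clock;
  Clock::time_point t0_{};
  double total_ = 0.0;
};

// Hypothetical formatter: given every process's accumulated wait time
// (seconds) and the wall-clock time of the run, build one summary line.
std::string imbalance_summary(const std::vector<double>& wait, double wall) {
  assert(wall > 0.0);
  if (wait.empty()) return "load imbalance: no data";
  const double mn = *std::min_element(wait.begin(), wait.end());
  const double mx = *std::max_element(wait.begin(), wait.end());
  const double avg =
      std::accumulate(wait.begin(), wait.end(), 0.0) / wait.size();
  char buf[160];
  std::snprintf(buf, sizeof buf,
                "load imbalance: wait min %.3f s, avg %.3f s, max %.3f s "
                "(%.1f%% of %.3f s wall)",
                mn, avg, mx, 100.0 * avg / wall, wall);
  return buf;
}
```

For example, four processes that waited 0.5, 1.0, 1.5 and 1.0 seconds during a 10-second run would produce `load imbalance: wait min 0.500 s, avg 1.000 s, max 1.500 s (10.0% of 10.000 s wall)`. Reporting min, average and max keeps the output to a single line that is still easy to parse and graph over time.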