Timer output does not add up

Issue #39 new
Stephen Biggs-Fox created an issue

I have just received the following timer output:

Eigensolver                0.00 min     0.0 %  


 Initialization            22.38 min     5.0 %  
 Advance steps            383.57 min    85.9 % 
(redistribute              85.425 min   19.1 %)
(field solve                5.128 min    1.1 %) 
(diagnostics               20.817 min    4.7 %) 
 Re-initialize            322.85 min    72.3 % 
 Finishing                  0.00 min     0.0 %  
 total from timer is:   446.45 min

As I understand it, the items in brackets are a breakdown of the item before the brackets (Advance steps). Therefore, I would expect the sum of the items in brackets to equal the item before the brackets. However, this is not the case:

85.425 + 5.128 + 20.817 = 111.370 which is not equal to 383.57

Furthermore, I would expect the sum of the items not in brackets to equal the total. However, this is not the case either:

22.38 + 383.57 + 322.85 = 728.80 which is not equal to 446.45

If one were to include Re-initialize in Advance steps (i.e. assume there should be brackets around Re-initialize), then we get closer but the numbers still do not match:

85.425 + 5.128 + 20.817 + 322.85 = 434.220 which is not equal to 383.57

and:

22.38 + 383.57 = 405.95 which is not equal to 446.45

On the plus side, the total is correct (or nearly correct) as I have the time taken from an external measurement as 07:26:39, while 446.45 mins = 07:26:27 so that agrees within ~10 seconds, which is fine.

Comments (13)

  1. David Dickinson

    It's just a partial breakdown -- the left over is what is spent in doing things like the linear advance, collisions, the non-linear term etc.

  2. Joseph Parker

    Just to add, the current implementation stores timings for a given operation in a length-2 vector. These vectors are either global variables (e.g. time_reinit) or members of the timers class, which is itself a member of state (e.g. state%timers%init). The latter approach is probably more desirable. However, this is a pain to implement as the subroutine calls must be changed to pass state through to the functions where lower level timers would be called. For example, if I want to time collisions, I have to call timing functions either side of vspace_derivative, so I need to pass state to advance, advance_<fields_implementation> and timeadv, so it's available to the timing subroutine.

    Moreover, it's questionable whether timers or state are stored in the right module, as trying to pass state down results in circular dependencies. Maybe state could stay where it is, but timers could be moved to gs2_time?

  3. Stephen Biggs-Fox reporter

    Even if it's a partial breakdown it's wrong because the sum is more than the total!

    Timings are potentially useful information but the current output is confusing and does not inspire much trust in the numbers it's telling. I think this should be rectified. It doesn't need to be perfect. It doesn't need to add up exactly. But I think it should be made clearer what things sum to make something else and the sum should agree (at least approximately) with the (sub-)total. And if there's other stuff that's not included, this should be captured in a category called 'other' or similar so that the sum and total match to make it clear that there is other stuff going on.

    Moving timers to gs2_time sounds sensible.

  4. Stephen Biggs-Fox reporter

    Also, I just got this from a run with nstep = 0:

     Initialization            26.32 min    99.6 %
     Advance steps              0.00 min     0.0 %
    (redistribute               6.610 min   25.0 %)
    (field solve                0.000 min    0.0 %)
    (diagnostics                0.000 min    0.0 %)
     Re-initialize              0.00 min     0.0 %
     Finishing                  0.00 min     0.0 %
     total from timer is:    26.42 min
    

    Notice re-distribute, which is listed as a contribution to advance steps, is non-zero even though advance steps is zero (because nstep = 0). This is because there are some re-distributes during initialisation. I find this misleading too.

  5. Joseph Parker

    Thanks Steve. You're right, these times are reported as if they are logically nested, but actually they are all independent timers. Timers are both started and stopped by callingtime_message. There's no check that timers don't overlap, or that timers have been started without being stopped.

    I've sketched out a possible improvement in the branch features/timings. The idea is to make a timers class in gs2_time. These would be for the big steps, like "Initialization" and "Advance Step". Each class would also contain a subtimers class, which would be for things like "redistribute", "field solve" and "diagnostics". This would let us know whether the redistribute is being called in initialization or time advance.

    This sort of works, but it looks like it'll be a pain to set it up with functions in utils (like redistributions) without breaking non-gs2 code. Also I still need to look at making sure timers are started and stopped at sensible times.

  6. Stephen Biggs-Fox reporter

    Brilliant! Sounds like a good fix so long as we can avoid breaking related projects. Thanks :)

  7. Stephen Biggs-Fox reporter

    OK, thanks.

    I've tried it out, as discussed below. NB: I have not looked at the implementation at all - only the output I get from running it.

    I get the following output:

    Initialization           0:00:44.68   100.0 %
       advance                0:00:00.00     0.0 %
       redistribute           0:00:04.27     9.6 %
       field solve            0:00:00.00     0.0 %
       collisions             0:00:00.00     0.0 %
       nonlinear term         0:00:00.00     0.0 %
       diagnostics            0:00:00.82     1.8 %
       reinitialize           0:00:00.00     0.0 %
       finishing              0:00:00.00     0.0 %
     Advance                  0:01:09.89   100.0 %
       advance                0:00:35.83    51.3 %
       redistribute           0:00:04.27     6.1 %
       field solve            0:00:08.00    11.5 %
       collisions             0:00:00.01     0.0 %
       nonlinear term         0:00:13.56    19.4 %
       diagnostics            0:00:09.26    13.2 %
       reinitialize           0:00:00.00     0.0 %
       finishing              0:00:34.02    48.7 %
     Finishing                0:00:00.00     NaN %
       advance                0:00:00.00     NaN %
       redistribute           0:00:04.27     Inf %
       field solve            0:00:00.00     NaN %
       collisions             0:00:00.00     NaN %
       nonlinear term         0:00:00.00     NaN %
       diagnostics            0:00:00.00     NaN %
       reinitialize           0:00:00.00     NaN %
       finishing              0:00:00.00     NaN %
     Total                    0:00:43.86   100.0 %
       advance                0:00:00.00     0.0 %
       redistribute           0:00:04.27     9.7 %
       field solve            0:00:00.00     0.0 %
       collisions             0:00:00.00     0.0 %
       nonlinear term         0:00:00.00     0.0 %
       diagnostics            0:00:00.00     0.0 %
       reinitialize           0:00:00.00     0.0 %
       finishing              0:00:00.00     0.0 %
    

    I like the layout as it indicates more clearly what is nested within something else.

    However, there are some issues with this:

    • I would expect all of the top level times (initialization, advance, and finishing) to be nested within the total so that their percentage is the percentage of the total (currently they are all showing 100%)

    • Furthermore, I would expect the sub-items under total to be sums of the corresponding sub-items from each of the other top level timers (e.g. total redist = init redist + advance redist + finishing re-dist) - this is currently not the case.

    • The total time does not agree with an external timer nor does the sum of the top level timers agree with an external timer (this was the case previously).

    • I would expect the sum of the times of the sub-items to be the time of the top level item. I would expect the sum of the percentages of the sub-items to be 100%. Neither of these is currently the case. Do we need to add other timers, e.g. to time communication? Or just a catch-all category called 'other' or similar to mop up any un-timed computation?

    • Finishing has a time of zero so all the sub-item percentages are NaN or Inf. However, the redistribute sub-item is non-zero so I think the root of the problem is that the top level item has not correctly summed the sub-items, similar to the above issue.

    On the plus side, the sum of the sub-item percentages multiplied by the top level item time is equal to the sum of the sub-item times so there is at least some agreement there.

    Finally, writing out these issues makes me realise that we are effectively hard-coding our own implementation of profiling into gs2; this makes me worry that we could be slowing gs2 down unnecessarily for users who are not interested in this timing information. This has got me thinking...

    Would we be better off testing whether we get any speed-up by removing the timers altogether? If not, then fine, we carry on fixing the timers. But if we do get speed-up, then maybe we're best off removing the timers and saying that anyone interested in profiling gs2 should use an external profiling tool. How about that?

  8. Joseph Parker

    Thanks for the feedback Steve. Like the idiot I am, I forgot to push the final commit... The current head of the branch should fix "finishing" problem, and mostly fix the sub-levels. You're right about how the totals should be displayed - I'll fix that.

    I'll also look at the performance without timers. I'd be surprised if it's a big deal, and I think that people would want some indication of timing without doing external profiling... but we'll see how big an effect it is.

  9. Log in to comment