the patch was tested with a code where different functions are called on different processes and the number of timers are different across all the mpi processes. the timing results are gathered when the timer name is the same while appended when different names are found. that is what I would expect the timerreport to work in such a case.
This fails for me when there are timers that were destroyed (via CCTK_TimerDestroy by WaveExtractL from Llama) with a segfault in line 677 (the strncpy). The error is assuming that timer never go away. Apparently Cactus never reduces its number of timer counter though the actual timer index seems to be re-used by the underlying Util_DeleteHandle/Util_NewHandle. These two will free the associated timer structure (and the name). So CCTK_NumTimers() seems to be the number of times CCTK_TimerCreate was successfully called ie. the number of timers ever created, not the number of timers currently in existence and one always has to check if a timer id in a `for(i=0;i<CCTK_NumTimers();i++)` loop is still valid.
Additionally it seems as if the large automatic arrays in line 700
which tries to create an automatic array of size CCTK_nProc()*CCTK_NumTimer()*TIMERNAME_LENGTH is also causing problems (since I have a test runs without a call to TimerDestroy which fails in line 704 "name_displacements = 0" with a segfault.)
In my runs I have nprocs >= 192 (576 cores, 6 threads), NumTimers >= 2500 (hydro, multipatch) and TIMERNAME_LENGTH = 101. The total size of this array is thus about 48MB. If this is ever tried to be allocated on the stack by a simple minded compiler then this should segfault as well (and I have no idea if a compiler is required to support arbitrarily large automatic arrays).
If this ever causes a segfault then a better solution might be to rewrite the CollectTimerInfo routine in question in C++ and use vector<T> for memory allocation since there are still a number of arrays that have automatic storage class and whose size depends on nProcs so can also grow uncontrolled.
If you want, you can keep the type of all_timernames; there is no need to change from char to char. You would write
char (*restrict const all_timernames)[TIMERNAME_LENGTH] = malloc(total_ntimers * sizeof *all_timernames);
which would save the explicit index calculations, and would avoid the need to cast the type for compare_string_array. However, this is not necessary to apply the patch.
I committed version 2 of the patch as TimerReport rev 45 which restores the char type. I had not realized that one could get C to handle allocatable 2d arrays in this fashion. Thanks for pointing this out.
I don't understand why deletion of timers is allowed by Cactus. What does it mean? At the end of a run, you want to know how much time has been spent in each timer. Deleted timers make this impossible. Unless there is some compelling argument, maybe we should consider deprecating CCTK_TimerDestroy at some point in the future.
Another issue is the assumption that timer names are only 100 characters long (this might have been due to me). Carpet's hierarchical timers are implemented as Cactus timers with names based on their path. i.e. main/Evolve/.../myfn. These could in principle get quite long. I am unhappy about silently truncating the timer names. I think it would be better to allow arbitrarily long timer names in TimerReport.
We sprung memory leaks. Some were in this code. Attached please find a patch. Normally I would simply fix these since they are obvious (and valgrind agrees with me that they are leaks) but given that we are soft-frozen...