Improve McLachlan accuracy

Issue #963 open
Erik Schnetter created an issue

James van Meter provided me with an optimised version of McLachlan. He states:

  1. You are not taking full advantage of the chi = exp(-2 phi) variable. There are several terms where you divide by chi or chi^2 in expressions with an overall factor of exp(-4 phi). I rewrote the BSSN equations to make these cancellations before coding. So where you have an expression of the form chi^2 (A + B/chi^2), I have chi^2 A + B. This gives a slight but noticeable advantage in both accuracy and performance.

  2. I added Hamiltonian-constraint-damping terms due to Duez et al. These terms don't seem to be well-known but they are effective.

  3. I added a Gamma-constraint-damping term due to Yo et al.

  4. I enforce det(g)=1.
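Item 4 is commonly implemented as an algebraic rescaling of the conformal metric, gbar_ij -> gbar_ij / det(gbar)^(1/3). A minimal sketch of that idea (the function name is made up here, and this is the standard BSSN convention rather than McLachlan's actual generated code):

```python
import numpy as np

def enforce_unit_determinant(gbar):
    """Rescale a symmetric 3x3 conformal metric so that det(gbar) = 1.

    The rescaling gbar_ij -> gbar_ij / det(gbar)^(1/3) removes the
    determinant drift that accumulates during evolution.
    """
    det = np.linalg.det(gbar)
    return gbar / np.cbrt(det)

# Example: a conformal metric whose determinant has drifted from 1
g = np.diag([1.01, 0.99, 1.02])
g_fixed = enforce_unit_determinant(g)
```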

I have not tested this new version yet, but James suggests including it in the Einstein Toolkit.
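The cancellation in item 1 matters most where chi is tiny, e.g. near punctures. A toy Python illustration of the difference (A, B, and chi are placeholder names, not McLachlan grid functions):

```python
import math

def rhs_naive(A, B, chi):
    # Divide by chi**2 first, then multiply back: the intermediate
    # quotient can overflow even though the final result is moderate.
    return chi**2 * (A + B / chi**2)

def rhs_cancelled(A, B, chi):
    # Algebraically identical, but the chi**2 factors are cancelled
    # before coding: no division at all.
    return chi**2 * A + B

A, B = 2.0, 1.0
chi = 1e-160                      # chi -> 0 near a puncture
naive = rhs_naive(A, B, chi)      # inf: B/chi**2 overflows
safe = rhs_cancelled(A, B, chi)   # 1.0: well-behaved
```

Besides the accuracy headroom, the cancelled form also drops a division per grid point, which is where the performance gain comes from.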


Comments (22)

  1. Roland Haas

    This seems to be based on 3e4bb765aaff1776ef3da6840d55b4c417875803 "Regenerate ML_ADM", which is from 2011-11-20. So my first question would be whether we could get a patch against a more modern version of McLachlan (ideally one including CCZ4, since it also modifies the equations quite a bit). The second comment (this is turning into a paper review) is that in enforceCalc the file does:

    phi -> Max[phi, 10^(-16)], which seems wrong for the phi method, where phi is the log of the conformal factor and might well be negative. Finally, I would find it very useful if comments a la "PRD XX, YYYYY (ZZZZ), eqn. EE" were added for the constraint-damping terms C1-C4. Since the source is the only documentation for ML, it would be nice if all the references required to even understand where the equations come from were contained in it.
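Roland's point about the floor can be made concrete: W and chi are positive by definition, so flooring them at a small positive eps is safe, while phi is a logarithm and may legitimately be negative. A sketch of per-formulation floors (the function name and eps default are illustrative only; the real logic lives in the Kranc-generated enforce calculation):

```python
def apply_conformal_floor(value, formulation, eps=1e-16):
    """Floor the evolved conformal variable only where a floor makes sense.

    W and chi are strictly positive by definition, so max(value, eps)
    is safe for them. phi is logarithmic and may be negative, so
    clamping it from below at +eps would silently corrupt valid data.
    """
    if formulation in ("W", "chi"):
        return max(value, eps)
    if formulation == "phi":
        return value  # no positivity floor for a logarithmic variable
    raise ValueError(f"unknown formulation: {formulation}")
```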

  2. Peter Diener

    I have just added a patch against the development version. I have tested that all the testsuites pass with this version, and I ran a modified version of the qc0-mclachlan.par parameter file for 5M and verified that the old and the new versions give the same results when running with both the standard phi- and W-methods. However, I noticed that the RHS1 routine is more expensive now. With the repository version I get (from the TimerReport):

    ML_BSSN | ML_BSSN_RHS1 | 224.50275900 |

    while with the new version I get:

    ML_BSSN | ML_BSSN_RHS1 | 425.99931700 |

    while the timings for all other McLachlan routines are essentially unchanged. I currently can't explain this difference in time. It doesn't seem like enough new calculations have been added to warrant such an increase. I will investigate further.

  3. Erik Schnetter reporter

    This could be a "platform effect". A small change in the code size or the number of local variables could now mean that the code runs essentially using the L2 cache instead of the L1 cache on your system, dramatically changing performance. If so, then this effect may or may not be there on other systems or with other compilers.

  4. Peter Diener

    I have run both versions of the code with PAPI hardware performance counters on my laptop (Intel Core i7) using the Intel compilers. I see a clear increase in L1 instruction cache misses (24 times more in the Goddard version). However, I also see a factor of 2 increase in floating point instructions as well as a factor of 4 increase in L1 data cache misses. I'm not sure if these other increases are caused by the instruction cache misses. We may not see the same on AMD machines, but a fairly significant fraction of HPC resources are based on Intel chips, so I'm a bit hesitant to apply the patch at the moment.

  5. Erik Schnetter reporter

    I am currently working on a re-write of McLachlan in a branch "rewrite". This is a general clean-up of the code, including removing duplication etc. From my point of view, it would make sense to apply this improvement to the rewritten McLachlan to avoid divergent branches.

  6. Peter Diener

    One comment and a question:

    Comment: We have to test extensively and ensure that performance is not degraded by these changes.

    Question: Does this rewrite also include the gauge splitting stuff from Ticket #590? Or will that have to be considered separately afterwards?

  7. Erik Schnetter reporter

    Comment: Yes; we need to check not only the performance, but also correctness. After the rewrite, it will be easy to generate several versions, and then pick (maybe even at run time?) the most efficient one for each platform.

    Question: The gauge splitting is currently not done, but will be just a few lines of work in the new code. I'll give an overview of the design tomorrow morning in the ET call.

  8. Ian Hinder
    • changed status to open

    I think Erik is going to test/implement these changes in the new version of McLachlan (the rewrite branch). As a result, I am removing the "review" status from this ticket.

  9. Erik Schnetter reporter

    I want to rename some parameters in the rewritten version. Given this, I suggest sticking with the master branch for the next release.

  10. Roland Haas

    Is there any desire to continue working on this? From this ticket, the functionality seems to be all there, with only minor changes required (though the patch possibly no longer applies to the rewritten ML code).

  11. Zach Etienne

    @Peter Diener mentioned during this morning's ET telecon that he would like to take a look at this but hasn't had time yet.

  12. Zach Etienne

    The "Duez et al." trick is quite powerful: I implemented this trick in Baikal a few months ago and found that Hamiltonian constraint violations drop by about an order of magnitude compared to when it is not enabled, for binary black hole simulations (e.g., an ET BBH gallery-example-like simulation).

    The trick dates back to Yoneda & Shinkai, who did an analytical (non-numerical) investigation

    … and Duez et al., who I think were the first to report its use in NR simulations:

    In addition, Raithel & Paschalidis recently found that this trick improves Hamiltonian constraint convergence behavior in the context of binary neutron star simulations:

    Future versions of Baikal/BaikalVacuum will have this trick enabled by default; I'm now calibrating the magnitude and scaling of the coefficient.
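The mechanism behind such damping terms can be illustrated with a toy model. This is not the actual Duez et al. term, just the generic effect: adding to the evolution equations a term proportional to the constraint turns the violation into a damped quantity, dC/dt = -kappa * C, so it decays exponentially at a rate set by the coefficient being calibrated:

```python
import math

def evolve_constraint(C0, kappa, t_end, dt=1e-3):
    """Forward-Euler integration of the toy model dC/dt = -kappa * C."""
    C = C0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        C += dt * (-kappa * C)
    return C

# With kappa = 0 the violation never decays; with kappa = 1 it falls
# like exp(-kappa * t), i.e. by more than an order of magnitude by t = 3.
residual = evolve_constraint(1.0, kappa=1.0, t_end=3.0)
```

In the toy model, too large a kappa (relative to dt) makes the explicit update unstable, which mirrors why the magnitude and grid-spacing scaling of the real coefficient need calibration.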
