Multipole: Default (midpoint) integration method incorrectly implemented, yielding wrong results... especially with m=odd modes

Issue #2398 resolved
Zach Etienne created an issue

Multipole works by first interpolating (e.g., psi4) data onto a uniform grid in theta and phi, such that

theta = [0,pi], where theta = 0,dth,2 dth, …, N dth=pi

and

phi = [0, 2 pi], where phi = 0, dph, 2 dph, …, N dph=2 pi

(See https://bitbucket.org/einsteintoolkit/einsteinanalysis/src/26d17d9edfecafd0c516e38fe8343bdc1d11eb5d/Multipole/src/utils.cc#lines-322 )

Clearly these are not midpoints. Further the integration for midpoint method proceeds as (https://bitbucket.org/einsteintoolkit/einsteinanalysis/src/26d17d9edfecafd0c516e38fe8343bdc1d11eb5d/Multipole/src/integrate.cc#lines-51):

  for (iy = 0; iy <= ny; iy++)
    for (ix = 0; ix <= nx; ix++)
      integrand_sum += f[idx(ix,iy)];

  return hx * hy * integrand_sum;

which double counts points on the periodic boundaries. This might not be a problem for theta since the integrands naturally include a multiplication by sin(theta), and sin(0)=sin(pi)=0. However it does pose a major problem with phi, since there is no guarantee that the integrand at phi=0 or 2pi vanishes.

There is even a warning in code comments about this adding a first-order error to a method that’s “already first order”, while the trapezoidal rule is claimed to be “2nd order convergent” (param.ccl). When properly implemented the error order on the midpoint method matches that of the trapezoidal rule and in fact midpoint produces about half the error (see https://en.wikipedia.org/wiki/Riemann_sum#Midpoint_rule ; cf https://en.wikipedia.org/wiki/Riemann_sum#Trapezoidal_rule).

Despite the above problems, for some reason midpoint integration is chosen as the default.

I propose that we either error out if the user attempts to use the midpoint rule in its current form, or properly implement the midpoint method in the thorn.

Empirical effects of the bug: I find that certain odd-m modes extracted from QC-0 (e.g., l=2,m=1 or l=m=3) are of order 1% the amplitude the dominant (l=m=2) mode with the bug, but drop to 0.001% the amplitude when trapezoidal rule (a properly implemented integration method) is chosen.

Comments (28)

  1. Roland Haas

    I had a look at the source code as well, and in broad strokes agree with the assessment above. There may be two different issues going on. One for the theta direction, one for the phi direction.

    In the phi direction the functions integrated are periodic so any equidistant sampling of the [0,2pi] interval with unit weight given to the values at the sample point is a “midpoint rule” and, because of the periodicity converges exponentially well, since it is essentially the Fourier series. However the current code in the lines quoted in the description double counts the point phi=0=2pi namely one of the loops (not sure which one on top of my head) should only go to “< nx” (or “< ny”).

    In the theta direction there is no periodicity and a weight of sin(theta) shows up (due to the surface element on a sphere in the physicists usual spherical coordinates). A typical midpoint rule would have ntheta segments with sample points at pi/ntheta/2 + itheta * pi/ntheta where the offset is to make it a midpoint. The current code is lacking the offset as well as actually using ntheta+1 sample points a itheta * pi/ntheta. Sticking to an interpretation as a midpoint rule this means it includes cells centered on the poles that reach to negative theta. This would indicate a double counting this (negative theta) area in the integral. Since the weights of these segments are always zero they actually do not show up and instead the midpoint rule used ends up missing a bit of the integral (or being “open” instead of closed).

    Note that neither one of these affects the convergence order of the method (there’s a test for this in integration_convergence_order output to repos/einsteinanalysis/Multipole/test/test_22/test_midpoint_convergence_order..asc and I had checked the order as well using a Maple spreadsheet before proposing #2210).

    However it, as found here, affects the accuracy of the midpoint rule, which is (much) poorer than it ought to be.

    In summary it seems to me that the midpoint rule as implemented is no ideal and it would benefit from the improvements that seem to be suggested above, namely:

    1. use more traditional midpoints in theta: pi/ntheta/2 + itheta * pi/ntheta
    2. fix the double counting in phi

    A pull request which fixes these would be very welcome and most likely see favorable review very quickly.

    There is the second issue that it may be useful to not make the least accurate rule the default (the choice of midpoint I believe was done in the assumption that it would be most robust). Instead one could use either Driscoll-Healy or the 4th order convergent Simpson rule. It may be interesting to verify that all composite rules use a midpoint rule in the phi direction, which, because of periodicity, is actually the most accurate rule to use in the phi direction.

  2. Roland Haas

    My claim about this not affecting the convergence order seems incorrect. The measured order is ~1 while the trapezoidal rule, which should have the same order, is ~2 making this a much clearer bug.

  3. Zach Etienne reporter

    Comparison of different Multipole integration techniques for l=2,m=1 mode output for QC-0 run. The resolutions correspond to the [Ntheta x Nphi] resolution on the Multipole grid. Notice that the amplitude is normalized by the maximum amplitude for the dominant (l=m=2) mode. So the bug in the default integrator is causing a completely unnecessary ~1% error (!!!!)

  4. Zach Etienne reporter

    Regarding using Simpson’s or Driscoll-Healy as default – I’d suggest trapezoidal actually. I see additional oscillations in Simpson’s as compared to trapezoidal, and no real improvement of Driscoll-Healy over trapezoidal, for these modes at least.

  5. Roland Haas

    Have you checked whether the Simplson rule uses a midpoint rule for phi or not? The trapezoid one will, due to periodicity, end up being just a midpoint in phi. Driscoll-Healy does use midpoint for phi.

  6. Roland Haas

    I dug a bit deeper and it turns out the lower convergence order reported in repos/einsteinanalysis/Multipole/test/test_22/test_midpoint_convergence_order..asc is a bit of a red herring. The order is indeed lower for the coded up midpoint rule if given an arbitrary polynomial (which the code to compute order uses), however the physics code does not use an arbitrary polynomial but instead a function of the form f(theta) = sin(theta) g(theta) in which case (adjusting for double counting in phi) the convergence order of the coded up “midpoint” rule actually happens to be 2. This is because the code happens to implement a rule:

    int(f, x) = f(x_0)*dx/2 + sum(f(x_i)*dx,i=1..N-1)) + f(x_N)*dx/2
    

    ie a midpoint rule for the interval [x_{1/2},x_{N-1/2}] with just Riemann sums using lower / upper values for the endpoints x_0 and x_N (with incorrect weights but since the contribution is always 0 this does not matter). This also turns out to be exactly the formula for the trapezoidal rule in the theta direction.

    So, looking at this now, I would say:

    1. in the theta direction, given the currently fixed evaluation pointsx_i = i * pi/ntheta (used in other places of Multipole than just integration so possibly not so straightforward to change) the code does as good as it can. Depending on the point of view one takes it either implements a strange midpoint rule with Riemann sum endcaps (and incorrect weights) or the trapezoidal rule (with the same incorrect weights).
    2. integration in phi is off because it double counts the phi=0=2pi point. A fix there is to reduce the integration range to “< ny” (phi is “y”). This seems to be source of convergence failure.

    So, since in the phi direction, for periodic data, the trapezoidal rule and midpoint rule are identical and, as explained above, the “midpoint” rule implemented in the theta direction, for the case of a sin(theta) factor present, actually happens to be a trapezoidal rule, as well one may just as well make “midpoint” and alias for “trapezoidal” and remove the strange “midpoint” code.

    If one wanted to keep an actual “midpoint” rule (in theta) one needs to change the interpolation points and make sure that all parts of Multipole handle this case correctly.

    For illustration I attach a Maple worksheet that demonstrates the effect (where midpoint_1 is the correct bound for phi showing a convergence order of 2 and midpoint_2 is off by one and shows convergence order close to 1).

  7. Zach Etienne reporter

    Here is my proposed patch for the upcoming release branch, which replaces the default integration with trapezoidal. I checked the test/ directory, and all parfiles select the integration_method explicitly, so no patch should be needed for those.

    diff --git a/Multipole/param.ccl b/Multipole/param.ccl
    index b2ac27e5..8125e48c 100644
    --- a/Multipole/param.ccl
    +++ b/Multipole/param.ccl
    @@ -25,11 +25,11 @@ CCTK_STRING coord_system "What is the coord system?"
    
     KEYWORD integration_method "How to do surface integrals" STEERABLE=always
     {
    -  "midpoint"      :: "Midpoint rule (1st order)"
    +  "midpoint"      :: "Midpoint rule (1st order) <- WARNING: BROKEN; double-counts in phi, yielding lower-than-expected integration. See ticket 2398 for details."
       "Simpson"       :: "Simpson's rule (4th order) [requires even ntheta and nphi]"
       "trapezoidal"   :: "Trapezoidal rule (2nd order)"
       "DriscollHealy" :: "Driscoll & Healy (exponentially convergent) [requires even ntheta]"
    -} "midpoint"
    +} "trapezoidal"
    

  8. Roland Haas

    Summarizing part of the discussion above (and offline between @Zach Etienne and myself), the current “midpoint” method is incorrect / inaccurate in various ways.

    As far as I can tell, the closest one could get it to be correct, without changing the interpolation points, is to remove the double counting in “phi” which turns it into the “trapezoidal” rule. Hence changing the default to “trapezoid” is the least disruptive change before the release.

    @Zach Etienne has a patch / pull request that will provide actual “midpoint” rule to be merged into the release branch after the release once it has had time to mature a bit on master.

  9. Ian Hinder

    Apologies for (i) missing this conversation (I was on vacaction), and (ii) probably being the guilty party! I switched from midpoint to simpson a while ago because I noticed that this made a difference in some situations (possibly the same ones that Zach found). I didn’t want to change the default because this would cause old parameter files to give different results, and break backwards compatibility. The default was indeed chosen as it was less likely to give oscillations and be “more robust” - this was in the early days of successful BBH evolutions, and we didn’t want to touch anything in case the whole thing fell apart! I do recall being suspicious of the “midpoint” method, but don’t remember the details.

    I’d recommend against changing the default now, for the reason above. Instead, I would keep the default at “midpoint” and output a warning and a recommendation to use trapezoidal. From what I read above, the “midpoint” method gives physically correct (convergent) results, though it is less accurate than it could be (and isn’t the “midpoint rule”, hence is badly named). So it’s not a big problem if people don’t pay attention to the warning. I have had many experiences in the past where I have been trying to work out why one version of the code gives different results to another. If there are multiple such changes, it makes it even harder.

    If there is a plan to implement a “corrected” midpoint method (is there a reason to have one?), I would call it something different.

    See also #2282, referring to the gallery examples using the default low-accuracy method.

    (Currently frustrated as this is the second time I’ve typed this, having lost the form content due to accidentally clicking on a link on this page before sending. How did web browsers ever become acceptable as the primary way we interface with most software???)

  10. Zach Etienne reporter

    Hi @Ian Hinder . My patch for corrected midpoint is just above your message. Probably needs a bit more work ito documentation, but it does the job. @Roland Haas decided to change the default to trapezoidal for the release, since (1) this is not a midpoint method at all (as you know), (2) it produced very wrong results (~few% error on BBH waveform extraction, unnecessarily), and (3) the bug was uncovered too close to the release.

  11. Ian Hinder

    Since it’s been there since the code was written, I probably wouldn’t have made the change for the release. I wouldn’t say it’s producing “very wrong” results, especially for the modes which people usually look at. The results are convergent to the exact result, just with a larger than necessary coefficient. If someone was worried about the accuracy of their wave extraction (and a priori, everyone should be checking the accuracy of all approximations used in their simulations), they can easily either increase the resolution of the wave extraction grid, or change the method. So I don’t see it as a “critical bug that absolutely must be fixed for this release as soon as it is discovered”. And changing the default might cause more pain down the line. Anyway, it’s done now.

  12. Roland Haas

    The bug was reported before the deep freeze at which points bugs and fixes are still acceptable. I do hope that there will be a more complete fix soon that then can be backported to the release branch (as it should).

  13. Roland Haas

    @Zach Etienne will you fix the convergence test issue that remains even with your patch? That would be required to be able to backport to the release branch. As by the discussion above, the sooner this is done, the better this would be.

  14. Zach Etienne reporter

    @Roland Haas I just pushed a new revision. The midpoint method now demonstrates 2nd order convergence in the convergence test. I also updated the tests.cc file to properly use midpoints when midpoint integration is selected for the test_pi_symmetric_sphere_integral() function as well. All test results for tests that used the midpoint method were updated.

    No other integration methods were affected, and I demonstrated this by confirming their test results remain unchanged.

  15. Roland Haas

    Thank you. Comments:

    • please document the actual meaning of ntheta and nphi in param.ccl. The explanation there seems to have never been correct. From what I can tell there are always ntheta+1 points at which the field is evaluated with either ntheta cells (non-midpoint) or ntheta+1 cells (midpoint). For phi there are also nphi+1 points where the field is evaluated with the same nphi or nphi+1 cells.
    • I find the name “offset” for the variable used to create the grid somewhat unfortunate. Since it is used in two places: once to increate the number of cells when computing dth and once to actually offset by 1/2 cell width. Maybe better call it “is_midpoint” with the same 0/1 value (whether integer or double does not really matter).

    Looks fine to me otherwise. Would you mind creating an actual pull request so that it is possible to see all changes at once (well I will ignore the changes in ascii output files), please?

  16. Log in to comment