Results no longer dependent on order of fields in BaryonField[] when negative energy/density found by grid::CorrectForRefinedFluxes

#266 Merged at 1fe57c1
  1. James Larrue

I was investigating a simulation that ended up with negative total energy in grid::CorrectForRefinedFluxes. (The fact that it had negative total energy is an issue unrelated to this pull request.) I found that the results I received were dependent on the ordering of the fields in the BaryonField[] array. I believe the results should be independent of the order in which the fields are stored.

What happened was that the negative total energy was recognized as "bad", so the correction to total energy was undone and all further corrections were set to zero. However, there was nothing done to corrections on previous fields. Therefore, any field that was stored before total energy in BaryonField[] would keep its correction, while total energy and any field after total energy would have no correction. Changing the position of a field from before total energy to after total energy in BaryonField[] (or the reverse) would change the results.

A second issue I had in my case was that the velocity correction was much larger than the pre-correction velocity. When a large number was subtracted from a small number, then added back again, the result was not equal to the original small number: v_tiny - v_huge + v_huge != v_tiny.

The current PR addresses the first issue by undoing the current correction and all previous corrections, in addition to setting all future corrections to zero (for the offending cell only). The second issue is addressed by processing density, total energy, and internal energy before the other fields, thereby reducing the number of fields that would be affected by an "undo". This does not provide a perfect fix for issue #2, but it seemed better than nothing.

@dcollins4096 mentioned this issue has come up before, so perhaps somebody has ideas on a different solution.

Comments (27)

    1. James Larrue author

      The Cosmology/SphericalInfall test should give different results.

      At cycle 188, the log file (estd.out) says:

      P(0) -- CFRFl warn: 1.211703e-13 1.104539e-14 1.862144e-10 -1.862033e-10 7 5 6 1 [2]
      P(0) -- CFRFr warn: 1.211703e-13 -1.099658e-14 -1.777313e-10 -1.777203e-10 7 5 6 1 [2]


      It also reports similar "CFRFl/r warn" messages later in the simulation.

      The above warnings mean that grid::CorrectForRefinedFluxes is catching a negative (or zero) value in BaryonField[2] (internal energy in this simulation), in cell (7, 5, 6). The flux correction for dimension 1 (i.e. y-direction) is causing the issue, on both the left and right sides.

      Since this PR is changing how the code handles a negative (or zero) internal energy, the results from this PR should be different from the previous code after this condition occurs.

  1. John Wise

    Hi @james_larrue, this is great that you've found a possible fix for this long-standing problem in flux correction. I will test this out in one of my cosmological simulations that suffers from this bug and let you know the outcome soon.

  2. dcollins4096

    It looks pretty good to me. I do think that the boundary flux needs to be reverted as well for a completely consistent revert, but since this machinery breaks conservation anyhow (by design) I don't stand by that statement with much conviction right now.

    I'm definitely interested in what @jwise77 thinks after he's tested it.

    1. James Larrue author

      Since @dcollins4096 is 85% sure that the boundary flux needs to be reverted and since I am inclined to agree with him, I suggest I implement the boundary flux reversion before we merge. I'll try to do it this week.

          1. James Larrue author

            As per my comment on 2015-04-07, this should not be done. As far as I know, the PR is ready to go.

    2. James Larrue author

      It appears that the boundary flux is only modified if CorrectLeft/RightBoundaryFlux == true, while the BaryonField is only modified if CorrectLeft/RightBaryonField == true, and these two booleans are never the same value. The reverting code is only executed when the BaryonField is modified.

      Therefore, it does not seem that the check for negative density/energy is required for the boundary flux. I modified the comments, but left the code as it was. I think it is ready for merging now.

    1. Sam Skillman

      This is after updating the gold results, so nothing here has really changed from above. This will just require another new gold standard if/when it is pulled in.

    1. Brian OShea

      This is the same error as before; it's expected. @dcollins4096 and @jwise77 , could you please take a look at this PR?

      1. James Larrue author

        @dcollins4096 and @jwise77, could you please take a look at this PR? I would like to get it finished and off of my task list.

        As per my comment on 2015-04-07, I do not believe the boundary fluxes need to be reverted as they will never be modified when the reverting occurs.

        1. dcollins4096

          I think it's good. I think it needs testing, since this bit of code is deep in the evolution and problems here can be hard to track down.
          @bwoshea, you had a test that failed with the hot-spots, can you run an A-B comparison of this PR and the appropriate ancestor?
          I agree with your comment on 2015-04-07 that the boundary flux is irrelevant.

    1. Nathan Goldbaum

      I can try running one of my galaxies with second order interpolation. I switched to first order interpolation a while back and that completely fixed my hot spot crashes.

    2. John Wise

      A couple of weeks ago, I tested this on a simulation (32^3 AMR adiabatic EOS) that had some "CFRFxx" warning messages. I remember that they had hotspots before, but I think this was resolved in previous commits, like what @bwoshea mentioned. Those warning messages are still there, but I think this is showing that the subling / grid flux correction is working when there's negative energy or density. I compared the two simulations, which had some differences but nothing above the ~1e-3 level.

      1. John Wise

        Here's a notebook that I used to compare four different runs.

        The parent changeset before this PR (two runs to account for any run-to-run differences), this fix, and the current tip 5d66537. I think it all looks good. I ran this simulation from z=4 to z=3.5 with 8 top-level timesteps. Here are the number of CFRF warnings in each run.

        for i in `echo fix-7c4dd95 parent-46c7166 parent-46c7166-run2 tip-5d66537`; do echo $i; grep -c CFRF $i/estd.out; done
        1. Brian OShea

          This is very convincing to me. I think that I'm ready to accept this PR, pending @ngoldbaum 's feedback on his simulation.

          1. Nathan Goldbaum

            I just tried to merge this into the local development head I've been using for my galaxy feedback simulations. Unfortunately things have diverged sufficiently in this head (mostly due to debug statements that got committed months ago while John and I were trying to debug crashes) to make the job of merging in this PR particularly nasty. I don't think I'll have time to test this one out anytime soon, unfortunately...