I was investigating a simulation that ended up with negative total energy in grid::CorrectForRefinedFluxes. (The fact that it had negative total energy is an issue unrelated to this pull request.) I found that the results I received were dependent on the ordering of the fields in the BaryonField array. I believe the results should be independent of the order in which the fields are stored.
What happened was that the negative total energy was recognized as "bad", so the correction to total energy was undone and all further corrections were set to zero. However, nothing was done about corrections already applied to earlier fields. Therefore, any field stored before total energy in BaryonField kept its correction, while total energy and every field after it had none. Moving a field from before total energy to after it in BaryonField (or the reverse) would change the results.
A second issue in my case was that the velocity correction was much larger in magnitude than the pre-correction velocity. When a large number is subtracted from a small number and then added back, the result is not equal to the original small number: v_tiny - v_huge + v_huge != v_tiny.
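This cancellation is easy to reproduce in a few lines. Below is a minimal sketch (the function name and the values are illustrative, not taken from the simulation or the Enzo source): applying a correction far larger in magnitude than the value itself and then undoing it does not recover the original value.

```cpp
// Demonstrates the round-off problem described above: in floating point,
// v + c - c is not guaranteed to equal v when |c| >> |v|, because the
// intermediate sum rounds away the small value entirely.
float ApplyThenUndo(float v, float correction) {
    float corrected = v + correction;  // apply the (huge) flux correction
    return corrected - correction;     // attempt to undo it exactly
}
```

For example, in single precision, `ApplyThenUndo(1.0e-8f, -1.0e+8f)` returns `0.0f` rather than `1.0e-8f`: the intermediate sum `1e-8f - 1e8f` rounds to exactly `-1e8f`, so the tiny original value is lost before the undo happens.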
This PR addresses the first issue by undoing the current correction and all previous corrections, in addition to setting all future corrections to zero (for the offending cell only). The second issue is addressed by processing density, total energy, and internal energy before the other fields, thereby reducing the number of fields affected by an "undo". This does not perfectly fix the second issue, but it seemed better than nothing.
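Schematically, the undo logic described above looks like the following. This is a simplified sketch, not the actual Enzo code; the function, container types, and names (`ApplyCorrections`, `requiresPositivity`, etc.) are all hypothetical stand-ins for the real field loop in `grid::CorrectForRefinedFluxes`.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the fix: apply per-field corrections to one cell; if any field
// that must stay positive goes non-positive, undo that correction AND all
// earlier ones for this cell, and skip all remaining corrections here.
void ApplyCorrections(std::vector<double*>& fields,           // one array per field
                      const std::vector<double>& corr,        // per-field correction
                      const std::vector<bool>& requiresPositivity,
                      std::size_t cell)
{
    for (std::size_t f = 0; f < fields.size(); ++f) {
        fields[f][cell] += corr[f];
        if (requiresPositivity[f] && fields[f][cell] <= 0.0) {
            // Revert this field and every previously corrected field, so the
            // cell ends up fully uncorrected regardless of field ordering.
            for (std::size_t g = 0; g <= f; ++g)
                fields[g][cell] -= corr[g];
            return;  // later fields receive no correction in this cell
        }
    }
}
```

The key point is the inner loop over `g <= f`: reverting only field `f` (as the old code did) leaves earlier fields corrected and later fields uncorrected, which is what made the results depend on the order of fields in BaryonField.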
@dcollins4096 mentioned this issue has come up before, so perhaps somebody has ideas on a different solution.
It also reports similar "CFRFl/r warn" messages later in the simulation.
The above warnings mean that grid::CorrectForRefinedFluxes is catching a negative (or zero) value in BaryonField (internal energy in this simulation), in cell (7, 5, 6). The flux correction for dimension 1 (i.e. y-direction) is causing the issue, on both the left and right sides.
Since this PR is changing how the code handles a negative (or zero) internal energy, the results from this PR should be different from the previous code after this condition occurs.
Hi @james_larrue, it's great that you've found a possible fix for this long-standing problem in flux correction. I will test it in one of my cosmological simulations that suffers from this bug and let you know the outcome soon.
John, just following up: did this affect your crashing sim?
It looks pretty good to me. I do think that the boundary flux needs to be reverted as well for a completely consistent revert, but since this machinery breaks conservation anyhow (by design) I don't stand by that statement with much conviction right now.
I'm definitely interested in what @jwise77 thinks after he's tested it.
Since @dcollins4096 is 85% sure that the boundary flux needs to be reverted and since I am inclined to agree with him, I suggest I implement the boundary flux reversion before we merge. I'll try to do it this week.
Sounds good, @james_larrue. I'll keep an eye out for the updated PR.
I didn't get to it last week, but it is still on my to-do list.
@james_larrue, it seems that this hasn't been done yet. Mind updating the PR?
As per my comment on 2015-04-07, this should not be done. As far as I know, the PR is ready to go.
It appears that the boundary flux is only modified if CorrectLeft/RightBoundaryFlux == true, while the BaryonField is only modified if CorrectLeft/RightBaryonField == true, and these two booleans are never the same value. The reverting code is only executed when the BaryonField is modified.
Therefore, it does not seem that the check for negative density/energy is required for the boundary flux. I modified the comments, but left the code as it was. I think it is ready for merging now.
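The control-flow argument above can be sketched in a few lines. This is a toy illustration under my reading of the code, not the actual Enzo logic; the struct and function names are made up.

```cpp
// Toy model of the observation above: for a given side, the boundary flux is
// modified only when CorrectBoundaryFlux is true, while the revert code runs
// only in the CorrectBaryonField branch, and the two flags are never equal.
// So the revert can never coincide with a modified boundary flux.
struct SideFlags {
    bool correctBoundaryFlux;
    bool correctBaryonField;  // the revert lives in this branch
};

bool RevertCouldTouchBoundaryFlux(const SideFlags& s) {
    return s.correctBoundaryFlux && s.correctBaryonField;
}
```

Since the two flags are never both true, `RevertCouldTouchBoundaryFlux` is always false for any state the code can actually reach, which is why no boundary-flux revert is needed.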
This is the same error as before; it's expected. @dcollins4096 and @jwise77 , could you please take a look at this PR?
@dcollins4096 and @jwise77, could you please take a look at this PR? I would like to get it finished and off of my task list.
As per my comment on 2015-04-07, I do not believe the boundary fluxes need to be reverted as they will never be modified when the reverting occurs.
I think it's good. I think it needs testing, since this bit of code is deep in the evolution and problems here can be hard to track down.
@bwoshea, you had a test that failed with the hot-spots, can you run an A-B comparison of this PR and the appropriate ancestor?
I agree with your comment on 2015-04-07 that the boundary flux is irrelevant.
I will give this a shot as soon as I get back from vacation - in about a week.
Just to follow up on this: the same tests are failing as were before (http://srs.slac.stanford.edu/hudson/job/enzo-dev/244/testReport/). As @james_larrue pointed out, this is to be expected. I had tried to test this problem, but my hydro errors largely went away after previous fixes to the code. Does anybody have a hydro run that is currently failing with hot spots?
I can try running one of my galaxies with second order interpolation. I switched to first order interpolation a while back and that completely fixed my hot spot crashes.
That would be very helpful. Thank you!
A couple of weeks ago, I tested this on a simulation (32^3 AMR adiabatic EOS) that had some "CFRFxx" warning messages. I remember that they had hotspots before, but I think this was resolved in previous commits, like what @bwoshea mentioned. Those warning messages are still there, but I think this is showing that the subling / grid flux correction is working when there's negative energy or density. I compared the two simulations, which had some differences but nothing above the ~1e-3 level.
Here's a notebook that I used to compare four different runs: the parent changeset before this PR (two runs, to account for run-to-run differences), this fix, and the current tip 5d66537. I think it all looks good. I ran this simulation from z=4 to z=3.5 with 8 top-level timesteps. Here are the CFRF warning counts for each run.
for i in fix-7c4dd95 parent-46c7166 parent-46c7166-run2 tip-5d66537; do echo $i; grep -c CFRF $i/estd.out; done
This is very convincing to me. I think I'm ready to accept this PR, pending @ngoldbaum's feedback on his simulation.
I just tried to merge this into the local development head I've been using for my galaxy feedback simulations. Unfortunately things have diverged sufficiently in this head (mostly due to debug statements that got committed months ago while John and I were trying to debug crashes) to make the job of merging in this PR particularly nasty. I don't think I'll have time to test this one out anytime soon, unfortunately...
I'm happy if others are.
Three approvals. Do we merge, @bwoshea ? I'm ready.
yep, time to pull the trigger on this one. I'll merge momentarily.