Updates to Grackle interface to work with the new Grackle 3.0 api and functionality (volumetric and specific heating rate fields, radiative transfer fields, and self shielding approximation switch)
Hi @aemerick , I attempted to run the test suite on this PR and it immediately seg faults for all test problems in the test suite. I am confident that it has something to do with PR. When I try the simplest possible test problem with make grackle-yes for enzo-dev with your PR pulled in (at changeset 6260f1c) , a 1D Sod shock tube, I get this seg fault:
./enzo.exe -d SodShockTube.enzo
MPI_Init: NumberOfProcessors = 1
MEM Call initialize :: mean/min/max = 12509184/12509184/12509184
[dev-intel16:13436] *** Process received signal ***
[dev-intel16:13436] Signal: Segmentation fault (11)
[dev-intel16:13436] Signal code: Address not mapped (1)
[dev-intel16:13436] Failing at address: (nil)
[dev-intel16:13436] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2b4aad674710]
[dev-intel16:13436] [ 1] ./enzo.exe(_Z22SetDefaultGlobalValuesR11TopGridData+0x124b) [0x67e98b]
[dev-intel16:13436] [ 2] ./enzo.exe(_Z13InitializeNewPcR14HierarchyEntryR11TopGridDataR16ExternalBoundaryPd+0x50) [0x622d90]
[dev-intel16:13436] [ 3] ./enzo.exe(main+0x660) [0x447860]
[dev-intel16:13436] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b4aad8a1d5d]
[dev-intel16:13436] [ 5] ./enzo.exe() [0x446e79]
[dev-intel16:13436] *** End of error message ***
Segmentation fault (core dumped)
Interestingly, this seems to happen in SetDefaultGlobalValues. When I recompile with make grackle-no (and no other changes), the Sod shock tube runs as expected. For the record, I'm running on Linux with the gcc 4.4.7 compiler stack, HDF5 1.8.7, and OpenMPI 1.4.3. Would you mind seeing if you can reproduce this problem? I'd be happy to catch up with you online to chat at some point as well.
I'm also going to tap @brittonsmith to see if he has any thoughts on what's going on here.
Hmm. Interesting. I could very well have done something silly in copying over the code from the version that works in the fork that I actively develop in to my other enzo-dev fork. I should be able to track down the issue in the next day or so.
Roger that, thanks. Please let me know if I can give you more info!
Silly question: did you update grackle too?
Reasonable question. Yes, grackle was updated to the current tip of the repo, so I'm compiling against grackle-3.0. This particular problem got me a few days back when I was getting my whole setup updated so I could run the test suite again (a laptop update screwed everything up and I didn't have time to sort it out for a while), and @brittonsmith very quickly set me straight.
I tracked down the issue. The seg fault is triggered on line 492 of SetDefaultGlobalValues:
Everything runs fine after removing this if-statement. And this is unsurprisingly because grackle_data isn't instantiated until set_default_chemistry_value is called. And logically doing this check doesn't make sense since ReadParameterFile hasn't been called yet at this point to check if in fact use_grackle == True.
It would be nice to be able to check whether or not Grackle is being used first, however, as some of the Grackle defaults are different than the Enzo defaults, which would cause problems if compiling with Grackle and running with use_grackle = 0. In fact, Enzo crashes doing this when metal cooling is on because the defaults for the temperature tabulation, TemperatureEnd, are different. However, this would require two passes through the parameter file, first checking the value of use_grackle, calling SetDefaultGlobalValues, then going through the parameter file again to overwrite any defaults.
So this check should just be removed, but maybe we should add in a catch somewhere to provide a more useful error message to alert the user when they've compiled with Grackle and are trying to use the native Enzo metal cooling methods.
Couldn't you just move the check after the call to set_default_chemistry_value? You're already in an #ifdef USE_GRACKLE block, so grackle should be available to fill out the grackle_data struct.
My general point was the grackle_data->use_grackle check is pointless regardless. If its put after set_default_chemistry_value it will just hold the value default which isn't useful. It would be more useful to know if the user wants grackle_data->use_grackle set to True first, before overwriting the Enzo defaults with the Grackle defaults, but this would require doing something a little fancier in both SetDefaultGlobalValues and ReadParameterFile.
Ah, I see. Up to you if you want to do that fancy thing, since we're not doing it right now it would be a strict improvement over the status quo but isn't necessary to get this PR merged.
I second Nathan's point. We were already overriding regardless if Enzo is compiled with USE_GRACKLE, so removing this just returns to what we're doing now. When I implemented this the first time around, this was the best I could come up with for this same reason that doing something more would be a pain.
I just ran the AMRCosmology_Grackle simulation from the run directory and it looks as it did with grackle-2.x. If the test suite looks good, I think this is good to go.
Quick update on the test suite: I am seeing some behaviors that I think I understand, and others that are baffling me. I'll detail them below, and would appreciate some feedback.
In CoolingTest_Grackle, I'm seeing that there are small differences in the Cooling_Time quantity, at the ~75% level in one of the cells (the other cells are equal). The output times are also slightly different. Has Grackle been changed enough from v2.2 to v3.0 that this would be unsurprising? Similarly, in OneZoneFreefallTest basically everything is slightly different, but that also uses Grackle and physically evolves, so it's not shocking that things are different if something in Grackle has changed. @aemerick @brittonsmith, can you comment on this?
There are a variety of other tests that are different and I do NOT understand why. For example, AdiabaticExpansion, MHDAdiabaticExpansion_CT, MHDAdiabaticExpansion_Dedner, MHDZeldovichPancake, FreeExpansion, and a couple of others have velocity values that range from slightly different to wildly different, and there's clearly something wrong. Most of these shouldn't use any of the code that this PR touches, so I need to figure out what's going on here. I'm going to dig into it and will report back shortly.
I believe there were enough slight modifications between v2.2 and v3.0 in Grackle to explain slight variations in CoolingTest_Grackle and OneZoneFreeFallTest, but I cannot say for sure (for example, there were updates and bugfixes related to the case A and case B recombination rates). However, the differences in the rest of the unrelated test problems is very strange. Is this definitely related to this PR? or maybe worth an additional test running with make grackle-no (though I'm not sure how useful that is since Grackle shouldn't affect it either way)?
This could be related to the differences in default parameters between Enzo and Grackle that are being overwritten when compiling with Grackle, regardless as to whether or not Grackle is actually being used. For example, Enzo defaults with RadiativeCooling = FALSE while Grackle forces it to True (this would certainly cause wild differences where there shouldn't be any).
Additional default differences: CIECooling is set to 1 in Enzo, 0 in Grackle, H2OpticalDepthApproximation is 1 in Enzo, 0 in Grackle. I know these two used to hold the Enzo defaults in v2.2, but now hold their different defaults in v3.0. But I'm not sure about the RadiativeCooling parameter.
Oh interesting, I hadn't thought of that - I'll try a second run with RadiativeCooling = False, since there are HUGE differences for some of the runs (like six orders of magnitude in velocity).
Edit: I meant make grackle-no, not RadiativeCooling = False
Yeah, I'm actually suspicious that it has to do with the machine where I'm running the test suite - there have been some weird file system issues on that machine, and I've had a bit of a problem with non-reproducible errors.
Following up on my own message here: I ran the tests on two different machines (our local MSU supercomputer and my Mac laptop) and find two different outcomes. I suspect there's something funny going on with our local supercomputer (there is some evidence for this based on other users' recent experience), and the errors do not necessarily seem to be reproducible, so I'm going to focus on my laptop's results.
On my laptop, almost all of the tests pass. The two test problems that have issues are CoolingTest_Grackle and OneZoneFreefallTest. CoolingTest_Grackle throws a couple of errors because various values of the cooling time differ between the gold standard (generated with the tip of enzo-dev and Grackle-2.2) and this PR (generated with Grackle-3). About 2/3 of the cooling times are different at the end of the simulation, which vary by between a very tiny fraction of the value and about a factor of two (and mostly seem to occur at higher temperatures). My understanding is that enough changes have been made in Grackle that this should not be too surprising. There's also a slight difference in the output time for the data output being tested (at the ~1e-8 level), but that doesn't concern me.
OneZoneFreefallTest has a variety of differences, with some of the cells having approximately a factor of two difference in temperature between the two runs and a ~0.1% difference in value in density between the two runs, both at the final simulation data output. (Various projected quantities show differences as well, but they're weighted by density so it's unsurprising that this is happening.) Given my understanding of the test problem's functionality, these relatively minor differences are to be expected.
A full paste of the differences between the gold standard and this PR, as reported by the test suite, are at http://paste.yt-project.org/show/6955/ . @brittonsmith , could you comment on whether these outputs seem reasonable to you? If they do, I'm inclined to approve this PR.
This grackle PR notably changed the answers for grackle's freefall test.
Oh, nice - thanks for pointing that out. Yep, I bet it's related. :-)
Hey all, I'm inclined to think that @ngoldbaum is right and the answers will have just changed slightly. I think the only way to be certain is to run these two manually and inspect the output. I'll do that right now and report back.
I've run these two tests manually. I'll show a comparison below, but I think the differences are due to the various reaction rate changes that have happened in Grackle between v2.2 and v3.0.
There is a pretty significant temperature difference, but I recall this is what we saw in Grackle as well.
Thanks for digging into this, @brittonsmith . As you said, it's pretty much what was reported in the Grackle tests. I think we're good to go at this point - I'm going to hi 'Approve'. One more approve and we merge. Maybe @ngoldbaum ?