Is this a bug? Different results but only minor differences in code version and input file - I would have expected exactly the same.
Not sure if this is a bug yet - requires further investigation. Just thought I would post it here in case anyone else is aware of similar behaviour.
When I run my input file (attached) with 8.0.2 I get different results (also attached) from when I run with my branch (feature/scale-zf-and-turb-restart, which is next minus one plus a few commits). The input files are effectively the same apart from: different nstep and margin_cpu_time (should not affect results); removal of opt_redist_init (does not exist in the newer code, so should not affect results); addition of include_explicit_source_in_restart = .false. (part of what I'm testing, and should not affect results); and addition of write_final_moments = .true. (I think I needed it for something, though I can't remember what - regardless, it should not affect results).
My guess is that the difference is due to something that changed between 8.0.2 and next minus one (fa532283) but I have not yet had a chance to look into this. Also, one run was on Viking (York Uni cluster) while the other was on Archer so another possibility is different library versions. I plan to dig deeper tomorrow (Wed 15th Jan 2020).
So, before I do that, is anyone aware of what might be causing this? Thanks
Comments (19)
-
Are you restarting either of these cases in the data shown or are they a single run?
-
The turbulent plot looks reasonable to me – the exact non-linear state achieved will be sensitive to different initial conditions that you will get running on different machines/core counts. The zonal plot looks more substantially different. It might be worth holding more things fixed – either the same version on the two machines or different versions on the same machine (the latter is probably better, as you'll then also be using the same library versions). Do these runs change the timestep at any point, and if so, what does that look like for the two cases?
-
reporter “the exact non-linear state achieved will be sensitive to different initial conditions that you will get running on different machines/core counts”
Good point - I did not think of that.
Neither are restarts; both are initial runs.
I will make things more similar, re-test and report back here…
-
reporter Timesteps do change in all 4 runs. They look similar but not exactly the same. Nothing obviously wrong here. Plots below.
The runs shown so far used 8.0.2 on Viking (and my branch on Archer). I now have 8.0.2 runs queued on Archer, i.e. different code (8.0.2 vs my branch), same machine (Archer). I will report back later…
-
Thanks, I now realise that the original two plots were showing four cases – I originally thought it was the zonal and non-zonal components of just two cases.
-
reporter Indeed. Now that you know that, would you say both sets of results compare in a way that is probably OK?
-
I’m still nervous about the zonal one. If we assume they are saturated, they have gone to quite different magnitudes, outside what I would have expected.
-
reporter OK - I have done some testing with
fft_measure_plan = .false.
to ensure reproducibility. I have tested 3 version: 8.0.2, fa532283 (next minus one; the point where I branched off from), and feature/scale-zf-and-turb-restart (my branch).Each branch can reproduce its own results, so that’s a good start at least!
The results from fa532283 match those from feature/scale-zf-and-turb-restart. This suggests that my branch has not introduced anything untoward. However, the 8.0.2 results differ. This suggests that a change between 8.0.2 and fa532283 has introduced this difference.
I am currently re-running 8.0.2 just to double check that it’s not a library version or compiler flag issue, though I would guess this is unlikely.
For my current task, this is good enough. I know my branch hasn’t introduced any unexpected changes. If necessary, I can rebase my changes onto a different starting point in due course. However, someone might want to look into what has changed between 8.0.2 and fa532283 to introduce this difference to check that it is expected / acceptable.
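The comparison described above boils down to checking whether two runs produce bit-identical diagnostics once fft_measure_plan = .false. removes the FFTW planning nondeterminism. A minimal sketch of that kind of check (hypothetical helper, not part of GS2; the input would be a diagnostic time series such as phi2 read from the output file):

```python
def results_match(a, b, exact=True, rtol=1e-12):
    """Compare two diagnostic time series (e.g. phi2 vs time).

    exact=True is the right mode for a reproducibility check with
    fft_measure_plan = .false.; otherwise fall back to a relative
    tolerance to allow harmless rounding differences.
    """
    if len(a) != len(b):
        return False
    if exact:
        return all(x == y for x, y in zip(a, b))
    return all(abs(x - y) <= rtol * max(abs(x), abs(y), 1e-300)
               for x, y in zip(a, b))
```

For example, `results_match(series_802, series_802_rerun)` should return True for a genuinely reproducible binary, while the 8.0.2-vs-fa532283 comparison would return False.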
-
Thanks for the update. Could you provide the exact input files used in both cases (8.0.2 and fa532283). How many cores were you running on?
-
reporter Just to confirm, the re-run with 8.0.2 is now complete. The new 8.0.2 binary definitely has the same library versions and compiler options as the fa532283 binary (that was the point of the re-run!). The new 8.0.2 binary gives the same result as the old 8.0.2 binary, which indicates that the observed difference is definitely not due to library versions or compiler options. Therefore, as I suggested previously, someone might want to look into what changed between 8.0.2 and fa532283 to check that this difference is expected / acceptable.
-
Thanks, not sure if you saw my post a few minutes ago but are you able to provide the exact input files used in each case?
-
reporter Input file in both cases is as attached above except with the following changes:
93,94c93,94
< avail_cpu_time = 86400
< margin_cpu_time = 600
---
> avail_cpu_time = 10800
> margin_cpu_time = 300
108a109
> fft_measure_plan = .false.
131c132
< tprim = 1.6
---
> tprim = 2.5
150d150
< include_explicit_source_in_restart = .false.
Running on Archer with standard library set-up (i.e. as specified in GS2 makefile), compiled as follows:
make -j distclean
make -j USE_NEW_DIAG= gs2
Running on 592 cores.
-
Could you please attach the actual files just to avoid any risk of us using the wrong flags?
-
reporter - attached input.in
Updated input file
-
reporter Done. I have replaced the input.in that I had attached originally with the version actually used in the most recent tests with 8.0.2 and fa532283. If you download and use the attachment you should (hopefully) get the same results as me. A quick way to check is
ncdump -v phi2 input.out.nc | tail
and look at the last value. For 8.0.2 I get 26.9940736447918 and for fa532283 I get 36.3602414177723.
-
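For reference, the two final phi2 values quoted above differ by far more than rounding noise; a quick check in plain Python using those numbers:

```python
# Final phi2 values from `ncdump -v phi2 input.out.nc | tail`, as quoted above
phi2_v802 = 26.9940736447918   # 8.0.2
phi2_fa53 = 36.3602414177723   # fa532283

rel_diff = abs(phi2_fa53 - phi2_v802) / phi2_v802
print(f"relative difference: {rel_diff:.1%}")
```

A relative difference of roughly a third is consistent with the runs settling into genuinely different saturated states rather than accumulating floating-point drift.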
Thanks, that’s great. We’ll try to dig into this to see if we can understand what the source is.
-
reporter I should say as well that nstep is 25000, avail_cpu_time is 10800 (3 hours) and on 592 cores these runs were completing all 25000 steps in around 2.5 hours (min = 02:14:00, max = 02:39:16) so a walltime request of 3 hours is ideal.
-
reporter FYI - David and I are dealing with this via email now as that is easier for sharing input files.
-
- changed status to resolved
Added seed input in 8.1 to help fix the initial random number generator state. Further investigation of this case suggests at least two possible saturated states are available, and small differences can push the system from one to the other.
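For anyone finding this later, fixing the seed would be done in the input file along these lines. This is a hypothetical illustration only - the namelist name, placement, and value below are assumptions, so check the 8.1 documentation for the actual usage of the seed input:

```
! hypothetical fragment - consult the 8.1 docs for the correct namelist
&init_g_knobs
  seed = 12345
/
```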