Is this a bug? Different results despite only minor differences in code version and input file - I would have expected exactly the same results.

Issue #92 resolved
Stephen Biggs-Fox created an issue

Not sure if this is a bug yet - requires further investigation. Just thought I would post it here in case anyone else is aware of similar behaviour.

When I run my input file (attached) with 8.0.2, I get different results (also attached) from when I run it with my branch (feature/scale-zf-and-turb-restart, which is next minus one plus a few commits). The input files are effectively the same apart from: different nstep and margin_cpu_time (should not affect results); removal of opt_redist_init (does not exist in the newer code; should not affect results); addition of include_explicit_source_in_restart = .false. (this is part of what I'm testing; should not affect results); and addition of write_final_moments = .true. (I think I needed it for something but can't remember what; regardless, it should not affect results).

My guess is that the difference is due to something that changed between 8.0.2 and next minus one (fa532283), but I have not yet had a chance to look into this. Also, one run was on Viking (York Uni cluster) while the other was on Archer, so another possibility is different library versions. I plan to dig deeper tomorrow (Wed 15th Jan 2020).

So, before I do that, is anyone aware of what might be causing this? Thanks.

Comments (19)

  1. David Dickinson

    The turbulent plot looks reasonable to me: the exact non-linear state achieved will be sensitive to different initial conditions that you will get running on different machines/core counts. The zonal plot looks more substantially different. It might be worth holding more things fixed, i.e. either the same version on the two machines or different versions on the same machine (the latter is probably better, as you will then also be using the same library versions). Do these runs change the timestep at any point? If so, what does that look like for the two cases?

  2. Stephen Biggs-Fox reporter

    “the exact non-linear state achieved will be sensitive to different initial conditions that you will get running on different machines/core counts”

    Good point - I did not think of that.

    Neither is a restart; both are initial runs.

    I will make things more similar, re-test and report back here…

  3. Stephen Biggs-Fox reporter

    Timesteps do change in all 4 runs. They look similar but not exactly the same. Nothing obviously wrong here. Plots below.

    The runs shown so far used 8.0.2 on Viking (and my branch on Archer). I now have 8.0.2 runs queued on Archer, i.e. different code (8.0.2 vs my branch), same machine (Archer). I will report back later…

  4. David Dickinson

    Thanks, I now realise that the original two plots were showing four cases – I originally thought it was the zonal and non-zonal components of just two cases.

  5. Stephen Biggs-Fox reporter

    Indeed. Now that you know that, would you say both sets of results compare well enough that they are probably OK?

  6. David Dickinson

    I’m still nervous about the zonal one. If we assume they are saturated, they have gone to quite different magnitudes, further apart than I would have expected.

  7. Stephen Biggs-Fox reporter

    OK - I have done some testing with fft_measure_plan = .false. to ensure reproducibility. I have tested 3 versions: 8.0.2, fa532283 (next minus one; the point I branched off from), and feature/scale-zf-and-turb-restart (my branch).

    Each branch can reproduce its own results, so that’s a good start at least!

    The results from fa532283 match those from feature/scale-zf-and-turb-restart. This suggests that my branch has not introduced anything untoward. However, the 8.0.2 results differ. This suggests that a change between 8.0.2 and fa532283 has introduced this difference.

    I am currently re-running 8.0.2 just to double check that it’s not a library version or compiler flag issue, though I would guess this is unlikely.

    For my current task, this is good enough. I know my branch hasn’t introduced any unexpected changes. If necessary, I can rebase my changes onto a different starting point in due course. However, someone might want to look into what has changed between 8.0.2 and fa532283 to introduce this difference to check that it is expected / acceptable.
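
    A hedged note on why fft_measure_plan = .false. helps here. This assumes the flag switches GS2 between FFTW's timing-based FFTW_MEASURE planning and its heuristic FFTW_ESTIMATE planning. A measured plan can choose a different algorithm from one run to the next, and different algorithms add the same numbers in a different order. Floating-point addition is not associative, so the last bits can differ, and a turbulent run amplifies such differences. The minimal Python sketch below illustrates only that rounding effect; it is not GS2 or FFTW code, and sum_in_order is a hypothetical helper.

```python
import math
import random


def sum_in_order(values):
    """Naive left-to-right summation (unlike math.fsum, and unlike the
    compensated float summation that Python 3.12+ uses inside sum())."""
    total = 0.0
    for v in values:
        total += v
    return total


# Three terms: changing only the grouping changes the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# The same data summed in two different orders (loosely, what two different
# FFT plans may do) can give slightly different totals.
rng = random.Random(12345)
data = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(10_000)]
forward = sum_in_order(data)
backward = sum_in_order(reversed(data))
exact = math.fsum(data)  # correctly rounded reference sum
print(forward - backward, forward - exact, backward - exact)
```

    With the planner fixed, every run performs the same operations in the same order, which is what makes "each branch can reproduce its own results" a meaningful test.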

  8. David Dickinson

    Thanks for the update. Could you provide the exact input files used in both cases (8.0.2 and fa532283)? How many cores were you running on?

  9. Stephen Biggs-Fox reporter

    Just to confirm, the re-run with 8.0.2 is now complete. The new 8.0.2 binary definitely has the same library versions and compiler options as the fa532283 binary (that was the point of the re-run!). The new 8.0.2 binary gives the same result as the old 8.0.2 binary. This indicates that the observed difference is definitely not due to library versions or compiler options. Therefore, as I suggested previously, someone might want to look into what has changed between 8.0.2 and fa532283 to introduce this difference to check that it is expected / acceptable.

  10. David Dickinson

    Thanks, not sure if you saw my post a few minutes ago but are you able to provide the exact input files used in each case?

  11. Stephen Biggs-Fox reporter

    Input file in both cases is as attached above except with the following changes:

    93,94c93,94
    <   avail_cpu_time = 86400
    <   margin_cpu_time = 600
    ---
    >   avail_cpu_time = 10800
    >   margin_cpu_time = 300
    108a109
    >   fft_measure_plan = .false.
    131c132
    <   tprim = 1.6
    ---
    >   tprim = 2.5
    150d150
    <   include_explicit_source_in_restart = .false.
    

    Running on Archer with standard library set-up (i.e. as specified in GS2 makefile), compiled as follows:

    make -j distclean
    make -j USE_NEW_DIAG= gs2
    

    Running on 592 cores.

  12. David Dickinson

    Could you please attach the actual files just to avoid any risk of us using the wrong flags?

  13. Stephen Biggs-Fox reporter

    Done. I have replaced the input.in that I had attached originally with the version actually used in the most recent tests with 8.0.2 and fa532283. If you download and use the attachment, you should (hopefully) get the same results as me. A quick way to check is to run ncdump -v phi2 input.out.nc | tail and look at the last value. For 8.0.2 I get 26.9940736447918 and for fa532283 I get 36.3602414177723.
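
    The same check can be scripted. This is a minimal Python sketch assuming the usual ncdump text layout, where the data section lists values as "phi2 = v1, v2, ..., vN ;". The helpers last_value, ncdump_var and relative_difference are hypothetical, not part of GS2 or netCDF. The script parses the full ncdump output rather than the tail, so the variable name is always present.

```python
import re
import subprocess


def last_value(ncdump_text: str, var: str = "phi2") -> float:
    """Return the final value of `var` from `ncdump -v VAR file.nc` output."""
    # Only the part after "data:" holds values; header lines such as
    # "double phi2(t) ;" or "phi2:long_name = ..." are skipped.
    data = ncdump_text.split("data:", 1)[-1]
    match = re.search(rf"\b{re.escape(var)}\s*=\s*(.*?);", data, re.DOTALL)
    if match is None:
        raise ValueError(f"variable {var!r} not found in data section")
    values = [v.strip() for v in match.group(1).split(",") if v.strip()]
    return float(values[-1])


def ncdump_var(path: str, var: str = "phi2") -> str:
    """Run `ncdump -v VAR PATH` and return its text (needs ncdump on PATH)."""
    result = subprocess.run(
        ["ncdump", "-v", var, path], capture_output=True, text=True, check=True
    )
    return result.stdout


def relative_difference(reference: float, other: float) -> float:
    return abs(other - reference) / abs(reference)


if __name__ == "__main__":
    # Final phi2 values reported in this thread for 8.0.2 and fa532283.
    print(f"{relative_difference(26.9940736447918, 36.3602414177723):.1%}")  # 34.7%
```

    For a real comparison, call last_value(ncdump_var("input.out.nc")) on each run's output file.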

  14. David Dickinson

    Thanks, that’s great. We’ll try to dig into this to see if we can understand what the source is.

  15. Stephen Biggs-Fox reporter

    I should also mention that nstep is 25000 and avail_cpu_time is 10800 (3 hours). On 592 cores, these runs completed all 25000 steps in around 2.5 hours (min = 02:14:00, max = 02:39:16), so a walltime request of 3 hours is ideal.
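
    The walltime reasoning can be checked with a few lines of arithmetic. This minimal Python sketch assumes that a run stops cleanly once less than margin_cpu_time of avail_cpu_time remains. That is my understanding of how the two inputs interact, not something confirmed in this thread. The runtimes are the min/max quoted above, and margin_cpu_time = 300 comes from the input-file diff.

```python
NSTEP = 25000
AVAIL_CPU_TIME = 10800  # seconds, from the input file
MARGIN_CPU_TIME = 300   # seconds, from the input file


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS walltime string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


for label, wall in (("min", "02:14:00"), ("max", "02:39:16")):
    seconds = hms_to_seconds(wall)
    per_step = seconds / NSTEP
    # Time left before the (assumed) stop point of avail - margin.
    headroom = AVAIL_CPU_TIME - MARGIN_CPU_TIME - seconds
    print(f"{label}: {seconds} s total, {per_step:.3f} s/step, {headroom} s headroom")

# min: 8040 s total, 0.322 s/step, 2460 s headroom
# max: 9556 s total, 0.382 s/step, 944 s headroom
```

    Even the slowest run leaves roughly 15 minutes of headroom, which supports the 3-hour request.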

  16. Stephen Biggs-Fox reporter

    FYI - David and I are dealing with this via email now as that is easier for sharing input files.

  17. David Dickinson

    Added a seed input in 8.1 to help fix the initial random number generator state. Further investigation of this case suggests that at least two saturated states are available and that small differences can push the system from one to the other.
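
    Both points can be illustrated with a toy model. This is hedged: it is not GS2's equations or its random number generator, and the function names are hypothetical. The Python sketch integrates the bistable equation dx/dt = x - x^3, which has two stable states at x = +1 and x = -1, from a tiny random initial perturbation. Fixing the seed makes the final state reproducible. Changing the seed, a stand-in for a different machine or core count, can send the run to the other state.

```python
import random


def initial_amplitude(seed: int, scale: float = 1e-3) -> float:
    """Small random initial perturbation drawn from a seeded generator."""
    return scale * random.Random(seed).gauss(0.0, 1.0)


def saturate(x0: float, dt: float = 0.01, nstep: int = 5000) -> float:
    """Forward-Euler integration of dx/dt = x - x**3 starting from x0."""
    x = x0
    for _ in range(nstep):
        x += dt * (x - x**3)
    return x


# Same seed: bitwise-identical final state.
assert saturate(initial_amplitude(7)) == saturate(initial_amplitude(7))

# Different seeds: the sign of the tiny initial perturbation decides
# which of the two saturated states (+1 or -1) is reached.
for seed in range(6):
    print(seed, round(saturate(initial_amplitude(seed)), 6))
```

    In a real turbulence run the attractors are far more complicated, but the mechanism is the same: without a fixed seed, differences in the initial state can lead to different saturated states.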
