Exact repeat case doesn't reproduce results
Some related discussion on issue #62
An identical repeat nonlinear case, running on the same system, on the same number of processors and with the exact same executable and input file, gives quite different time traces (although the qualitative features are similar). Whilst we might anticipate a small difference between identical repeats because the floating-point operation order from MPI can differ slightly, I would typically expect the curves to be very similar over a long duration, and historically I believe this was the case.
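Here is a minimal illustration of the operation-order point above. It is written in Python, whose floats are IEEE 754 doubles. Floating-point addition is not associative, so summing the same values in a different order can change the last bits of the result. A differently ordered parallel reduction is one way that can happen. In a long nonlinear run such tiny differences may then be amplified.

```python
# Floating-point addition is not associative: the same three values,
# summed in two different orders, give results differing in the last bit.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one possible reduction order
right = a + (b + c)   # another possible reduction order

print(repr(left))     # 0.6000000000000001
print(repr(right))    # 0.6
print(left == right)  # False
```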
It would be good to explore this in more detail to try to identify the source of this difference. I will attach an input file and more details in the future.
For reference, this was done with commit b020059aba41639940ce5a76d0380b008a04e4ad.
Comments (11)
-
reporter The above image shows 9 identical repeat cases run on one processor. Clearly the results are similar but not identical. The next step is to repeat these cases using FFTW_ESTIMATE (unfortunately this involves recompiling, so the executable will no longer be identical to the one used for the other cases). -
reporter Brief update: it looks like forcing FFTW_ESTIMATE in all the plans has removed this run-to-run variation entirely (on one core). I plan to produce a PR (for utils) that ensures we can control this choice in all cases and that defaults to the reproducible option (estimate vs plan/measure). -
reporter The above shows a comparison of 9 identical repeat cases on one processor, with a patch to GS2 to ensure that FFTW_ESTIMATE is used for all FFT plan creations. As the figure illustrates, these runs produce identical results. I'll also explore the nproc > 1 behaviour, but I now believe this task is essentially resolved. A follow-up task remains: ensure FFTW_ESTIMATE can be selected at runtime for all plans, to aid reproducibility.
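Here is a minimal sketch of the follow-up idea. Every plan creation consults a single runtime switch, and that switch defaults to the reproducible choice. The setting names, the `PLANNER_FLAGS` table and `select_planner_flag` are all hypothetical. They are not the actual GS2/utils interface; only the FFTW flag names themselves are real. Routing every plan through one function means no call site can silently fall back to a timing-based flag.

```python
# Hypothetical sketch: map a user-facing setting to the FFTW planner flag
# name that every plan creation should use. The setting names here are
# illustrative only, not GS2's actual input parameters.
PLANNER_FLAGS = {
    "estimate": "FFTW_ESTIMATE",      # heuristic plan choice, no timing runs
    "measure": "FFTW_MEASURE",        # timing-based; choice may vary run to run
    "patient": "FFTW_PATIENT",        # more timing trials; may also vary
    "exhaustive": "FFTW_EXHAUSTIVE",  # most timing trials; may also vary
}

DEFAULT_SETTING = "estimate"  # reproducible by default


def select_planner_flag(setting=None):
    """Return the FFTW flag name to use for every plan creation.

    A missing or empty setting falls back to DEFAULT_SETTING; matching is
    case-insensitive and ignores surrounding whitespace.
    """
    key = (setting or DEFAULT_SETTING).strip().lower()
    try:
        return PLANNER_FLAGS[key]
    except KeyError:
        raise ValueError(f"unknown FFT planner setting: {setting!r}") from None


print(select_planner_flag())           # FFTW_ESTIMATE
print(select_planner_flag("Measure"))  # FFTW_MEASURE
```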
-
It would be useful to note this in the documentation somewhere and to comment on the performance difference in that same place. Have you noticed much of a drop in performance when using FFTW_ESTIMATE?
-
Also, isn't there a way around this with the current GS2 setup using FFTW's wisdom? I've seen mention of wisdom in GS2 but I've never intentionally used it and I don't know how it works in GS2.
-
reporter The performance difference between plan and estimate is not particularly noticeable here. However, it is likely to vary with problem, machine, library version, etc., so I'd want to do a much more detailed study before saying anything concrete. Using estimate actually substantially reduces the cost of creating the FFT plans, so it's possible that this saving could outweigh any small increase in the cost of each FFT.
The wisdom could help make things a little more reproducible, but this is not necessarily that reliable. For a start, you have to make sure FFTW reads the right wisdom file, which may be non-trivial when doing repeat runs.
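The remark above about planning cost can be made concrete. If a run executes N transforms, the total FFT cost is roughly t_plan + N * t_exec. A MEASURE plan therefore only comes out ahead once N exceeds (extra planning time) / (per-call saving). Here is a minimal sketch; the timings in it are purely hypothetical, not measurements from GS2 or from this case.

```python
import math


def break_even_calls(plan_measure, plan_estimate, exec_measure, exec_estimate):
    """Number of FFT executions needed before a MEASURE plan's faster
    execution repays its extra planning time (all times in seconds).

    Returns math.inf when the MEASURE plan offers no per-call saving,
    and 0.0 when it is not more expensive to plan in the first place.
    """
    extra_planning = plan_measure - plan_estimate
    per_call_saving = exec_estimate - exec_measure
    if per_call_saving <= 0:
        return math.inf
    return max(0.0, extra_planning / per_call_saving)


# Purely hypothetical timings, chosen only to show the arithmetic.
n_star = break_even_calls(plan_measure=2.0, plan_estimate=0.01,
                          exec_measure=1.0e-4, exec_estimate=1.1e-4)
print(f"MEASURE pays off after about {n_star:,.0f} FFT calls")
```

If a run performs fewer transforms than this break-even count, the estimate plan is cheaper overall despite its slightly slower individual FFTs.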
-
reporter Another update: this case, running on a large number of processors, also gives identical results between identical repeat runs when using FFTW_ESTIMATE.
-
What happens if you run this case with xyles vs lexys, both using FFTW_ESTIMATE?
-
reporter This should be addressed by https://bitbucket.org/gyrokinetics/utils/pull-requests/24/bugfix-ensure-consistent-fft-measure/diff, although the default currently still favours non-reproducible runs. I'd propose that we change the default in gs2_layouts so that runs are reproducible by default. -
reporter - changed status to resolved
Fixed in release 8.0.2
The default is still non-reproducible runs, but the flag should at least allow this to be changed.
One thought is to explore the impact of the FFTW planning flags. Currently I think most plans will default to using FFTW_PATIENT or FFTW_MEASURE. With these flags the FFT result is non-deterministic, in the sense that the exact algorithm selected by FFTW may well vary between identical repeat runs due to small fluctuations in the initial timing data. For example, see http://www.fftw.org/faq/section3.html#nondeterministic. An initial task is therefore to check the behaviour between repeat runs when FFTW_ESTIMATE is used instead.
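The mechanism described in the FFTW FAQ can be illustrated with a toy model. This is not FFTW's planner: the two candidate algorithms, the jitter model and `toy_plan` are all hypothetical. A measuring planner picks whichever candidate happened to time fastest, so noisy timings can change the choice from run to run. Different algorithms use a different operation order, so their outputs can differ at the level of rounding error. An estimating planner applies a fixed rule and always makes the same choice.

```python
import cmath
import random


def naive_dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def radix2_fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


CANDIDATES = {"naive": naive_dft, "radix2": radix2_fft}


def toy_plan(mode, rng):
    """Hypothetical planner, NOT FFTW's.

    'estimate': fixed rule, so the choice never changes between runs.
    'measure':  pick the candidate with the smallest timing, where each
                timing is a nominal cost plus random jitter drawn from rng,
                standing in for run-to-run timing noise.
    """
    if mode == "estimate":
        return "radix2"
    timings = {name: 1.0 + rng.gauss(0.0, 0.05) for name in CANDIDATES}
    return min(timings, key=timings.get)


# Each seed stands in for a separate, otherwise identical, run.
x = [complex(0.1 * m, 0.0) for m in range(8)]
estimate_choices = {toy_plan("estimate", random.Random(s)) for s in range(50)}
measure_choices = {toy_plan("measure", random.Random(s)) for s in range(50)}
print("estimate picked:", sorted(estimate_choices))
print("measure picked:", sorted(measure_choices))

# Both algorithms compute the same DFT with a different operation order,
# so their outputs are expected to agree only to within rounding error.
a, b = naive_dft(x), radix2_fft(x)
print("max |difference|:", max(abs(p - q) for p, q in zip(a, b)))
```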