Exact repeat case doesn't reproduce results

Issue #69 resolved
David Dickinson created an issue

There is some related discussion on issue #62.

An identical repeat of a nonlinear case, run on the same system with the same number of processors and the exact same executable and input file, gives quite different time traces (though with qualitatively similar features). Whilst we might anticipate a small difference between identical repeats, because MPI may perform floating point operations in a slightly different order, I would typically expect the curves to remain very similar for a long time, and historically I believe this was the case.

It would be good to explore this in more detail to try to identify the source of this difference. I will attach an input file and more details in the future.

For reference, this was done with commit b020059aba41639940ce5a76d0380b008a04e4ad.

Comments

  1. David Dickinson reporter

    One thought is to explore the impact of the FFTW planning flags: currently I think most plans will default to FFTW_PATIENT or FFTW_MEASURE. With these flags the FFT result is non-deterministic, in the sense that the exact algorithm selected by FFTW may well vary between identical repeat runs due to small fluctuations in the timing measurements made during planning. For example, see http://www.fftw.org/faq/section3.html#nondeterministic

    An initial task is therefore to check the behaviour between repeat runs when FFTW_ESTIMATE is used instead.
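The point from the FFTW FAQ can be mimicked with a toy planner. This is only an analogy: real FFTW chooses among many decompositions, steered by the FFTW_ESTIMATE, FFTW_MEASURE and FFTW_PATIENT flags passed when a plan is created, whereas the hypothetical `make_plan` below chooses between just two hand-written transforms. A timing-based ("measure") choice can change between runs, while the two correct algorithms generally agree only to round-off, not bit for bit.

```python
import cmath
import time


def dft_naive(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


ALGORITHMS = {"naive": dft_naive, "radix2": fft_radix2}


def make_plan(n, effort):
    """Choose an algorithm name for length-n transforms (toy planner).

    "estimate": fixed rule, so every run makes the same choice.
    "measure":  time each candidate and keep the fastest, so the choice
                can differ between runs because of timing noise.
    """
    candidates = ["naive"]
    if n & (n - 1) == 0:  # power of two, so radix-2 is applicable
        candidates.append("radix2")
    if effort == "estimate":
        return candidates[-1]
    if effort == "measure":
        trial = [complex(i % 7, (3 * i) % 5) for i in range(n)]
        timings = {}
        for name in candidates:
            start = time.perf_counter()
            ALGORITHMS[name](trial)
            timings[name] = time.perf_counter() - start
        return min(timings, key=timings.get)
    raise ValueError(f"unknown planning effort: {effort!r}")


x = [complex(i % 7, (3 * i) % 5) for i in range(16)]
print("measure picked:", make_plan(16, "measure"))   # may vary run to run
print("estimate picked:", make_plan(16, "estimate"))  # always radix2
y_naive, y_fft = dft_naive(x), fft_radix2(x)
print("max |naive - radix2|:", max(abs(p - q) for p, q in zip(y_naive, y_fft)))
```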

  2. David Dickinson reporter

    nproc_1_comparison.png

    The above image shows 9 identical repeat cases run on one processor: clearly the results are similar but not identical. The next step is to repeat these cases using FFTW_ESTIMATE (unfortunately this involves recompiling, so the executable will no longer be identical to the one used for the other cases).

  3. David Dickinson reporter

    Brief update: it looks like forcing FFTW_ESTIMATE in all the plans has removed this run-to-run variation entirely (on one core). I plan to produce a PR (for utils) that ensures we can control this choice in all cases and that defaults to the reproducible option (estimate rather than measure/patient).
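A minimal sketch of the idea of one setting that every plan creation consults, defaulting to the reproducible option. The option strings, `FFTPlanSettings` and `create_plan` are hypothetical names chosen for illustration, not the utils or GS2 interface; only the flag names are the real FFTW3 planner flags.

```python
# Hypothetical option strings; the real GS2/utils input variable may differ.
FFTW_FLAG_NAMES = {
    "estimate": "FFTW_ESTIMATE",      # plan choice does not depend on timings
    "measure": "FFTW_MEASURE",        # timing-based, can vary between runs
    "patient": "FFTW_PATIENT",        # wider timing-based search
    "exhaustive": "FFTW_EXHAUSTIVE",  # widest timing-based search
}
DEFAULT_EFFORT = "estimate"  # reproducible choice unless the user opts out


class FFTPlanSettings:
    """The single setting that every plan-creation routine consults."""

    def __init__(self, effort=None):
        effort = (effort or DEFAULT_EFFORT).strip().lower()
        if effort not in FFTW_FLAG_NAMES:
            raise ValueError(
                f"unknown planning effort {effort!r}; "
                f"expected one of {sorted(FFTW_FLAG_NAMES)}")
        self.effort = effort

    @property
    def flag_name(self):
        return FFTW_FLAG_NAMES[self.effort]


def create_plan(n, settings):
    """Stand-in for a plan constructor: records which flag it would pass."""
    return {"n": n, "flag": settings.flag_name}


print(create_plan(128, FFTPlanSettings()))           # {'n': 128, 'flag': 'FFTW_ESTIMATE'}
print(create_plan(128, FFTPlanSettings("measure")))  # {'n': 128, 'flag': 'FFTW_MEASURE'}
```

Routing every plan through one settings object is what makes "in all cases" checkable: no plan can silently fall back to a timing-based flag.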

  4. David Dickinson reporter

    nproc_1_fft_estimate_comparison.png

    The above shows a comparison of 9 identical repeat cases on one processor, with a patch to GS2 that ensures FFTW_ESTIMATE is used for every FFT plan creation. As the figure illustrates, these runs produce identical results. I'll also explore the nproc > 1 behaviour, but I now believe this task is essentially resolved, with a follow-up task to ensure FFTW_ESTIMATE can be selected at runtime for all plans to aid reproducibility.
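Checking repeat runs for identical results, as in the comparison above, can be automated by comparing the saved time traces bit for bit. The sketch below assumes each run's diagnostic has already been read into a list of floats; `compare_repeats` is a hypothetical helper, and reading GS2 output is not shown.

```python
import struct


def bits(x):
    """Raw IEEE-754 bytes of a double, so equality is truly bitwise."""
    return struct.pack("<d", x)


def compare_repeats(traces):
    """Compare repeat-run time traces against the first one.

    Returns (bitwise_identical, max_abs_diff) over all runs and time points.
    """
    reference = traces[0]
    identical = True
    max_diff = 0.0
    for trace in traces[1:]:
        if len(trace) != len(reference):
            raise ValueError("repeat runs produced traces of different lengths")
        for a, b in zip(reference, trace):
            if bits(a) != bits(b):
                identical = False
                max_diff = max(max_diff, abs(a - b))
    return identical, max_diff


same = [[1.0, 2.5, 3.25] for _ in range(3)]
print(compare_repeats(same))  # (True, 0.0)
drifted = [[1.0, 2.5, 3.25], [1.0, 2.5, 3.25 + 1e-12]]
print(compare_repeats(drifted))
```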

  5. Stephen Biggs-Fox

    It would be useful to note this in the documentation somewhere and to comment on the performance difference in that same place. Have you noticed much of a drop in performance when using FFTW_ESTIMATE?

  6. Stephen Biggs-Fox

    Also, isn't there a way around this with the current GS2 setup using FFTW's wisdom? I've seen mention of wisdom in GS2, but I've never intentionally used it and I don't know how it works in GS2.

  7. David Dickinson reporter

    The performance difference between measure/patient planning and estimate is not particularly noticeable here, but this is likely to vary with the problem, machine, library version, etc., so I'd want to do a much more detailed study before saying anything concrete. Using estimate actually substantially reduces the cost of creating the FFT plans, so it's possible that this saving could outweigh any small increase in the cost of each FFT.
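The trade-off described here can be made concrete with a break-even estimate. With estimate planning the total FFT time for N transforms is P_est + N·E_est (planning time plus N executions), and similarly for measure, so measure only pays off once N exceeds (P_meas − P_est) / (E_est − E_meas). The Python helper below is a sketch of that arithmetic; the timings in the usage line are made-up placeholders, not GS2 measurements.

```python
import math


def total_fft_cost(plan_time, exec_time, n_calls):
    """Planning cost plus n_calls executions of the plan."""
    return plan_time + n_calls * exec_time


def break_even_calls(plan_est, exec_est, plan_meas, exec_meas):
    """Number of transforms beyond which measure-planning becomes cheaper.

    Returns math.inf when the measured plan is not faster per call, in which
    case estimate is never worse (assuming plan_est <= plan_meas).
    """
    extra_per_call = exec_est - exec_meas
    if extra_per_call <= 0:
        return math.inf
    return (plan_meas - plan_est) / extra_per_call


# Placeholder timings in seconds, purely illustrative.
n_star = break_even_calls(plan_est=0.01, exec_est=1.1e-3,
                          plan_meas=2.0, exec_meas=1.0e-3)
print(n_star)  # about 19900
```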

    Wisdom could make things a little more reproducible, but it is not necessarily that reliable (for a start, you have to make sure FFTW reads the right wisdom file, which may be non-trivial when doing repeat runs).
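FFTW wisdom records the planner's choices so that a later run can reuse them; in FFTW 3.3 and later it can be saved and loaded with `fftw_export_wisdom_to_filename` and `fftw_import_wisdom_from_filename`. The toy Python version below (a JSON file and hypothetical helpers, not the real wisdom format) shows the caveat raised here: a repeat run only reproduces the original plan choice if it reads the same file, and a run pointed at a missing or different file plans afresh.

```python
import json
import os
import tempfile


def load_wisdom(path):
    """Return the recorded planner choices, or {} if the file does not exist."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def save_wisdom(path, wisdom):
    """Write the planner choices so a later run can reuse them."""
    with open(path, "w") as fh:
        json.dump(wisdom, fh)


def plan_with_wisdom(n, wisdom, measure):
    """Reuse the recorded choice for size n, else call `measure` and record it.

    `measure` is any callable mapping n -> algorithm name, e.g. a
    timing-based planner whose answer may differ between runs.
    """
    key = str(n)  # JSON object keys are always strings
    if key not in wisdom:
        wisdom[key] = measure(n)
    return wisdom[key]


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "wisdom.json")
    # First run: no wisdom yet, so the (noisy) planner decides and we save it.
    w = load_wisdom(good)
    first = plan_with_wisdom(64, w, lambda n: "radix2")
    save_wisdom(good, w)
    # Repeat run reading the same file: the recorded choice is reused even
    # though the planner would now pick something else.
    repeat = plan_with_wisdom(64, load_wisdom(good), lambda n: "naive")
    # Repeat run pointed at the wrong (missing) file: planning happens afresh.
    stray = plan_with_wisdom(64, load_wisdom(os.path.join(tmp, "other.json")),
                             lambda n: "naive")
    print(first, repeat, stray)  # radix2 radix2 naive
```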

  8. David Dickinson reporter

    Another update: this case, run on a large number of processors, also gives identical results between identical repeat runs when using FFTW_ESTIMATE.

  9. David Dickinson reporter

    Fixed in release 8.0.2

    The default still gives non-reproducible runs, but the flag should at least allow this to be changed.
