Restarting nonlinear simulations on Archer gives NaNs

Issue #82 resolved
Ollie Beeke created an issue

I have run and restarted a nonlinear cyclone-base-case simulation on Archer. Upon restarting, the first printed values of heat flux and potential match the last step of the previous simulation. All subsequent values of potential are NaNs. I have tried a linear simulation with exactly the same input parameters and number of cores, and I do not get NaNs. I have attached the two input files that I used for the initial and restarted simulations.

Comments (8)

  1. David Dickinson

    Thanks for the report. What version (or commit hash) are you using here, and which modules on Archer are you using? How many processors are you running with?

  2. David Dickinson

    I've reproduced this on another system with 32 cores and nx=ny=4 to speed things up a bit. This is using the current next branch.

  3. Joseph Parker

    Setting nstep=0 in cbc_restart.in gives sensible values in the restart files, but setting nstep=1 gives NaNs.

  4. Ollie Beeke reporter

    @David Dickinson the latest commit I see from git log is f7c09ab. I used 864 cores, although I realise now that I should have used 432 as I included only one species (though I doubt that would cause the NaNs!). The module list is shown below:

  5. David Dickinson

    @Ollie Beeke great, thanks. Joseph and I have reproduced this independently. Could you tell me the output of ncdump -v delt2 nc/restart.nc.0?

  6. David Dickinson

    Could you try the branch provided in PR #201 (or apply the changes there to your case)? This seems to have fixed the issue in my small reproducer. It boils down to a copy-paste error, which meant we saved the oldest timestep in the wrong variable and also restored it to the wrong location.
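
To illustrate the class of bug described in the last comment, here is a minimal Python sketch of a save/restore round-trip check for a timestep history. It is not the project's actual restart code: apart from `delt2`, which appears in the `ncdump` command above, every name here (`StepHistory`, `delt0`, `delt1`, `save_buggy`, `save_fixed`, `restore`, `round_trips`) is hypothetical, and the "buggy" writer is only one way such a copy-paste slip might look.

```python
from dataclasses import dataclass


@dataclass
class StepHistory:
    """Hypothetical record of the last three timestep sizes."""
    delt0: float  # most recent timestep
    delt1: float  # previous timestep
    delt2: float  # oldest timestep


def save_buggy(h):
    # Illustrative copy-paste slip: the "delt2" entry is filled from delt1,
    # so the oldest timestep is never written to the restart record.
    return {"delt0": h.delt0, "delt1": h.delt1, "delt2": h.delt1}


def save_fixed(h):
    # Each entry is written from the field with the matching name.
    return {"delt0": h.delt0, "delt1": h.delt1, "delt2": h.delt2}


def restore(record):
    # Rebuild the history from a saved record.
    return StepHistory(record["delt0"], record["delt1"], record["delt2"])


def round_trips(save, h):
    """Return True if saving and then restoring reproduces h exactly."""
    return restore(save(h)) == h


# Use three distinct values: with a constant timestep the slip is invisible.
history = StepHistory(delt0=0.05, delt1=0.04, delt2=0.03)
print("fixed writer round-trips:", round_trips(save_fixed, history))
print("buggy writer round-trips:", round_trips(save_buggy, history))
```

A check like this only catches the error when the three stored timesteps differ, so a test case with a changing timestep is the more useful one.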
