Restarting nonlinear simulations on archer gives NaNs
I have run a nonlinear cyclone-base-case simulation on archer and then restarted it. On restart, the first printed values of heat flux and potential match the last step of the previous simulation, but all subsequent values of potential are NaNs. A linear simulation with exactly the same input parameters and number of cores does not produce NaNs. I have attached the two input files I used for the initial and restarted simulations.
I see this locally using next and these input files, with (nx,ny)=(8,24) on 4 procs.
I've reproduced this on another system with 32 cores and nx=ny=4 to speed things up a bit. This is using current next.
Setting nstep=0 in cbc_restart.in gives sensible values in the restart files, but setting nstep=1 gives NaNs.
@David Dickinson the latest commit I see from git log is f7c09ab. I used 864 cores, although I realise now that I should have used 432, as I included only one species (though I doubt that would cause the NaNs!). The module list is shown below:
@Ollie Beeke great, thanks. Joseph and I have reproduced this independently. Could you tell me the output of ncdump -v delt2 nc/restart.nc.0?
Could you try the branch provided in PR #201 (or apply the changes there to your case)? This seems to have fixed the issue in my small reproducer. It boils down to a copy-paste error which meant we saved the oldest timestep in the wrong variable and restored it to the wrong location as well.
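To illustrate how this class of bug can give sensible-looking restart files (nstep=0) but NaNs on the very first step afterwards (nstep=1), here is a toy Python model. It assumes a three-level Adams-Bashforth-style history of the nonlinear term, which would also be consistent with linear runs being unaffected. The variable names (nl_0, nl_1, nl_2), the dict standing in for the restart file, and the choice to initialise unset history slots to NaN are all hypothetical; this is not GS2's actual restart code.

```python
import math

# Adams-Bashforth 3 weights for (newest, middle, oldest) history levels.
AB3 = (23.0 / 12.0, -16.0 / 12.0, 5.0 / 12.0)


def ab3_increment(history, dt):
    """AB3 increment from a [newest, middle, oldest] nonlinear-term history."""
    return dt * sum(c * h for c, h in zip(AB3, history))


def save(history, buggy=False):
    """Write the history to a dict standing in for the restart file.

    Hypothetical names; not GS2's restart variables.
    """
    data = {"nl_0": history[0], "nl_1": history[1], "nl_2": history[2]}
    if buggy:
        # Copy-paste error: the oldest level is also written into nl_1,
        # so the middle level is lost but the file still looks sensible.
        data["nl_1"] = history[2]
    return data


def restore(data, buggy=False):
    """Read the history back. Unset slots start as NaN (an assumption)."""
    history = [math.nan, math.nan, math.nan]
    history[0] = data["nl_0"]
    history[1] = data["nl_1"]
    if buggy:
        # ...and the oldest level is restored into the wrong slot,
        # so slot 2 is never filled.
        history[1] = data["nl_2"]
    else:
        history[2] = data["nl_2"]
    return history


def round_trip_ok(history, buggy=False):
    """Check that save/restore reproduces the history exactly."""
    restored = restore(save(history, buggy), buggy)
    # NaN != NaN, so an unfilled slot makes this check fail.
    return all(a == b for a, b in zip(history, restored))


hist = [1.0, 2.0, 3.0]
print(round_trip_ok(hist))                                            # True
print(round_trip_ok(hist, buggy=True))                                # False
print(ab3_increment(restore(save(hist, True), True), dt=0.1))         # nan
```

A save/restore round-trip check such as round_trip_ok is one cheap way to catch this kind of error in a regression test, independently of running a full nonlinear case.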
Changed status to resolved.
Fixed in release 8.0.3
Thanks for the report. What version (or commit hash) are you using here, and which modules on Archer are you using? How many processors are you running with?