Bugfix/dont leave early if no elements in save read restart files

Merged
#515

Merged in bugfix/dont_leave_early_if_no_elements_in_save_read_restart_files (pull request #515)

f7e5440 · 2021-08-05

Description

  • Don't return from save/read restart if n_elements is zero

    This could avoid potential hangs in builds with parallel netcdf, which calls a barrier after this early return

  • Avoid n_elements becoming negative

  • Remove n_elements > 0 guards

Previously we would skip trying to write/read restart files on processors with none of the g_lo domain allocated. This PR instead allows such processors to write their own restart files.

This fixes two problems:

  1. On restart or timestep change, processors obtain the values of the potentials from their own restart file. For processors with no work, skipping the write/read means the potentials become zero.

  2. With parallel netcdf we have barriers after the previous check for an early return. If we ran with parallel netcdf and had a processor with no work, the other processors would wait at a barrier which that processor never reaches, which would likely lead to a hanging simulation.

Whilst the first of these is relatively minor and unlikely to have a real-world impact, the second fix makes parallel netcdf builds more robust.
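The hang described in the second problem can be illustrated with a toy model. This is only a sketch: Python threads stand in for MPI ranks, `threading.Barrier` stands in for the barrier called in parallel netcdf builds, and the names `run_ranks` and `save_restart` are hypothetical, not taken from the code base. A timeout is used so that the example reports the hang rather than actually hanging.

```python
import threading


def run_ranks(n_elements_per_rank, early_return, timeout=0.5):
    """Return True if every rank that reaches the barrier gets through it.

    Toy model only: threads play the role of MPI ranks and a
    threading.Barrier plays the role of the collective barrier.
    """
    barrier = threading.Barrier(len(n_elements_per_rank))
    passed = []  # one entry per rank that reached the barrier

    def save_restart(n_elements):
        # Old behaviour (assumed shape): leave before the collective call
        if early_return and n_elements == 0:
            return
        try:
            barrier.wait(timeout=timeout)
            passed.append(True)
        except threading.BrokenBarrierError:
            # In a real MPI run there is no timeout: this would be a hang
            passed.append(False)

    threads = [
        threading.Thread(target=save_restart, args=(n,))
        for n in n_elements_per_rank
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(passed)
```

With `run_ranks([4, 4, 0], early_return=True)` the two ranks with work wait for a third participant that never arrives, so the call reports failure; with `early_return=False` all three ranks reach the barrier and pass through it.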

In fact, the first could have significant real-world impact. Consider the situation where one processor has no work in g_lo but does have work in le_lo that involves the potentials. Following a restart, it is possible that this processor will have incorrect potentials, assuming that we do not broadcast these as part of the usual restart process.
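The scenario above can be sketched as a small model. It assumes, as the description does, that each processor restores the potentials only from its own restart file, that they start at zero, and that no broadcast happens afterwards. The function name `restart_cycle` and the in-memory `files` dictionary are hypothetical stand-ins for the real restart I/O.

```python
def restart_cycle(n_elements_per_rank, phi, skip_empty_ranks):
    """Return the potential each rank holds after a save/read cycle.

    Toy model only: `files` stands in for per-processor restart files,
    and potentials are assumed to be zero until read back.
    """
    files = {}
    # Save step: each rank writes its own copy of the potential
    for rank, n_elements in enumerate(n_elements_per_rank):
        if skip_empty_ranks and n_elements == 0:
            continue  # old behaviour: no file written for this rank
        files[rank] = phi

    restored = []
    # Read step: each rank reads only its own file, with no broadcast
    for rank, n_elements in enumerate(n_elements_per_rank):
        if skip_empty_ranks and n_elements == 0:
            restored.append(0.0)  # read skipped, potential stays zero
        else:
            restored.append(files[rank])
    return restored
```

Under these assumptions, `restart_cycle([4, 0], 1.5, skip_empty_ranks=True)` leaves the second rank with a zero potential, while `skip_empty_ranks=False` restores the same value on both ranks.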
