Planning for fields_local optimisation merge -- distributed potentials

PR #361 introduces some optimisations for the fields_local field solve. This step can limit the scaling of the code at large processor counts, so optimisations here have the potential to improve scaling at the current bottleneck.

A major feature of the optimised code is that after the field solve the full fields are not known globally. Instead, each processor possesses only a subset of the fields in the simulation: specifically, those that correspond to the theta/kx/ky points held by that processor in the g_lo layout (i.e. those that are needed for the rest of the local time advance). For example, consider a simulation with naky>1 run with nproc such that each processor is responsible for evolving exactly one ky. With the current field solvers each processor will know the fields for all ky after each field solve. With the optimisation, only the field for the ky owned by that processor should be known.
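
The difference can be illustrated with a toy model. This is only a sketch: the ranks are simulated in a single process, the round-robin ownership rule is a hypothetical stand-in for the real g_lo layout, and the names owned_ky and fields_known_after_solve are invented for illustration rather than taken from the code.

```python
# Toy model only: ranks are simulated in one process and the ownership rule
# below is a hypothetical stand-in for the real g_lo layout.

def owned_ky(rank, naky, nproc):
    """Return the ky indices assigned to `rank` (assumed round-robin)."""
    return [iky for iky in range(naky) if iky % nproc == rank]


def fields_known_after_solve(rank, global_phi, nproc, distributed):
    """Return the {iky: phi} entries a rank holds after the field solve."""
    naky = len(global_phi)
    if not distributed:
        # Current solvers: every rank ends up with the field for every ky.
        return dict(enumerate(global_phi))
    # Optimised solve: only the locally owned ky are populated.
    return {iky: global_phi[iky] for iky in owned_ky(rank, naky, nproc)}


naky = nproc = 4  # one ky per processor, as in the example above
phi = [complex(iky, -iky) for iky in range(naky)]  # made-up field values

current = fields_known_after_solve(2, phi, nproc, distributed=False)
optimised = fields_known_after_solve(2, phi, nproc, distributed=True)

print(sorted(current))    # ky indices known on rank 2 with the current solvers
print(sorted(optimised))  # ky indices known on rank 2 with the optimisation
```

Any code that reads a ky outside the optimised set on a given rank would be relying on data that is no longer populated there, which is the situation the exceptions below need to handle.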


The majority of the code does not need any modification to account for this, as operations are only performed for the points owned in the local g_lo layout. The exceptions include:

ExB shear – the current algorithm assumes we have access to the global fields on every processor to avoid communicating the fields. There has long been a workaround in this routine to broadcast the fields from proc0 to all processors. This probably needs improving to use a redistribute/fill call to communicate the fields as required.
Diagnostics – many diagnostics are written to work on proc0 only and may assume that the full potentials are available. One concrete example is the omega calculation, which uses the full fields. A simple workaround is to communicate the fields whenever diagnostics are required. However, this negates the optimisation if done for omega, as omega is currently calculated on every step. This could be avoided by only calculating omega every nwrite steps or by parallelising the calculation of omega.
It’s possible that the restart system assumes the fields are fully known on proc0. The restarting test fails when this optimisation is active, which points in this direction, but the source of the failure has yet to be identified.
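
To give a rough sense of the saving from calculating omega only every nwrite steps, the sketch below counts how often the full fields would need to be communicated under each policy. It assumes one communication of the fields per omega calculation; the function names, the istep/nstep counters and the values chosen are hypothetical and only nwrite comes from the discussion above.

```python
def needs_global_fields(istep, nwrite, omega_every_step):
    """Decide whether the full fields must be communicated on this step."""
    if omega_every_step:
        # Current behaviour: omega is calculated on every step.
        return True
    # Proposed behaviour: only gather the fields on output steps.
    return istep % nwrite == 0


def count_gathers(nstep, nwrite, omega_every_step):
    """Count the steps in 1..nstep that require a gather of the fields."""
    return sum(
        needs_global_fields(istep, nwrite, omega_every_step)
        for istep in range(1, nstep + 1)
    )


nstep, nwrite = 1000, 50  # illustrative values only
every_step = count_gathers(nstep, nwrite, omega_every_step=True)
on_write = count_gathers(nstep, nwrite, omega_every_step=False)

print(every_step, on_write)  # gathers needed under each policy
```

With these illustrative values the number of gathers falls by a factor of nwrite. Parallelising the omega calculation itself would instead remove the need to gather the fields for omega at all.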

This issue is to provide a place to discuss further challenges identified, solutions to known limitations of this approach and other topics associated with bringing this optimisation into the main release.

