Alternative approach to linked boundary conditions?

Issue #181 new
David Dickinson created an issue

Currently, for our linked boundary conditions we need to send the local processor’s boundary values to all connected cells to the left/right. The code that calculates this communication pattern (in init_connected_bc) scales poorly with problem size (quadratic in nx) and is effectively serial, as it loops over the entire domain. We only do this calculation once per simulation.

The resulting communication pattern is then employed in every invert_rhs call and is implemented using point-to-point communications to send the minimal amount of information to just the processors which need it. Whilst this minimises the data transferred, it complicates the code and means we don’t take full advantage of tuned MPI routines. Could we instead simply send the boundary information to all processors which may need it using an mpi_allgather or mpi_allreduce? Sending more data may not be any more expensive if we’re latency bound.
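
As a rough illustration only: the sketch below gathers the two boundary values for every locally owned iglo onto all processors with a single collective. The subroutine and argument names (gather_linked_boundaries, g_bound_local, n_local, comm_all) are hypothetical rather than anything in the code, and it assumes every processor owns the same number of iglo points (otherwise mpi_allgatherv would be needed).

```fortran
! Hypothetical sketch, not the existing interface: replace the point-to-point
! exchange with one collective that gives every processor every boundary value.
subroutine gather_linked_boundaries(g_bound_local, n_local, nproc, comm_all, g_bound_all)
  use mpi
  implicit none
  integer, intent(in) :: n_local    ! number of iglo points owned locally (assumed equal on all procs)
  integer, intent(in) :: nproc      ! number of processors in comm_all
  integer, intent(in) :: comm_all   ! communicator spanning all processors
  complex, intent(in)  :: g_bound_local(2, n_local)        ! left/right theta-boundary value per local iglo
  complex, intent(out) :: g_bound_all(2, n_local, nproc)   ! everyone's boundary values, indexed by rank
  integer :: ierr

  ! One tuned collective instead of many small sends/receives; more data moves
  ! than strictly necessary, but that may be acceptable if we are latency bound.
  call mpi_allgather(g_bound_local, 2*n_local, MPI_COMPLEX, &
                     g_bound_all,   2*n_local, MPI_COMPLEX, &
                     comm_all, ierr)
end subroutine gather_linked_boundaries
```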

Comments (1)

  1. David Dickinson (reporter)

    We essentially want to send the values at the two theta = ±pi boundaries for each iglo to all the processors with the same {ky,l,e,s} but different kx. If we sent this to all processors, then each processor would need to hold an array of size {nkx,2,nky*nlambda*negrid*nspec}, which is only a factor of ntheta smaller than the entire global distribution function and doesn't scale with nproc. This is probably too big, so we do need to be smarter! If we could guarantee that each processor needed to communicate with the same set of processors for all the {ky,l,e,s} which it owns, then one could probably replace the separate point-to-point messages with a single collective on a sub-communicator (see the sketch at the end of this comment).

    It may still be possible to replace the point-to-point calls with a single mpi_alltoallv call (also sketched at the end of this comment), but I expect this is unlikely to gain much.
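
    A minimal sketch of the sub-communicator idea, assuming a colour value can be derived from the layout such that all processors sharing a kx-connected set of {ky,l,e,s} blocks get the same colour; make_linked_subcomm and its arguments are illustrative names only:

    ```fortran
    ! Illustrative only: group the processors that need to exchange linked
    ! boundary data into their own communicator, so the exchange can be a
    ! single collective on that group rather than many point-to-point calls.
    subroutine make_linked_subcomm(comm_all, colour, comm_linked)
      use mpi
      implicit none
      integer, intent(in)  :: comm_all     ! communicator spanning all processors
      integer, intent(in)  :: colour       ! same value on every proc in one kx-connected set
      integer, intent(out) :: comm_linked  ! sub-communicator for that set
      integer :: rank, ierr

      call mpi_comm_rank(comm_all, rank, ierr)
      ! Processors with equal colour land in the same sub-communicator;
      ! using rank as the key preserves the original ordering within it.
      call mpi_comm_split(comm_all, colour, rank, comm_linked, ierr)
    end subroutine make_linked_subcomm
    ```

    A collective such as mpi_allgatherv on comm_linked would then only carry the boundary data for the connected set a processor actually belongs to, keeping the received array far smaller than the {nkx,2,nky*nlambda*negrid*nspec} estimate above.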
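
    For completeness, a hedged sketch of the mpi_alltoallv variant; the counts and displacements would have to be built from the same connection information init_connected_bc already derives, and all names here are illustrative:

    ```fortran
    ! Illustrative only: fold the existing point-to-point pattern into a single
    ! mpi_alltoallv. The data volume is unchanged; any benefit comes from the
    ! MPI library scheduling the exchange better than hand-written send/recv pairs.
    subroutine exchange_boundaries_alltoallv(nproc, comm_all, sendbuf, sendcounts, sdispls, &
                                             recvbuf, recvcounts, rdispls)
      use mpi
      implicit none
      integer, intent(in) :: nproc, comm_all
      integer, intent(in) :: sendcounts(nproc), sdispls(nproc)  ! per-destination counts/offsets
      integer, intent(in) :: recvcounts(nproc), rdispls(nproc)  ! per-source counts/offsets
      complex, intent(in)  :: sendbuf(*)   ! boundary values packed by destination
      complex, intent(out) :: recvbuf(*)   ! boundary values received, packed by source
      integer :: ierr

      call mpi_alltoallv(sendbuf, sendcounts, sdispls, MPI_COMPLEX, &
                         recvbuf, recvcounts, rdispls, MPI_COMPLEX, &
                         comm_all, ierr)
    end subroutine exchange_boundaries_alltoallv
    ```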
