Allow separate nproc for different layouts

Issue #193 resolved
David Dickinson created an issue

PR #623 added the ability to pass the number of processors over which to decompose a particular data layout. As noted in the PR, this can allow each layout to use its own sweetspot, potentially resulting in significant savings in communication costs.

Currently it is not possible for the user to specify the number of processors to use in a given layout. We should:

  1. Allow the user to specify the number of processors to use for each layout (capped to the global nproc).
  2. Consider recommending sensible values and allowing the user to indicate that they would like the code to pick.
  3. Consider how one may automatically optimise this.
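
Items 1 and 2 could be sketched as follows. This is only an illustration in Python, not the code's actual interface: the function name `resolve_layout_nproc`, the use of `None` to mean "let the code pick", and the `sweetspots` argument (a caller-supplied list of candidate processor counts) are all hypothetical.

```python
def resolve_layout_nproc(requested, global_nproc, sweetspots=()):
    """Return the processor count to use for one layout (hypothetical helper).

    requested    -- positive int chosen by the user, or None to let the code pick
    global_nproc -- total number of processors available to the run
    sweetspots   -- assumed, caller-supplied candidate counts for this layout
    """
    if global_nproc < 1:
        raise ValueError("global_nproc must be at least 1")

    if requested is None:
        # Item 2: pick the largest candidate that fits, falling back to
        # the global nproc when no candidate is usable.
        candidates = [s for s in sweetspots if 1 <= s <= global_nproc]
        return max(candidates, default=global_nproc)

    if requested < 1:
        raise ValueError("requested nproc must be at least 1")

    # Item 1: honour the user's choice, capped to the global nproc.
    return min(requested, global_nproc)
```

With 48 processors available, a request for 64 would be capped to 48, while an automatic choice from the candidates 8, 32 and 64 would select 32.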

Reducing nproc below the global maximum trades some distribution of the computational work for lower communication costs. Communication costs arising from the decomposition come in primarily through the redistributes. PR #627 allows one to generate redistributes using different layout instances. This is a first step towards exploring items 2 and 3 above. It may be possible to write a support code which explores the properties of redistributes for different nproc choices and gives the user more information with which to choose the best number of processors for each layout.

For example, it may be useful to know:

  1. The maximum number of messages sent by any processor
  2. The minimum and maximum message sizes
  3. The fraction of local data sent during the redistribute
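
As a rough illustration of such a support code, the sketch below computes the three quantities above for a toy redistribute. It is written in Python and does not use the code's real layouts: it assumes a two-dimensional index space of size `ni` by `nj`, a source layout that block-decomposes the flattened index with `j` varying fastest, and a destination layout that block-decomposes it with `i` varying fastest. The names `block_owner` and `redistribute_stats` are hypothetical, and "fraction of local data sent" is interpreted here as the global fraction of elements that change owner.

```python
from collections import Counter


def block_owner(flat_index, n, nproc):
    """Owner of flat_index when n elements are split into contiguous blocks."""
    block = -(-n // nproc)  # ceiling division: elements per processor
    return flat_index // block


def redistribute_stats(ni, nj, nproc_from, nproc_to):
    """Message statistics for a toy redistribute between two assumed layouts."""
    n = ni * nj
    messages = Counter()  # (source, destination) -> number of elements
    local = 0             # elements that stay on the same processor

    for i in range(ni):
        for j in range(nj):
            src = block_owner(i * nj + j, n, nproc_from)  # j fastest
            dst = block_owner(j * ni + i, n, nproc_to)    # i fastest
            if src == dst:
                local += 1
            else:
                messages[(src, dst)] += 1

    # Number of distinct destinations each source processor sends to.
    sends_per_proc = Counter(src for src, _ in messages)
    sizes = list(messages.values())

    return {
        "max_messages_sent": max(sends_per_proc.values(), default=0),
        "min_message_size": min(sizes, default=0),
        "max_message_size": max(sizes, default=0),
        "fraction_sent": (n - local) / n,
    }
```

Scanning `nproc_from` and `nproc_to` over the allowed range and tabulating these numbers would give the user a simple basis for comparing choices, under the assumptions stated above.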

It may be helpful to move the code in ingen used to calculate sweetspots to a different location (layouts?) to make this information more accessible.
