Mismatch between upper limits on processors with no work

Issue #156 (new), created by David Dickinson

The text below is copied from a comment on PR #513.

We find that ulim_proc /= ulim_alloc when llim_proc > ulim_world (i.e. when there are processors with no work). This can arise when nproc*(nproc-1) > ulim_world = naky*ntheta0*nspecies*nlambda*negrid. In this situation ulim_alloc = llim_proc, whilst ulim_proc = ulim_world.
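To make the index arithmetic concrete, here is a minimal sketch of a contiguous block decomposition of the indices 0..ulim_world over nproc processors. The module and subroutine names (layout_sketch, proc_limits) are hypothetical, and the blocksize formula is an assumed scheme for illustration, not a copy of the GS2 layout code. It reproduces the situation described above: a processor whose llim_proc exceeds ulim_world ends up with ulim_proc = ulim_world and ulim_alloc = llim_proc.

```fortran
module layout_sketch
  ! Hypothetical stand-in for a g_lo-style layout. The blocksize formula
  ! below is an ASSUMED block decomposition, not taken from gs2_layouts.
  implicit none
  private
  public :: proc_limits

contains

  ! Limits for processor iproc (0-based) of nproc when the global indices
  ! 0..ulim_world are split into contiguous blocks of equal size.
  subroutine proc_limits(iproc, nproc, ulim_world, llim_proc, ulim_proc, ulim_alloc)
    integer, intent(in) :: iproc, nproc, ulim_world
    integer, intent(out) :: llim_proc, ulim_proc, ulim_alloc
    integer :: blocksize

    blocksize = ulim_world / nproc + 1
    llim_proc = iproc * blocksize
    ! Clamped to ulim_world, so on a processor past the end of the work
    ! this gives ulim_proc = ulim_world < llim_proc.
    ulim_proc = min(ulim_world, llim_proc + blocksize - 1)
    ! max() keeps ulim_alloc >= llim_proc, so an allocation over
    ! llim_proc:ulim_alloc has a trailing dimension of size 1 when empty.
    ulim_alloc = max(llim_proc, ulim_proc)
  end subroutine proc_limits

end module layout_sketch
```

For example, with ulim_world = 9 (ten items) and nproc = 8, blocksize is 2, processors 0 to 4 each own two items, and processor 7 gets llim_proc = 14, ulim_proc = 9 and ulim_alloc = 14. Here nproc*(nproc-1) = 56 > 9, consistent with the condition above.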

We use ulim_alloc in allocations like allocate(gnew(-ntgrid:ntgrid, 2, g_lo%llim_proc:g_lo%ulim_alloc)), which in this situation produces arrays whose trailing dimension has size 1.

In loops we always use ulim_proc rather than ulim_alloc, but ulim_proc can be less than llim_proc, so loops of the form do iglo = g_lo%llim_proc, g_lo%ulim_proc execute zero iterations.

As we tend to allocate these sorts of arrays and then set them using a loop, we can end up with uninitialised arrays. For example, consider the following:

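```fortran
! On a processor with no work the trailing dimension has size 1
allocate(gnew(-ntgrid:ntgrid, 2, g_lo%llim_proc:g_lo%ulim_alloc))
! Zero iterations when g_lo%ulim_proc < g_lo%llim_proc
do iglo = g_lo%llim_proc, g_lo%ulim_proc
    gnew(:, :, iglo) = 1.0
end do
print *, maxval(abs(gnew)), minval(abs(gnew))
```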

On processors with ulim_alloc = ulim_proc we initialise the full array and hence see 1, 1 printed. On processors with ulim_alloc = llim_proc > ulim_proc the result is undefined: gnew is allocated with non-zero size, but the loop has zero iterations, so none of its elements are ever set.
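The hazard can be checked without reading undefined values by mirroring the allocate-then-loop pattern with a logical mask that records which elements the loop writes. This is a sketch with a hypothetical helper (count_unset is not a GS2 routine), assuming the limits are passed in directly rather than read from g_lo.

```fortran
module unset_demo
  ! Hypothetical helper illustrating the allocate-then-loop pattern;
  ! not part of GS2.
  implicit none
  private
  public :: count_unset

contains

  ! Returns how many elements of an array shaped like gnew are never
  ! written by a loop over llim_proc..ulim_proc.
  function count_unset(ntgrid, llim_proc, ulim_proc, ulim_alloc) result(n)
    integer, intent(in) :: ntgrid, llim_proc, ulim_proc, ulim_alloc
    integer :: n
    logical, allocatable :: was_set(:, :, :)
    integer :: iglo

    ! Same bounds as gnew: allocation uses ulim_alloc
    allocate(was_set(-ntgrid:ntgrid, 2, llim_proc:ulim_alloc))
    was_set = .false.
    ! Same loop as the issue: iteration uses ulim_proc, so this loop
    ! runs zero times when ulim_proc < llim_proc
    do iglo = llim_proc, ulim_proc
      was_set(:, :, iglo) = .true.
    end do
    n = count(.not. was_set)
  end function count_unset

end module unset_demo
```

With ntgrid = 2, an empty processor with llim_proc = 14, ulim_proc = 9 and ulim_alloc = 14 leaves 10 elements (5*2*1) unset, whereas a processor with work, e.g. llim_proc = 8 and ulim_proc = ulim_alloc = 9, leaves none.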

If we only used such arrays within such loops this would be OK (although not ideal), as we would never touch the uninitialised data. However, if we perform an array operation (e.g. g = gnew + 1.0), we will then use this uninitialised data.

As we are trying to indicate that there is no work to do, it probably makes sense to set ulim_proc = ulim_alloc = llim_proc - 1 in this situation. The arrays then have zero size (so they are allocated and exist but hold no data) and our loops have zero extent. This can still lead to problems. For example, asking for the maxval of a zero-length real array returns -huge for that real kind; whilst this behaviour is part of the Fortran standard, our code may need to be careful about how it interprets such results.
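A sketch of the proposed fix, under the same assumed block decomposition (the names fixed_limits_sketch, proc_limits_fixed and safe_maxabs are illustrative, not existing GS2 routines). Empty processors get ulim_proc = ulim_alloc = llim_proc - 1, so arrays allocated over llim_proc:ulim_alloc have zero size. Because the standard specifies a large negative result for maxval of a zero-size array, safe_maxabs returns 0 in that case instead, which is a neutral value if per-processor maxima of abs(...) are later combined with a global max.

```fortran
module fixed_limits_sketch
  ! Hypothetical illustration of the proposal in this issue; the
  ! blocksize formula is an ASSUMED decomposition, not GS2's own code.
  implicit none
  private
  public :: proc_limits_fixed, safe_maxabs

contains

  subroutine proc_limits_fixed(iproc, nproc, ulim_world, llim_proc, ulim_proc, ulim_alloc)
    integer, intent(in) :: iproc, nproc, ulim_world
    integer, intent(out) :: llim_proc, ulim_proc, ulim_alloc
    integer :: blocksize

    blocksize = ulim_world / nproc + 1
    llim_proc = iproc * blocksize
    ulim_proc = min(ulim_world, llim_proc + blocksize - 1)
    ! No work on this processor: mark it with an empty range
    if (ulim_proc < llim_proc) ulim_proc = llim_proc - 1
    ! Allocation and iteration now share the same upper limit
    ulim_alloc = ulim_proc
  end subroutine proc_limits_fixed

  ! maxval(abs(x)), but 0 for a zero-size array rather than the large
  ! negative value the standard specifies for an empty MAXVAL.
  function safe_maxabs(x) result(m)
    real, intent(in) :: x(:, :, :)
    real :: m

    if (size(x) == 0) then
      m = 0.0
    else
      m = maxval(abs(x))
    end if
  end function safe_maxabs

end module fixed_limits_sketch
```

For instance, proc_limits_fixed(7, 8, 9, ...) gives llim_proc = 14 and ulim_proc = ulim_alloc = 13, so allocate(gnew(-ntgrid:ntgrid, 2, 14:13)) succeeds and yields a zero-size array that no loop or array operation can read stale data from.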
