- changed milestone to 8.1
- changed component to Other
Mismatch between upper limits on processors with no work
Below is copied from comment on PR #513
We find that ulim_proc /= ulim_alloc when llim_proc > ulim_world (i.e. when there are processors with no work). This can arise when nproc(nproc-1) > ulim_world = naky*ntheta0*nspecies*nlambda*negrid. In this situation ulim_alloc = llim_proc whilst ulim_proc = ulim_world.
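To make the mismatch concrete, here is a minimal sketch of how the limits could come out for a simple block decomposition. The formulas for blocksize, llim_proc, ulim_proc and ulim_alloc below are assumptions chosen to reproduce the behaviour described above, not a copy of the layout code, and nproc = 4, ulim_world = 5 are made-up values.

```fortran
program layout_limits
  implicit none
  ! Hypothetical sizes, chosen so that the last processor has no work.
  integer, parameter :: nproc = 4       ! number of processors (assumed)
  integer, parameter :: ulim_world = 5  ! highest global index (assumed)
  integer :: iproc, blocksize, llim_proc, ulim_proc, ulim_alloc

  ! Assumed block decomposition: equal-sized blocks, trailing ones may be empty.
  blocksize = ulim_world / nproc + 1

  do iproc = 0, nproc - 1
     llim_proc  = iproc * blocksize
     ! Upper loop limit is capped at the global upper limit ...
     ulim_proc  = min(ulim_world, llim_proc + blocksize - 1)
     ! ... whilst the allocation limit is never allowed below llim_proc.
     ulim_alloc = max(llim_proc, ulim_proc)
     print '(4(a,i0),a,l1)', 'iproc=', iproc, ' llim_proc=', llim_proc, &
          ' ulim_proc=', ulim_proc, ' ulim_alloc=', ulim_alloc, &
          ' mismatch=', ulim_proc /= ulim_alloc
  end do
end program layout_limits
```

With these values only the last processor (iproc = 3) reports a mismatch: it has llim_proc = 6, ulim_proc = 5 = ulim_world and ulim_alloc = 6 = llim_proc.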
We use ulim_alloc in allocations like allocate(gnew(-ntgrid:ntgrid, 2, g_lo%llim_proc:g_lo%ulim_alloc)), which in these situations will result in arrays with a trailing dimension of size 1.
In loops we always use ulim_proc instead of ulim_alloc, but it can be the case that ulim_proc < llim_proc, such that loops which look like do iglo = g_lo%llim_proc, g_lo%ulim_proc will have zero iterations.
As we tend to allocate these sorts of arrays and then set them using a loop, we can end up with uninitialised arrays. For example, consider the following:
    allocate(gnew(-ntgrid:ntgrid, 2, g_lo%llim_proc:g_lo%ulim_alloc))
    do iglo = g_lo%llim_proc, g_lo%ulim_proc
       gnew(:, :, iglo) = 1.0
    end do
    print*, maxval(abs(gnew)), minval(abs(gnew))
On processors with ulim_alloc = ulim_proc we will initialise the full array and hence see 1, 1 printed. For processors with ulim_alloc = llim_proc > ulim_proc the result will be undefined, as we allocate gnew with finite size but our loop has zero iterations, so we never set the elements of the array.
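The extents involved can be checked without ever reading the uninitialised data. The sketch below hard-codes limits for a processor with no work (llim_proc = 6, ulim_proc = 5, ulim_alloc = 6) and a small ntgrid; all of these values are hypothetical and stand in for the g_lo components.

```fortran
program idle_processor_extents
  implicit none
  integer, parameter :: ntgrid = 2       ! hypothetical grid half-width
  integer, parameter :: llim_proc = 6    ! assumed limits on an idle processor
  integer, parameter :: ulim_proc = 5
  integer, parameter :: ulim_alloc = 6
  real, allocatable :: gnew(:, :, :)
  integer :: iglo, niter

  ! Allocation uses ulim_alloc, so the trailing dimension has extent 1.
  allocate(gnew(-ntgrid:ntgrid, 2, llim_proc:ulim_alloc))

  ! The loop uses ulim_proc < llim_proc, so the body never runs.
  niter = 0
  do iglo = llim_proc, ulim_proc
     gnew(:, :, iglo) = 1.0
     niter = niter + 1
  end do

  ! Only sizes are printed; gnew itself is deliberately not read.
  print *, 'trailing extent    :', size(gnew, 3)
  print *, 'loop iterations    :', niter
  print *, 'elements never set :', &
       size(gnew) - niter * size(gnew, 1) * size(gnew, 2)
end program idle_processor_extents
```

Here the array holds 5 x 2 x 1 = 10 elements and none of them are assigned.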
If we only used such arrays within such loops this would be OK (although not ideal), as we'd never touch the uninitialised data either. However, if we try to do an array operation (e.g. g = gnew + 1.0) we will then use this uninitialised data.
As we are trying to indicate that there is no work to do, it probably makes sense to set ulim_proc = ulim_alloc = llim_proc - 1 in this situation. The arrays then have zero size (so are allocated and exist but hold no data) and our loops have zero extent. This can still lead to problems: for example, asking for the maxval of a zero-length array returns -huge(real). Whilst this is part of the Fortran standard, our code may need to be careful about how we interpret such values.
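A small sketch of the proposed limits, again with hypothetical hard-coded values in place of the g_lo components. It shows that the array is allocated but empty, and includes one possible guard around maxval; the guard and its fallback value of 0.0 are illustrative assumptions rather than an agreed policy.

```fortran
program zero_size_limits
  implicit none
  integer, parameter :: ntgrid = 2                  ! hypothetical grid half-width
  integer, parameter :: llim_proc = 6               ! assumed idle-processor lower limit
  integer, parameter :: ulim_proc = llim_proc - 1   ! proposed: no work
  integer, parameter :: ulim_alloc = llim_proc - 1  ! proposed: zero-size allocation
  real, allocatable :: gnew(:, :, :)
  real :: gmax
  integer :: iglo

  ! Upper bound below lower bound gives a zero extent in that dimension.
  allocate(gnew(-ntgrid:ntgrid, 2, llim_proc:ulim_alloc))

  ! Zero iterations, matching the zero-size array.
  do iglo = llim_proc, ulim_proc
     gnew(:, :, iglo) = 1.0
  end do

  print *, 'allocated:', allocated(gnew), ' size:', size(gnew)

  ! maxval over an empty array is well defined but easy to misread:
  ! the "largest absolute value" comes out as a large negative number.
  print *, 'raw maxval(abs(gnew)) :', maxval(abs(gnew))
  print *, 'equals -huge(1.0)     :', maxval(abs(gnew)) == -huge(1.0)

  ! One possible guard (assumption): treat an empty array as contributing 0.
  if (size(gnew) > 0) then
     gmax = maxval(abs(gnew))
  else
     gmax = 0.0
  end if
  print *, 'guarded maximum       :', gmax
end program zero_size_limits
```

Whether 0.0 is the right fallback depends on how the value is used afterwards, e.g. in a reduction across processors.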