Large memory consumption from `wdriftttp` array
The (real valued, i.e. 64 bits per element) wdriftttp
array is allocated as allocate (wdriftttp(-ntgrid:ntgrid,ntheta0,naky,negrid,nspec,2))
so can become quite large for nonlinear simulations and the memory requirements do not scale with the number of processors (each process will allocate this full array).
The other related drift arrays, such as wdrift
, all allocated as distribution function shaped objects, i.e. allocate (wdrift(-ntgrid:ntgrid,2,g_lo%llim_proc:g_lo%ulim_alloc))
. This ensures the memory requirements are distributed, so reduce with increasing processor count.
I think we should be able to switch the wdriftttp
form to match the wdrift
form in order to have distributed memory usage. This will increase the memory usage if running on a single core by a factor nlambda
but we would anticipate jobs being run on a low number of cores having relatively low resolution so we wouldn’t expect hardware resource to be an issue in such cases.
There may be more efficient ways to store this information, considering it is only non-zero at theta grid points where there is a totally trapped particle.
Comments (5)
-
reporter -
reporter Looking at this code it seems there’s possibly one bug with this code already. The array
wdriftttp
doesn’t depend onlambda
but we could potentially assign to this for multiple values of the pitch angle depending on the properties ofittp
.Currently the code looks something like
do ig = -ntgrid, ntgrid if (ittp(ig) == nlambda+1) cycle do il = ittp(ig), nlambda if (forbid(ig, il)) exit !GWH+JAB: should this be calculated only at ittp? or for each totally trapped pitch angle? (Orig logic: there was only one totally trapped pitch angle; now multiple ttp are allowed. wdriftttp(ig,it,ik,ie,is,1) & = wdrift_func(ig,ittp(ig),ie,it,ik,is)*tpdriftknob !CMR: totally trapped particle drifts also scaled by tpdriftknob ! add Coriolis drift to magnetic drifts wdriftttp(ig,it,ik,ie,is,2) = wdriftttp(ig,it,ik,ie,is,1) & - wcoriolis_func(ig,il,ie,it,ik,is) wdriftttp(ig,it,ik,ie,is,1) = wdriftttp(ig,it,ik,ie,is,1) & + wcoriolis_func(ig,il,ie,it,ik,is) end do end do
The line
if (ittp(ig) == nlambda+1) cycle
skips settingwdriftttp
at any theta for which there are no totally trapped particles. This is fine.The lines
do il = ittp(ig), nlambda if (forbid(ig, il)) exit
show that we’re potentially iterating over several totally trapped pitch angles, starting at the one totally trapped at this theta and the exiting once we get to the first pitch angle that is forbidden at this location. Either we can only ever have exactly one pitch angle for which at the theta index
ig
,ittp(ig) < nlambda + 1
andforbid(ig, il) = .false.
or there is a bug here and we only get the Coriolis contribution from the highest non-forbidden pitch angle at this point. This is unlikely to be significant I think asmach
is often 0 in which case the Coriolis contribution would be 0 for all pitch angles.
-
reporter Second minor potential bug is that the Coriolis drift contribution is not weighted by
tpdriftknob.
-
reporter PR #269 should provide this change.
-
reporter - changed status to resolved
Resolved in 8.1 See Issue
#154for discussion around possible further savings. - Log in to comment
With this approach we would use more memory when
nlambda > nproc
.