ideal_ball - segmentation fault - index out of bounds
I am getting a segmentation fault when I try to run ideal_ball.
I built ideal_ball from gs2 8.0.2 with make -j USE_NEW_DIAG= ideal_ball
. I run this on the attached input file + equilibrium with ideal_ball input.in
. I get the following error messages:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7FBC29B5FE08
#1 0x7FBC29B5EF90
#2 0x7FBC294A64AF
#3 0x42812B in __geometry_MOD_tdef at geometry.f90:?
#4 0x43B54B in __geometry_MOD_eikcoefs
#5 0x40514A in __ballstab_MOD_run_stability_check
#6 0x4647D4 in MAIN__ at ideal_ball.f90:?
Segmentation fault (core dumped)
or with a debug version (make -j USE_NEW_DIAG= DEBUG=y ideal_ball
) it says this:
At line 2898 of file /home/steve/Work/gs2/gs2-dev/src/geo/geometry.f90
Fortran runtime error: Index '1' of dimension 1 of array 'th_bish' above upper bound of 0
From running with gdb
and comparing this to running with export GK_VERBOSITY=3
it appears that it is calling eikcoefs
twice. First from src/theta_grid.f90:1177
with the correct nth
(90) and th_bish
(size = 90 I guess), then again later from src/geo/ballstab.f90:379
with incorrect nth
(0) and th_bish
(size = 0).
This is where I’ve got to so far. I need to take a break from this for now. I will have another look later. If someone else gets a chance to look in the meantime then please post any insight here. Don’t worry if not - I can just work it out myself later. Cheers
Comments (10)
-
-
Other geometry types have a similar setup, but some may set the
nthg
output before returning, for example inteqin
ofpeq.fpp
if(initeq == 0) then nthg = nt/2 return endif
-
reporter The above suggested fixes “work” in so far as they stop the program from crashing. However, the root cause of the problem and the reason why the suggested fixes work was not clear. Indeed, it was not clear whether the “fixed” program was actually performing the calculation correctly. Therefore, I have looked into this a bit deeper. I have found some useful information but I am not sure what the correct fix is. Therefore, I am posting what I know here so that we can work out the correct fix together.
This is what I’ve found so far:
- The broken code (say
8.0.4_RC
) fails on the second run throughgeometry::eikcoefs
, as David suspected. - This is because the second call to
ceq::ceqin
returns early andnthg
is not set. - Therefore, back in geometry,
nthg
is uninitialised and so effectively has the value of any random integer. Chances are this will be higher than the array bounds ofntgrid
, hence the out-of-bounds seg fault. - This can be fixed by setting
nthg
before the early exit inceqin
. If this is done correctly, then I think it becomes unnecessary to callgeometry::finish_geometry
. - In
peq::teqin
, it hasnthg = nt/2
before the early exit. So what happens ifceqin
usesnt/2
? - If one uses
nt/2
inceqin
, then the geometry module arrays (those allocated inalloc_module_arrays
) are the correct size from the first run but the geometry local arrays (allocated inalloc_local_arrays
) end up too small by a factor of 2 and the program crashes when we encounter operations that involve arrays of different sizes, i.e.gradztot(:,1) = crpgrad(:,2)/grho
(wheregrho
is the correct size but the other two are too small). - This is because
geometry::tdef
has the linenth=nthg/2 ! correct, at least for geq
so the divide by 2 has been applied twice. - Therefore, it appears that the correct thing to do in
ceqin
isnthg = nt
. This is the same as what appears ingeq::eqin
. Indeed, this works in so far as this little fix alone allows the program to run without crashing but usingnthg = nt/2
results in a crash, as does not settingnthg
at all. - This makes me wonder if the
nthg = nt/2
inteqin
is incorrect. Indeed, printing the array sizes viaexport GK_VERBOSITY=3
suggests that the theta grid is a factor of 2 too small but this is true on the first run so all arrays are too small, hence no mismatch, hence no crash. So I thinkteqin
needs to be fixed to haventhg = nt
, rather thannt/2
. Can someone else give a second opinion on this please. - This also makes me wonder if the other equilibrium types are incorrect as most have
if(initeq == 0) return
, with onlyteqin
havingnthg = nt/2
andgeq::eqin
havingnthg = nt
. Unfortunately, I do not have any equilibrium files of the other equilibrium types to test. Can someone provide such equilibrium files for testing or suggest another way to handle this please. - I do not yet know what the local equilibrium models (s-alpha and Miller) do here as they handle things differently. Can someone give some pointers on where to look / what to look for before I start looking myself please.
- The broken code (say
-
reporter @David Dickinson New comment above. Just tagging you so that you definitely get a notification. Cheers
-
Just off the top of my head, but I think only a few geometry types use
geometry::tdef
others expectinit_theta
to have been called. I think there’s a comment inpeq::teqin
asking why we halve the number of theta grid points.The number of theta points should be the same on the first and subsequent calls, if this isn’t the case then it’s wrong.
Really we should decouple the theta grid provided by the specific equilibrium grid and the one used by the rest of gs2 (i.e. we should be able to change the number of theta grid points) but that’s a bigger issue.
-
reporter OK, thanks. I’m going to fix the CHEASE bug only because that’s what I need for my work. I will open a separate issue for the other equilibrium types and someone else can deal with that at a later date.
-
reporter I’ve worked it out!
For numerical equilibria (well,
TRANSP
andCHEASE
at least), the theta grid size is dictated by that of the equilibrium data file;ntheta
in the input file is ignored. It would be better if we could setntheta
in the input file and down-sample / interpolate as required. But, as you say, that is a separate and larger new feature. I will create an issue for this as a feature request to be dealt with by someone… one day… maybe…With
TRANSP
equilibria, the theta grid size is set to the equilibrium theta grid size divided by two for some reason. Why divided by two I don’t know. But the early exit fromteqin
and the full routine are at least consistent with each other so there is no crash when runningideal_ball
. Although the factor of 2 is unexplained and somewhat strange, I think this is OK for now in so far as bothgs2
andideal_ball
should be able to run with aTRANSP
equilibrium file correctly. Therefore, I will not create an issue about this for now.As for CHEASE, the current bug is because
nthg
is not set when there is an early exit fromceqin
. Therefore,gs2
runs OK (only callsceqin
once and runs the full routine) butideal_ball
crashes (on the second run ofeikcoefs
, theceqin
subroutine exits early sonthg
is uninitialised). The correct fix is to setnthg
before the early exit fromceqin
. And the correct value is whatever is consistent with a full run, which isnthg = nt
, i.e. not divided by two. Therefore, I will make this fix and submit it via a new PR.Finally, the other numerical equilibrium types mostly leave
nthg
uninitialised when they exit early from theireqin
subroutine (all apart fromgeq
). Therefore, they are probably OK when run withgs2
and will probably break in the same way asCHEASE
when run withideal_ball
. However, testing and fixing those is beyond the scope of this bugfix. AFAIK, the use case of numerical equilibria withideal_ball
is rare anyway so not fixing all types right now is probably OK. Therefore, for this bugfix, I will just add comments to the other numerical equilibrium types to identify the potential for problems if run withideal_ball
. -
reporter - changed status to resolved
-
reporter - changed status to open
Re-open as not resolved until merged
-
- changed status to resolved
Fixed in PR#271
- Log in to comment
The value
nthg
is supposed to be returned by the call toceqin
(in your case) but the first line of this routine isif(initeq==0) return
without settingnthg
. I think you’re probably running into this case on the second call toeikcoefs
. This variable (actually stored ineqinit
) can be reset by callingfinish_geometry
or as it is public in geometry it could be used and manually set to1
. The former is probably a nicer, albeit more expensive, route.
I’ve not tested if this works.