ideal_ball - segmentation fault - index out of bounds

Issue #90 resolved
Stephen Biggs-Fox created an issue

I am getting a segmentation fault when I try to run ideal_ball.

I built ideal_ball from gs2 8.0.2 with make -j USE_NEW_DIAG= ideal_ball. I run this on the attached input file + equilibrium with ideal_ball input.in. I get the following error messages:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7FBC29B5FE08
#1  0x7FBC29B5EF90
#2  0x7FBC294A64AF
#3  0x42812B in __geometry_MOD_tdef at geometry.f90:?
#4  0x43B54B in __geometry_MOD_eikcoefs
#5  0x40514A in __ballstab_MOD_run_stability_check
#6  0x4647D4 in MAIN__ at ideal_ball.f90:?
Segmentation fault (core dumped)

or with a debug version (make -j USE_NEW_DIAG= DEBUG=y ideal_ball) it says this:

At line 2898 of file /home/steve/Work/gs2/gs2-dev/src/geo/geometry.f90
Fortran runtime error: Index '1' of dimension 1 of array 'th_bish' above upper bound of 0

From running with gdb and comparing this to running with export GK_VERBOSITY=3 it appears that it is calling eikcoefs twice. First from src/theta_grid.f90:1177 with the correct nth (90) and th_bish (size = 90 I guess), then again later from src/geo/ballstab.f90:379 with incorrect nth (0) and th_bish (size = 0).

This is where I’ve got to so far. I need to take a break from this for now. I will have another look later. If someone else gets a chance to look in the meantime then please post any insight here. Don’t worry if not - I can just work it out myself later. Cheers 🙂

Comments (10)

  1. David Dickinson

    The value nthg is supposed to be returned by the call to ceqin (in your case) but the first line of this routine is if(initeq==0) return without setting nthg. I think you’re probably running into this case on the second call to eikcoefs. This variable (actually stored in eqinit) can be reset by calling finish_geometry or as it is public in geometry it could be used and manually set to 1. The former is probably a nicer, albeit more expensive, route.

    I’ve not tested if this works.

  2. David Dickinson

    Other geometry types have a similar setup, but some may set the nthg output before returning, for example in teqin of peq.fpp

    if(initeq == 0) then
       nthg = nt/2
       return
    endif
    

  3. Stephen Biggs-Fox reporter

    The above suggested fixes “work” in so far as they stop the program from crashing. However, the root cause of the problem and the reason why the suggested fixes work was not clear. Indeed, it was not clear whether the “fixed” program was actually performing the calculation correctly. Therefore, I have looked into this a bit deeper. I have found some useful information but I am not sure what the correct fix is. Therefore, I am posting what I know here so that we can work out the correct fix together.

    This is what I’ve found so far:

    • The broken code (say 8.0.4_RC) fails on the second run through geometry::eikcoefs, as David suspected.
    • This is because the second call to ceq::ceqin returns early and nthg is not set.
    • Therefore, back in geometry, nthg is uninitialised and so effectively has the value of any random integer. Chances are this will be higher than the array bounds of ntgrid, hence the out-of-bounds seg fault.
    • This can be fixed by setting nthg before the early exit in ceqin. If this is done correctly, then I think it becomes unnecessary to call geometry::finish_geometry.
    • In peq::teqin, it has nthg = nt/2 before the early exit. So what happens if ceqin uses nt/2?
    • If one uses nt/2 in ceqin, then the geometry module arrays (those allocated in alloc_module_arrays) are the correct size from the first run but the geometry local arrays (allocated in alloc_local_arrays) end up too small by a factor of 2 and the program crashes when we encounter operations that involve arrays of different sizes, i.e. gradztot(:,1) = crpgrad(:,2)/grho (where grho is the correct size but the other two are too small).
    • This is because geometry::tdef has the line nth=nthg/2 ! correct, at least for geq so the divide by 2 has been applied twice.
    • Therefore, it appears that the correct thing to do in ceqin is nthg = nt. This is the same as what appears in geq::eqin. Indeed, this works in so far as this little fix alone allows the program to run without crashing but using nthg = nt/2 results in a crash, as does not setting nthg at all.
    • This makes me wonder if the nthg = nt/2 in teqin is incorrect. Indeed, printing the array sizes via export GK_VERBOSITY=3 suggests that the theta grid is a factor of 2 too small but this is true on the first run so all arrays are too small, hence no mismatch, hence no crash. So I think teqin needs to be fixed to have nthg = nt, rather than nt/2. Can someone else give a second opinion on this please.
    • This also makes me wonder if the other equilibrium types are incorrect as most have if(initeq == 0) return, with only teqin having nthg = nt/2 and geq::eqin having nthg = nt. Unfortunately, I do not have any equilibrium files of the other equilibrium types to test. Can someone provide such equilibrium files for testing or suggest another way to handle this please.
    • I do not yet know what the local equilibrium models (s-alpha and Miller) do here as they handle things differently. Can someone give some pointers on where to look / what to look for before I start looking myself please.

  4. Stephen Biggs-Fox reporter

    @David Dickinson New comment above. Just tagging you so that you definitely get a notification. Cheers

  5. David Dickinson

    Just off the top of my head, but I think only a few geometry types use geometry::tdef others expect init_theta to have been called. I think there’s a comment in peq::teqin asking why we halve the number of theta grid points.

    The number of theta points should be the same on the first and subsequent calls, if this isn’t the case then it’s wrong.

    Really we should decouple the theta grid provided by the specific equilibrium grid and the one used by the rest of gs2 (i.e. we should be able to change the number of theta grid points) but that’s a bigger issue.

  6. Stephen Biggs-Fox reporter

    OK, thanks. I’m going to fix the CHEASE bug only because that’s what I need for my work. I will open a separate issue for the other equilibrium types and someone else can deal with that at a later date.

  7. Stephen Biggs-Fox reporter

    I’ve worked it out!

    For numerical equilibria (well, TRANSP and CHEASE at least), the theta grid size is dictated by that of the equilibrium data file; ntheta in the input file is ignored. It would be better if we could set ntheta in the input file and down-sample / interpolate as required. But, as you say, that is a separate and larger new feature. I will create an issue for this as a feature request to be dealt with by someone… one day… maybe…

    With TRANSP equilibria, the theta grid size is set to the equilibrium theta grid size divided by two for some reason. Why divided by two I don’t know. But the early exit from teqin and the full routine are at least consistent with each other so there is no crash when running ideal_ball. Although the factor of 2 is unexplained and somewhat strange, I think this is OK for now in so far as both gs2 and ideal_ball should be able to run with a TRANSP equilibrium file correctly. Therefore, I will not create an issue about this for now.

    As for CHEASE, the current bug is because nthg is not set when there is an early exit from ceqin. Therefore, gs2 runs OK (only calls ceqin once and runs the full routine) but ideal_ball crashes (on the second run of eikcoefs, the ceqin subroutine exits early so nthg is uninitialised). The correct fix is to set nthg before the early exit from ceqin. And the correct value is whatever is consistent with a full run, which is nthg = nt, i.e. not divided by two. Therefore, I will make this fix and submit it via a new PR.

    Finally, the other numerical equilibrium types mostly leave nthg uninitialised when they exit early from their eqin subroutine (all apart from geq). Therefore, they are probably OK when run with gs2 and will probably break in the same way as CHEASE when run with ideal_ball. However, testing and fixing those is beyond the scope of this bugfix. AFAIK, the use case of numerical equilibria with ideal_ball is rare anyway so not fixing all types right now is probably OK. Therefore, for this bugfix, I will just add comments to the other numerical equilibrium types to identify the potential for problems if run with ideal_ball.

  8. Log in to comment