dynamic loading of gl.dxe by ref_gl.dxe has issues.

Issue #14 closed
sezero created an issue

dynamic loading of gl.dxe by ref_gl.dxe has issues.

It is currently disabled in qgl_dos.c (see at the top of the source) and ref_gl.dxe still depends on gl.dxe at compile time.

The issue, as described by Frank, is that a crash happens at program exit if a vid_restart is ever triggered during execution.

That the issue does not happen with ref_gl.dxe depending on gl.dxe might be because of a dlclose() problem of current djgpp:

http://www.delorie.com/djgpp/mail-archives/browse.cgi?p=djgpp/2015/09/25/13:21:47

http://www.delorie.com/djgpp/mail-archives/browse.cgi?p=djgpp/2015/09/26/17:28:33

If that is he reason for our crash, then we are using either a stale or a NULL function pointer somewhere after unloading gl.dxe.

This is worth investigating and solving.

Comments (17)

  1. sezero reporter

    Frank: What happens if you remove those two lines 35 and 36 from qgl_dos.c (i.e. those two "until we fix things" lines), BUT keep the "-P gl.dxe" flag in ref_gl makefile: i.e. we dynamically load gl.dxe all the way, BUT our ref_gl.dxe is actually still depending on gl.dxe?

    If my suspicion is correct, it will work without crashes.

  2. sezero reporter

    OK, pushed 1bbc83a to remove those lines, but ref_gl.dxe is still built with dependency to gl.dxe on.

    My suspicion is as I explained above: Why are we not crashing if ref_gl.dxe depends on gl.dxe? The answer is that djgpp dlclose() has a bug. If you dlopen() a file it opens and its dependency dxes built into it are silently dlopen()ed. Now, if you dlclose() your dxe, your dxe is closed, but the implicitly opened dependency dxes are not closed and continue residing in memory as valid dxes. Our fer_gl.dxe is one such dxe: it depends on gl.dxe, meaning that when we dlopen() ref_gl.dxe, gl.dxe is implicitly dlopen()ed as well but never closed; and if we manually dlopen() gl.dxe it will only increment its refcount. In short, the function pointers dlsym()ed from gl.dxe will stay valid even after we dlclose() ref_gl.dxe or gl.dxe.

    However: if we build ref_gl.dxe without making it depend to on gl.dxe: dlopen() of ref_gl will NOT auto-dlopen gl.dxe. We open gl.dxe by ourself and dlsym() functions from it. If we dlclose() gl.dxe, it will REALLY unload, and any function pointers dlsym()ed from it will become invalid.

    All in all: We are possibly using such a 'stale' function pointer. Or less likely, we are using a null function pointer.

  3. sezero reporter

    Now that I pushed 1bbc83a, can you:

    1. rebuild q2.exe and ref_gl.dxe, and see that thet still behave?

    2. now remove the "-P.gl.dxe" arguments from ref_gl makefile and rebuild it, see whether it crashes? (If it does crash, it means that possibly our bug has been there all the time but was masked by djgpp dlclose() bug.)

  4. sezero reporter

    OK then.

    This will be hard to nail, but should be doable. We can start by adding prints to certain key points (like "QGL_Shutdown begin/end") and run the exe with "+set logfile 2 +set logfile_active 2"

  5. Frank Sapone

    We'll have to do it tomorrow perhaps because I already spent too much time already on the _fp conversion and running these tests. Maybe we should make a separate branch and you can put all the printfs you think are necessary there and make logfile, logfile_active both 2 by default.

  6. sezero reporter

    I think I found it. Can you:

    1. Remove the "-P.gl.dxe" arguments from ref_gl makefile, and rebuild it; and
    2. Edit sage/util/cfg.c and comment out line 123, i.e. the atexit(cfg_kill) statement, and rebuild sage anew.
    3. Use the newly built ref_gl.dxe and sage gl.dxe and see if it behaves.
  7. Frank Sapone

    Here is what happens now: If SSE is disabled in the sage.ini mode switches work again without a segfault at exit (tested changing video modes multiple times in game, then connecting to a xatrix mod server and doing the same), but with SSE and no -P gl.dxe anything that triggers a vid_restart will immediately bomb out. The bombing out with SSE immediately on any mode switch stuff happens whenever -P gl.dxe is omitted.

  8. Frank Sapone

    If atexit in general is a specific problem then atexit is also here:

    C:\proj\qdos\3rdparty\glide3x.src\sst1\init\init96\lindrvr.c(352):  atexit(linExit);
    C:\proj\qdos\3rdparty\glide3x.src\swlibs\fxmisc\linutil.c(51):  atexit(reset_term);
    C:\proj\qdos\3rdparty\Mesa-6.4.2.src\src\mesa\drivers\glide\fxapi.c(127):       atexit(cleangraphics);
    C:\proj\qdos\3rdparty\Mesa-6.4.2.src\src\mesa\drivers\glide\fxapi.c(823):    * so we can debug atexit problems (memory leaks, etc).
    C:\proj\qdos\3rdparty\sage\drivers\foo\glx.c(78):   atexit(sage_fini);
    C:\proj\qdos\3rdparty\sage\drivers\glide\glx.c(78): atexit(sage_fini);
    C:\proj\qdos\3rdparty\sage\util\cfg.c(6): * Destruction is registered with atexit().
    C:\proj\qdos\3rdparty\sage\util\cfg.c(123):    // atexit(cfg_kill);
    
  9. sezero reporter

    You had said that enabling SSE is already causing trouble, i.e. issue #12, and that's why we disabled SSE in the ini. Am I missing something?

    Did commenting out the atexit() call in cfg.c help with anything?

  10. Frank Sapone

    Yes. I enable SSE in my inis because if i pass junk params at startup it will work just fine. In any case, with SSE DISABLED in the ini your commenting out of atexit fixes the segfaults at exit. But, with SSE ENABLED any kind of mode switching will segfault. I noticed the mode switch and sage +sse issue when you started the GL_DLSYM stuff. For some reason when -P gl.dxe is OMITTTED from ref_gl.dxe then SSE enabled from INI sage completely blows up during any kind of mode switch, never matters what situation. I just find that extra odd.

  11. sezero reporter

    If atexit in general is a specific problem then atexit is also here:

    Those are in linux codepaths (except for that atexit(cfg_kill) of sage which does it unconditionally.)

  12. Frank Sapone

    You're right on the other atexits, didn't look at their specific context, just did a grep for them.

  13. sezero reporter

    As of commit ac6a702 this issue is fixed. We can start building ref_gl.dxe without making it to depend on gl.dxe now, if we want.

    As for Sage video mode change misbehaving if SSE is enabled in the ini and ref_gl.dxe doesn't depend on gl.dxe: Well, if we know that SSE is broken, then we know that it will pop up from somewhere, and it did that in mode switch.

    Please add that information to issue #12.

  14. Log in to comment