pymod is very unstable because of race conditions

Issue #130 resolved
created an issue

Devastation crashes about once or twice a day with something related to pymod. The backtraces usually show a condition that is not possible, were it not for multiple threads. One example inside libpython looks something like this:

Something *bla = NULL;
void doALotofStuff()
         bla = getBla(); // never NULL
         // ... Alot of stuff that never sets bla ...
         // bla is sometimes NULL here because of a race condition: seg fault!
         bla = bla->next; // might be NULL

So I figured out this is because someone did not RTFM :).

When threads are created using the dedicated Python APIs (such as the threading module), a thread state is automatically associated to them and the code showed above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them.

If you need to call Python code from these threads (often this will be part of a callback API provided by the aforementioned third-party library), you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure.

So what is happening is, ASSS has multiple threads. These thread trigger callbacks. These callbacks make it to pymod. And pymod does not use the python global interpreter lock, it acts as if ASSS only has one thread.

Anyway I have patched deva and I will test it as soon as I can restart it (I have only been able to test it locally so far). I will create a pull request afterwards.

The locking is unfortunately a bit complex because you need a different set of lib python functions for the main thread and the other threads spawned by C (net, worker, etc). And because some callbacks might be spawned by any thread, main included, I have to use pthread_self() to find out which set of functions I need. The locks also need to be recursive.

Comments (2)

  1. Log in to comment