Issue #118 new

threading/locking

numpf
created an issue

asss does not follow any formal threading model I know. Each module must behave as though it is a library that could be called from many threads, and is responsible for thread safety. This seems unnecessary, especially for "non-core" modules, perhaps defined as those that do not provide a library-like service (logman or cfgman) or do not handle packets.

Locks are poorly documented. docs/asss-threading.txt is out of date (last updated in r77 in 2001), though vaguely correct. There are locks for the lists of arenas (aman->Lock) and players (pd->Lock). There appear to be some violations of locking order in the existing code; e.g. aman->GetPopulationSummary demands in its comment that it be called with aman->Lock held, but acquires pd->Lock in its body. Deadlocks have been a persistent problem for asss.

Inverted locking orders (and thus deadlocks) are easy to create when

module A creates a mutex to protect its own data and holds it during a call to an interface function F in module B

module B has its own mutex which it locks in F

module B also calls a callback CB with its mutex held

module A locks its mutex in its function for CB

This happens with B = cfgman, F = GetStr, CB = AA_CONFCHANGED

It's unclear to me what the underlying principle should be to avoid this, if there is one. grel said somewhere that he felt asss was replicating COM, so COM docs could give hints.

In the short term, every mutex should be documented, particularly if its held for a callback or locked during an interface function. I suspect a good short (medium?) term goal would be to remove explicit locks from interfaces (aman->Lock, pd->Lock), and fix whatever is implied by their removal.

Longer term, asss threading should probably be redesigned. Popular opinion (dr brain and HS devs, others too I think) seems to be to just make it almost entirely single-threaded. No zone is, or probably ever will be, on a scale to really require threading other than for things like disk I/O

Comments (5)

  1. drbrain

    Yes, after working with ASSS over the past several years, it has become my opinion that everything in the core should be dispatched from a single thread.

    This doesn't mean that the program will run a single thread, as modules would be able to have their own threads for processing asynchronous events (such as net and mysql), but they wouldn't be allowed to call other modules from those threads directly. Any callbacks would need to be scheduled in the "main" thread using something akin to a mainloop timer. These sorts of things are easy in languages like Python, but can be quite cumbersome in C. I'm not sure if anything can be done to simplify them.

    This would eliminate the need for any locking within the core.

    There are of course other threading models that could be used, but their complexity is really not worth the extra parallelism that they grant. In the last 10 years, CPU processing ability has dramatically increased (on the order of 100 times), and I believe that any modern computer could easily run any conceivable population without issue (bandwidth is a far more practical concern).

  2. numpf reporter

    there should be some explicit message sent to threads so they can end during a shutdown. Not doing so makes valgrind report leaks, and probably causes the unloading module errors on shutdown

  3. drbrain

    Sounds like a lot of complication for very little gain. Might make more sense to not use daemon threads at all, and sidestep the issue entirely.

  4. jowie

    This has issue has improved considerably in 1.6.0. There is a new mainloop method "RunInMain" which schedules a method to run on the main loop in less then a millisecond. Almost all of the callbacks come from the main thread now (packet handlers, CB_FOO, mysql callbacks, etc).

  5. Log in to comment