tmfe-based programs do not die from watchdog timeout

Issue #282 resolved
dd1 created an issue

tmfe-based programs keep running after the watchdog removes them on timeout. I think this is because cm_periodic_tasks() does not kill the program when it becomes disconnected from ODB (by timeout). But why do mfe.c frontends die correctly? Probably because they always update the statistics: this touches ODB, discovers that the client is disconnected, and the frontend kills itself. Need to understand this. K.O.

Comments (4)

  1. Stefan Ritt

    Sometimes a cm_shutdown is called, which in turn does a

    rpc_client_disconnect(hConn, TRUE);
    

    The TRUE causes a shutdown of the RPC client by sending an RPC_ID_SHUTDOWN message. This message is returned from the client’s cm_yield() as RPC_SHUTDOWN. The mfe.cxx frontend then does a

    do {
       status = cm_yield();
    } while (status != RPC_SHUTDOWN);
    

    in its main loop.
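
    The loop above can be tried without MIDAS using a stub. This is only a minimal sketch: fake_yield, fake_yield_next(), run_main_loop() and the DEMO_* status codes are hypothetical stand-ins, not MIDAS names or values.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUCCESS  1   /* hypothetical status codes, not the MIDAS values */
#define DEMO_SHUTDOWN 2

/* Hypothetical stand-in for cm_yield(): replays a scripted list of statuses. */
typedef struct {
   const int *script;   /* statuses to return, in order */
   size_t     len;      /* number of entries in script */
   size_t     pos;      /* next entry to return */
} fake_yield;

int fake_yield_next(fake_yield *y)
{
   assert(y != NULL);
   if (y->pos >= y->len)
      return DEMO_SHUTDOWN;   /* script exhausted: force the loop to end */
   return y->script[y->pos++];
}

/* Same shape as the frontend loop: keep yielding until shutdown is seen.
   Returns the number of yield calls that were made. */
int run_main_loop(fake_yield *y)
{
   int status;
   int iterations = 0;
   do {
      status = fake_yield_next(y);
      iterations++;
   } while (status != DEMO_SHUTDOWN);
   return iterations;
}
```

    The point of the sketch is that this loop only ends when the shutdown status is actually delivered through the yield call. A client that never receives it keeps looping.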

  2. dd1 reporter

    As best I can tell, these programs are removed from ODB (by timeout?), but ODB is missing a check to make them die when this happens. So we have all these programs running almost normally: they are connected to event buffers (which somehow did not time out), they write to ODB, but they are not listed in the ODB clients (removed from pheader->client[]) and they are not listed in /System/Clients.

    I have now added a check in ODB to make programs die if they are removed from the ODB pheader->client[] array (a check against using an invalid client_index).

    The event buffer code already has a similar check. There, if a buffer client uses the wrong client_index, it starts changing the buffer pointers of the wrong client and corrupts the buffer. So if a client's client_index is wrong, it must die.

    This check also happens to kill half-connected mfe-based frontends, but somehow not half-connected tmfe-based frontends. (half-connected = connected to the event buffer but not to ODB, or the other way round).

    With the similar check against the ODB client_index, I should no longer see these half-connected tmfe-based frontends.
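
    To illustrate the idea of a client_index check, here is a minimal sketch. It is not the code from the commit: client_slot, shared_header, remove_client(), client_index_is_valid() and MAX_CLIENTS are hypothetical stand-ins, and using the pid to identify the slot owner is an assumption made for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLIENTS 4   /* hypothetical; not the real MIDAS limit */

/* Hypothetical stand-ins for the shared-memory header; not the MIDAS structs. */
typedef struct {
   int  pid;        /* 0 means the slot is free */
   char name[32];
} client_slot;

typedef struct {
   client_slot client[MAX_CLIENTS];
} shared_header;

/* What a watchdog cleanup might do: clear the slot of a timed-out client. */
void remove_client(shared_header *h, int client_index)
{
   assert(h != NULL);
   if (client_index >= 0 && client_index < MAX_CLIENTS)
      memset(&h->client[client_index], 0, sizeof(client_slot));
}

/* Returns 1 if slot client_index is still owned by process my_pid, else 0.
   A caller that gets 0 should stop touching shared memory and exit. */
int client_index_is_valid(const shared_header *h, int client_index, int my_pid)
{
   assert(h != NULL);
   if (client_index < 0 || client_index >= MAX_CLIENTS)
      return 0;   /* index out of range */
   if (h->client[client_index].pid != my_pid)
      return 0;   /* slot was freed or reused by another process */
   return 1;
}
```

    In this sketch, a client would call client_index_is_valid() before each access to shared memory and terminate as soon as it returns 0, instead of continuing with a stale index.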

    commit 3bcfa12e

    K.O.
