Use of RPC_FTCP is not thread-safe

Issue #171 resolved
Ben Smith created an issue

There are several places in mfe.c and midas.c that set the FAST TCP RPC option: rpc_set_option(-1, RPC_OTRANSPORT, RPC_FTCP). Unfortunately this is done without having the RPC mutex. This means that another thread can call something like db_get_key() while the option is still RPC_FTCP. This causes rpc_call() to return early, and not bother waiting for a response. However rpc_call() still returns SUCCESS, and so db_get_key() will also return SUCCESS, even though the KEY structure hasn't been populated! Similar issues will affect many other functions.

I assume this behaviour (all multi-threaded access to the ODB is unsafe on remote connections) is unintentional. I see the issue regularly in one of our unlucky frontends for SuperCDMS.

I think the solution is that everywhere that sets the RPC_FTCP option on the server connection needs to grab the RPC mutex first.

E.g. change

         rpc_set_option(-1, RPC_OTRANSPORT, RPC_FTCP);
         db_send_changed_records();
         rpc_set_option(-1, RPC_OTRANSPORT, RPC_TCP);

to

         get_the_rpc_mutex();
         rpc_set_option(-1, RPC_OTRANSPORT, RPC_FTCP);
         db_send_changed_records();
         rpc_set_option(-1, RPC_OTRANSPORT, RPC_TCP);
         release_the_rpc_mutex();

Comments (7)

  1. dd1

    Yikes, what a bug. I never understood this RPC_FTCP business, now I guess I will have to look into it... (Having global settings like this is definitely bad for multithread uses... I much prefer to specify special behaviour per-call exlicitely). K.O.

  2. dd1

    Ok. It looks simple.

    RPC_FTCP tells rpc_call() and rpc_client_call() to skip receiving the RPC reply. It also sets the TCP_FAST bit in the RPC routine_id which tells the RPC handler to skip sending the reply.

    The only place where this is used in midas.c is in cm_shutdown() to shutdown a remote client via rpc_client_disconnect() and rpc_client_call(RPC_ID_SHUTDOWN or RPC_ID_EXIT).

    The only place it is used elsewhere is mfe.c to speedup db_send_changed_records() by not waiting for the RPC response (one way command).

    But this is redundant, inside db_send_changed_records() before making the db_set_record() call, the RPC_FTCP mode is set again.

    So I think a) the extra calls to set RPC_FTCP can be removed from mfe.c. b) db_send_changed_records() can be changed to read: if (rpc_is_remote()) { rpc_call(RPC_DB_SET_RECORD|TCP_FAST,...) } else { db_set_record(...) }

    Then as a bonus: - change rpc_client_disconnect() to read rpc_client_call(hConn, bShutdown ? (RPC_ID_SHUTDOWN|TCP_FAST) : (RPC_ID_EXIT|TCP_FAST)); - RPC_FTCP becomes unused, remove it and the corresponding rpc_set_option() - rename TCP_FAST to RPC_NO_REPLY

    Let's see if Stefan is okey with this.

    K.O.

  3. Ben Smith reporter

    Sorry for the delay - BitBucket emails were ending up in junk so I hadn't realised you'd worked on this.

    I've tried the new branch and it works great. The problem is fixed, and I don't see any new issues. Thanks!

  4. Stefan Ritt

    I reviewed KO's modifications and fully approve them. It's a clever idea to put the RPC_NO_REPLY as a flag directly into the routine_id. The branch should be merged.

  5. Log in to comment