unsafe gethostbyname() in rpc_client_connect()

Issue #357 resolved
Former user created an issue

I see an mhttpd crash from multithreaded use of gethostbyname() in rpc_client_connect(). K.O.

Comments (3)

  1. Lukas Berns

    Thank you for this fix. Let me add some details on the different symptom we had, so that someone on the latest tagged version, such as 2022-05-c, can find this issue:

    In our case mhttpd did not crash. Instead, on the first run start after many (>10) frontends had been restarted, we would occasionally see a “connection refused” error from cm_transition calling rpc_client_connect, after which the frontend whose connection had just failed was terminated through RPC. The midas log would show the “connection refused” error, followed immediately by a message from the same frontend about being terminated. The frontend’s own log would show nothing unusual (no run start) and then just the termination message.

    What was happening was that mhttpd would attempt to establish the RPC connection, but because gethostbyname is not MT-safe, one thread would sometimes pick up the IP address resolved for a hostname requested by another thread. It would then try to connect to the wrong host-port pair and get a "connection refused" error. The subsequent termination of the frontend (also through RPC) succeeded because only a single thread attempts that second RPC connection.

    I was able to reproduce the issue reliably by inserting a 10 ms sleep after the gethostbyname call. It is indeed fixed in more recent versions (checked with 506cce6) thanks to the move to MT-safe functions.
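For readers unfamiliar with the difference, here is a minimal sketch of an MT-safe lookup using getaddrinfo(). It is not the actual MIDAS code: the helper name resolve_ipv4 and the IPv4-only restriction are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not part of MIDAS): resolve `host` and `port` into
 * a caller-owned sockaddr_in. getaddrinfo() allocates a fresh result list
 * for each call, so concurrent callers never share a result buffer.
 * Returns 0 on success, or a getaddrinfo() error code on failure.
 */
static int resolve_ipv4(const char *host, int port, struct sockaddr_in *out)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    char service[16];
    int status;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    /* getaddrinfo() takes the port as a string */
    snprintf(service, sizeof(service), "%d", port);

    status = getaddrinfo(host, service, &hints, &res);
    if (status != 0)
        return status;

    /* Copy the first result into the caller's buffer, then free the list */
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

Each thread passes its own sockaddr_in, so an address resolved for one thread cannot be overwritten by another. gethostbyname(), by contrast, may return a pointer to static storage that a later call from any thread can overwrite.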
