race condition on creating SYSTEM buffer

Issue #64 resolved
dd1 created an issue

if multiple programs are trying to create the SYSTEM buffer, strange things will happen because creation of event buffer in bm_open_buffer() is not protected by any semaphore.

(same for creation of ODB shared memory, but usually not a problem because ODB shared memory is created once by the first program to start).

K.O.

Comments (4)

  1. dd1 reporter

    bm_open_buffer() does this:

    • calls ss_shm_open(“SYSTEM”)
    • if it return SS_CREATED, shared memory BUFFER_HEADER is zeroed, pheader->name and pheader->size are initialized. (read pointer, write pointer, etc are all set to zero, good)
    • if it returns SS_SUCCESS, we try to validate shared memory BUFFER_HEADER, etc.

    When two programs (not threads) call bm_open_buffer(SYSTEM) at the same time, they will race. the first one to call ss_shm_open(SYSTEM) will get SS_CREATED, the second one will get SS_SUCCESS and the two data paths above will race each other. In the simplest case, the second path will try to validate pheader->name and pheader->size before the first path initializes them and will fail to open the buffer. Ideally this should cause the caller to crash (frontend, event builder, mlogger), but this is not for sure.

    To avoid this race condition, I think we should create the semaphore first, lock it, then call ss_shm_open(), if we get SS_CREATED, initialize the shared memory while holding the semaphore. The second program will try to create the semaphore (will get SS_SUCCESS, semaphore already exists, no race condition), will try to lock it, (already locked by first program), will sleep until semaphore unlocked, by which time the first program has shared memory fully inialized, etc.

    K.O.

  2. dd1 reporter

    I am looking at the latest version of bm_open_buffer(). it does not look thread safe for 2 different threads trying to open the same buffer at the same time and it does not look thread safe for races between bm_open_buffer() and bm_send_event() (bm_open_buffer( does not lock the write cache mutex). ditto for the read cache.

    K.O.

  3. Log in to comment