I have a frontend, connected as a remote client, whose maximum event size is ~2MB. I just updated from a 2019 version of MIDAS to the latest one and ran into lots of problems with the SYSTEM buffer: any client connected to that buffer would crash.
I tracked it down to mserver getting stuck in an infinite loop in bm_flush_cache_locked. The default write_cache_size for MFE frontends is 100000 bytes, and I happened to send an event of 69000 bytes. For some reason, bm_flush_cache_locked asks for at most 2*pbuf->write_cache_size/3 bytes, which is 66666 bytes here. So my 69k event could never fit in the 66k of space it asked for, and the mserver instance handling my remote client was stuck forever. Worse, it was stuck while holding the semaphore, so all my other SYSTEM clients were killed by timeouts.
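To make the arithmetic concrete, here is a minimal model of the size check described above. It is a sketch, not MIDAS source: the function names are hypothetical, and it assumes an event exactly equal to the requested size still fits (the real boundary depends on the actual comparison inside bm_flush_cache_locked).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the space request reported for
 * bm_flush_cache_locked, i.e. 2*write_cache_size/3 (integer division). */
size_t flush_request_size(size_t write_cache_size)
{
    return 2 * write_cache_size / 3;
}

/* Hypothetical model: an event larger than the request can never be
 * satisfied by waiting for space, which matches the endless loop seen
 * in mserver. Assumes event_size == request still fits. */
bool event_fits_request(size_t event_size, size_t write_cache_size)
{
    return event_size <= flush_request_size(write_cache_size);
}
```

With the default cache, flush_request_size(100000) is 66666, so event_fits_request(69000, 100000) is false, while with a 3000000-byte cache the same event fits.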
I managed to “fix” this by setting a much larger value in Common/Write cache size for my frontend (3MB instead of 100kB). Note that this parameter did not exist when I first wrote the frontend.
- Is there any situation where a write cache size smaller than the max event size is a good thing? Should MFE set it automatically so that this condition is satisfied?
- Is the logic in bm_flush_cache_locked really correct? It seems that any event >= 2/3 of the write cache size will trigger this.
- Why did I not see this issue with the older code (before commit 3619ecc6ba1d29d74c16aa6571e40920018184c0), when the cache size was hard-coded to 100000 bytes!?
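If MFE were to size the cache automatically, one possible rule, assuming the 2/3 request described above stays as it is, is to pick the smallest cache c with floor(2c/3) >= max event size, i.e. c >= ceil(3e/2). This is a hypothetical helper, not an existing MFE feature. It also shows that units matter for the 3MB workaround: a 2000000-byte event needs exactly 3000000 bytes, but a 2 MiB (2097152-byte) event would need 3145728 bytes under this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing rule: smallest write cache size c such that the
 * reported flush request 2*c/3 (integer division) can hold an event of
 * max_event_size bytes. Since e is an integer,
 * floor(2c/3) >= e  <=>  2c >= 3e  <=>  c >= ceil(3e/2). */
size_t min_write_cache_size(size_t max_event_size)
{
    size_t c = (3 * max_event_size + 1) / 2; /* ceil(3e/2) */
    assert(2 * c / 3 >= max_event_size);     /* self-check of the bound */
    return c;
}
```

For the 69000-byte event from the SYSTEM buffer problem above, this gives 103500 bytes, just above the 100000-byte default.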