Recently we started using Monit to monitor the memory usage of containers on a Linux host. The service is configured as follows, and Monit's polling cycle in our setup is 60 seconds.
check program container_memory_<container_name> with path "/path/of/check_script <container_name> <memory_threshold_value_in_bytes>"
    if status == 3 for 10 times within 20 cycles then exec "/path/of/restart_script <container_name>"
Specifically, Monit invokes check_script every cycle to check whether the container's memory usage exceeds the threshold. If the check reports this (exit status 3) 10 times within 20 cycles, i.e. within 20 minutes, Monit invokes restart_script to restart the corresponding container.
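For reference, a check_script of this kind could look like the Python sketch below. It is hypothetical, since the original script is not shown: it assumes Docker is the container runtime, that the host uses cgroup v2 with the systemd cgroup driver (so current usage is read from /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.current), and that exit code 3 is what the "if status == 3" test in the configuration above should match. The helper names (container_id, memory_bytes, status_for, main) are illustrative.

```python
import subprocess
import sys

OK, OVER_THRESHOLD, CHECK_ERROR = 0, 3, 1


def container_id(name: str) -> str:
    # `docker inspect --format '{{.Id}}' <name>` prints the full container ID.
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.Id}}", name],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def memory_bytes(cid: str) -> int:
    # Assumed layout (cgroup v2 + systemd driver); differs for cgroup v1 or cgroupfs.
    path = f"/sys/fs/cgroup/system.slice/docker-{cid}.scope/memory.current"
    with open(path) as f:
        return int(f.read().strip())


def status_for(usage: int, threshold: int) -> int:
    # 3 means "over threshold", which is what `if status == 3` counts.
    return OVER_THRESHOLD if usage > threshold else OK


def main(argv: list[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} <container_name> <threshold_bytes>", file=sys.stderr)
        return CHECK_ERROR
    name = argv[1]
    try:
        threshold = int(argv[2])
        usage = memory_bytes(container_id(name))
    except (ValueError, OSError, subprocess.CalledProcessError) as exc:
        # Exit 1 on errors so a broken check is not counted as "over threshold".
        print(f"check failed for {name}: {exc}", file=sys.stderr)
        return CHECK_ERROR
    print(f"{name}: {usage} bytes (threshold {threshold})")
    return status_for(usage, threshold)
```

To use it as the check_script, end the file with `if __name__ == "__main__": sys.exit(main(sys.argv))`; Monit then uses the process exit code as the program status.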
We found that the container is restarted when the condition is first triggered. However, if right after the restart the container's memory usage climbs from around 100MB to above the threshold within 60 seconds, the container is not restarted again for the next hour, and its memory usage keeps growing, up to 11GB.
After some debugging, I think this failure happens because Monit does not reset its counter. One potential reason is that Monit resets its counter if and only if the status of the monitored service changes from "Status failed" to "Status ok". Could you help me check whether my understanding is correct?
I see that a new option, "repeat every <n> cycles", was recently introduced for exec. But I am wondering whether we should fundamentally fix the issue that Monit cannot reset its counter.
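The hypothesis above can be illustrated with a simplified Python model. This is not Monit's source code; it is a hypothetical model that assumes (1) the last `within` check results are kept in a sliding window, (2) the service counts as failed while at least `times` of them are failures, (3) exec fires only on the transition into the failed state, and (4) the service returns to ok only when the window no longer satisfies the condition. The `repeat_every` parameter is meant to mimic the "repeat every <n> cycles" option.

```python
from collections import deque


def simulate(statuses, times=10, within=20, repeat_every=None):
    """Return the cycle indices at which the exec action would fire.

    Hypothetical model of "if status == 3 for <times> times within <within>
    cycles then exec ...", not Monit's actual implementation.
    """
    window = deque(maxlen=within)  # last `within` results, True = failure
    failed = False                 # current service state in the model
    since_fire = 0                 # cycles since the last exec while failed
    fired = []
    for cycle, status in enumerate(statuses):
        window.append(status == 3)
        matched = sum(window) >= times
        if matched and not failed:
            # ok -> failed transition: the only point where exec fires by default.
            failed = True
            since_fire = 0
            fired.append(cycle)
        elif matched and failed:
            # Still failed: nothing happens unless repeat_every is set.
            since_fire += 1
            if repeat_every and since_fire >= repeat_every:
                fired.append(cycle)
                since_fire = 0
        elif not matched and failed:
            # failed -> ok: only here can a later failure trigger exec again.
            failed = False
    return fired


# Memory goes back over the threshold right after the restart, so the window
# still holds the earlier failures and the state never returns to ok.
print(simulate([3] * 60))                  # [9]
print(simulate([3] * 60, repeat_every=5))  # [9, 14, 19, ..., 59]
```

In this model, a second restart only happens if the container stays under the threshold long enough for the window to drop below 10 failures, e.g. `simulate([3] * 10 + [0] * 20 + [3] * 10)` fires at cycles 9 and 39, which matches the behaviour described above.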
Overall, this is an awesome project, and we use lots of wonderful features of Monit. Thank you!