Suggestion: add process name in description of Data access error

Issue #797 new
Sebastian Arcus created an issue

I think it would be really helpful if the description of the "Data access error" event included the process name as well, not just the PID.

Background: I sometimes have to use the "CHECK PROCESS ... MATCHING <regex>" format, because the monitored process gets restarted regularly after updates, and with the "... PIDFILE <pidfile>" format Monit would keep sending unnecessary warnings that the PID has changed. I was using the following config:

CHECK PROCESS spamd MATCHING "spamd"
    START PROGRAM = "/etc/rc.d/rc.spamd start"
    STOP PROGRAM = "/etc/rc.d/rc.spamd stop"
    IF NOT EXIST FOR 5 CYCLES THEN RESTART

CHECK PROGRAM spamd-update WITH PATH "/etc/monit.d/spamd-update"
    EVERY 60 CYCLES
    IF STATUS = 1 THEN ALERT

When using the above, I kept on receiving regular email warnings with the description:

Data access error Service spamd

Description: process with pid 7739 is a zombie

It took me quite a while to realise that Monit was occasionally matching the "spamd-update" process rather than "spamd". That process was a zombie because of how Monit works: scripts are run on one cycle and Monit checks their output on the next, so they are left as zombies for one cycle. Renaming the "spamd-update" script so that the regex above matches only spamd solved the problem in my case, but if the error description had included the name of the process, not just the PID, I would have realised much sooner what I was doing wrong.
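
For illustration, a tighter pattern might also have avoided the clash without renaming the script. This is an untested sketch: it assumes that MATCHING is applied to the process command line, and that spamd's own command line has a space or the end of the string right after the name, which "spamd-update" does not:

CHECK PROCESS spamd MATCHING "spamd( |$)"
    START PROGRAM = "/etc/rc.d/rc.spamd start"
    STOP PROGRAM = "/etc/rc.d/rc.spamd stop"
    IF NOT EXIST FOR 5 CYCLES THEN RESTART

Running monit procmatch "spamd( |$)" should list the processes a pattern matches, which would show whether the update script is still being picked up.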
