Monit “check program” and restart based on exit code

Issue #321 closed
Juan Manuel Torres created an issue

When I use "check process", Monit starts the program I define under "start program" and then restarts it if it stops.

But when I use "check program", Monit does not start the program automatically. And if the program is running and for some reason stops with a non-zero exit code, Monit does not restart it (see my configuration below).

I'm not sure how to properly start the program and then restart it based on its exit codes.

My config file looks like this:

    set logfile /tmp/monit.log
    set daemon 1

    check program MyProgram with path /monit/MyProgram.py
        and with timeout 3600 seconds
        every 1 cycles
        start program = /monit/MyProgram.py with timeout 3600 seconds
        if status > 200 then restart
        if status < 201 then stop
        if 2 restart 5 cycles then exec /monit/custom_script.sh
        if 2 restart 5 cycles then stop

I have tried starting Monit in each of these ways:

  • monit -c monitrc -vv
  • monit -c monitrc start all -vv
  • monit -c monitrc start MyProgram -vv

Comments (5)

  1. Tildeslash repo owner

    The "check program" statement is a kind of plugin interface. Its purpose is to execute the given script on a schedule, similar to cron: the executed program is supposed to run some quick tests and then exit with a status that signals whether the check succeeded or failed (for example, collecting the status of a RAID array, sensors, etc.). Monit then reacts to the exit value.

    If you want to monitor a long-running process, please use the "check process" statement.

  2. Juan Manuel Torres reporter

    Thank you for your answer; it helps put things in context. We have been using "check process" to monitor our long-running processes, but is it possible to check an application's exit status and restart it based on that?

    For example, our app uses different exit codes: some mean that something bad happened but the app can safely be restarted, while others mean there was a fatal failure and restarting will not help. Is it possible to do this with "check process"? My understanding from reading the docs is that "check process" cannot check exit codes. Are there any alternatives?

    Thank you.
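The "check program" contract described in the first comment can be illustrated with a minimal sketch of a program suited to it: it runs one quick test, reports the result only through its exit status, and exits. The disk-usage test, the 90% threshold and the name check_disk are illustrative assumptions, not anything Monit provides.

```python
#!/usr/bin/env python3
# Sketch of a script for Monit's "check program": run one quick test, then exit.
# The checked path and the 90% threshold are illustrative assumptions.
import shutil
import sys

MAX_USED_FRACTION = 0.90  # hypothetical limit: fail above 90% disk usage


def check_disk(path: str = "/", max_used: float = MAX_USED_FRACTION) -> int:
    """Return 0 if the filesystem holding `path` is at most `max_used` full, else 1."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction > max_used:
        # The message is for humans; the exit status is what reports the result.
        print(f"{path} is {used_fraction:.0%} full", file=sys.stderr)
        return 1
    return 0
```

Saved as an executable file whose entry point calls sys.exit(check_disk()), it could be registered with a stanza such as "check program disk_check with path /usr/local/bin/check_disk.py" followed by "if status != 0 then alert" (the service name and path are hypothetical). Monit runs the script on its polling schedule and reacts to the status; the script itself never loops or stays resident.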

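As for alternatives, one possible approach outside Monit is a small wrapper that launches the application, inspects its exit code and restarts it only for codes known to be recoverable. This is a minimal sketch under stated assumptions: the recoverable codes (2 and 3), the restart limit, the delay and the names should_restart and supervise are all hypothetical, and any non-zero code not listed as recoverable is treated as fatal.

```python
#!/usr/bin/env python3
# Sketch of a wrapper that restarts a program depending on its exit code.
# RECOVERABLE_CODES, max_restarts and delay are hypothetical placeholders.
import subprocess
import sys
import time
from typing import Sequence

# Hypothetical: codes meaning "something went wrong, but a restart is safe".
RECOVERABLE_CODES = {2, 3}


def should_restart(code: int) -> bool:
    """True only for recoverable codes; 0, unlisted codes and signals (negative) stop."""
    return code in RECOVERABLE_CODES


def supervise(cmd: Sequence[str], max_restarts: int = 5, delay: float = 1.0) -> int:
    """Run `cmd`, restarting it on recoverable exit codes.

    Returns the code that ended supervision: 0 on a clean exit, a fatal
    (non-recoverable) code, or the last recoverable code once
    `max_restarts` restarts have been used up.
    """
    restarts = 0
    while True:
        code = subprocess.run(list(cmd)).returncode
        if not should_restart(code) or restarts >= max_restarts:
            return code
        restarts += 1
        print(f"exit code {code}: restart {restarts}/{max_restarts}", file=sys.stderr)
        time.sleep(delay)
```

Run as a script, it would end with something like sys.exit(supervise(sys.argv[1:])) and be invoked as, for example, "supervise.py /monit/MyProgram.py". Keeping the exit-code policy in one function makes the recoverable and fatal sets easy to audit. How this combines with Monit is a separate decision: a "check process" entry watching the wrapper would, by default, still try to start the wrapper again after it exits on a fatal code.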