Monit check program

Issue #1036 new
a b created an issue

I've set up a Monit config file that uses the check program feature.

This resource has dependencies.

When the resource fails, the dependencies restart correctly, but after that the main resource is restarted three times for no apparent reason.

Can you help me?

Comments (3)

  1. alessandro.gorna

    Ok.

    The main resource is this:

    check program main-dev-meth01-core_eomdb1 with path "/bin/su - meth01 -c 'cluster/check_eomdb1.bash'"
    with timeout 35 seconds
    if status ne 110 for 3 cycles then restart
    start program = "/bin/su - meth01 -c 'cluster/start_stop_eomdb1.bash start'" with timeout 35 seconds
    stop program = "/bin/su - meth01 -c 'cluster/start_stop_eomdb1.bash stop'" with timeout 35 seconds
    depends on main-dev-meth01-core_versant,main-dev-meth01-core_nameservice
    if 3 restarts within 3 cycles then unmonitor
    if 3 restarts within 3 cycles then exec "/bin/su - meth01 -c 'cluster/start_stop_eomdb1.bash clean && cluster/start_stop_eomdb1.bash start'"
    if 5 restarts within 5 cycles then alert
    group main-dev-meth01-core
    onreboot nostart

    The following resources depend on it:

    main-dev-meth01-core_eomjse1, main-dev-meth01-core_tomcat_core, main-dev-meth01-core_alerter1

    When I kill the main resource main-dev-meth01-core_eomdb1, Monit reacts as follows (from /var/log/monit):

    [2022-04-14T10:56:51+0200] warning : 'main-dev-meth01-core_eomdb1' status failed (100) -- no output
    [2022-04-14T10:57:21+0200] warning : 'main-dev-meth01-core_eomdb1' status failed (100) -- no output
    [2022-04-14T10:57:51+0200] error : 'main-dev-meth01-core_eomdb1' status failed (100) -- no output
    [2022-04-14T10:57:51+0200] info : 'main-dev-meth01-core_eomdb1' trying to restart
    [2022-04-14T10:57:51+0200] info : 'main-dev-meth01-core_eomjse1' stop: '/bin/su - meth01 -c cluster/start_stop_eomjse1.bash stop'
    [2022-04-14T10:57:53+0200] info : 'main-dev-meth01-core_tomcat_core' stop: '/bin/su - meth01 -c cluster/start_stop_tomcat_core.bash stop'
    [2022-04-14T10:57:55+0200] info : 'main-dev-meth01-core_alerter1' stop: '/bin/su - meth01 -c cluster/start_stop_alerter1.bash stop'
    [2022-04-14T10:57:57+0200] info : 'main-dev-meth01-core_eomdb1' stop: '/bin/su - meth01 -c cluster/start_stop_eomdb1.bash stop'
    [2022-04-14T10:57:57+0200] info : 'main-dev-meth01-core_eomdb1' start: '/bin/su - meth01 -c cluster/start_stop_eomdb1.bash start'
    [2022-04-14T10:57:58+0200] info : 'main-dev-meth01-core_eomdb1' start: '/bin/su - meth01 -c cluster/start_stop_eomdb1.bash start'
    [2022-04-14T10:57:59+0200] error : 'main-dev-meth01-core_eomdb1' status failed (100) -- no output
    [2022-04-14T10:58:00+0200] info : 'main-dev-meth01-core_eomdb1' status succeeded (110) -- no output
    [2022-04-14T10:58:00+0200] info : 'main-dev-meth01-core_eomjse1' start: '/bin/su - meth01 -c cluster/start_stop_eomjse1.bash start'
    [2022-04-14T10:58:16+0200] info : 'main-dev-meth01-core_eomdb1' start: '/bin/su - meth01 -c cluster/start_stop_eomdb1.bash start'
    [2022-04-14T10:58:18+0200] info : 'main-dev-meth01-core_tomcat_core' start: '/bin/su - meth01 -c cluster/start_stop_tomcat_core.bash start'
    [2022-04-14T10:58:26+0200] info : 'main-dev-meth01-core_eomdb1' start: '/bin/su - meth01 -c cluster/start_stop_eomdb1.bash start'
    [2022-04-14T10:58:28+0200] info : 'main-dev-meth01-core_alerter1' start: '/bin/su - meth01 -c cluster/start_stop_alerter1.bash start'

    Why does Monit restart the main resource, main-dev-meth01-core_eomdb1, three times if the first restart was successful?

    The other dependencies are configured in the same manner.

    If I remove the dependencies attached to it, however, the main resource is restarted only once (correct).

    Can you explain this?

    Why does Monit behave incorrectly? I suppose it's a bug.

    But we need to use the check program feature.

    Please help us. Thanks in advance.

  2. Lutz Mader

    Hello Alessandro,
    you are right, but this is the behavior of Monit.
    Monit tests only dependent processes; for all other resource types it just executes the defined start command.

    This seems ugly, but a test program can run for a long time, so Monit cannot wait until the program or script completes and continues without an additional test.

    A short answer only,
    Lutz

    p.s.
    Are you able to use the Monit functions to test the resources, that is, the processes?
    This is more useful than using scripts; on the other hand, the scripts can be carried over from VCS (or other tools) to Monit as a simple way to start a migration.

    I switched to Monit functions as soon as possible, and the scripts became shorter and simpler because Monit does the job.

    On the other hand, a bit of paranoia is prudent sometimes, so I run additional tests in my scripts before I start/stop a process, to prevent duplicate processes etc.
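
    A minimal sketch of such a guard, assuming the start script can track its process through a pidfile. The names PIDFILE, is_running and guarded_start are hypothetical and are not taken from the cluster/start_stop_*.bash scripts in this thread. With a guard like this, a repeated "start" from Monit becomes a no-op instead of launching a duplicate process:

```shell
#!/bin/bash
# Hypothetical guard for a start script; PIDFILE and the function names
# are assumptions for illustration, not part of the original scripts.
PIDFILE="${PIDFILE:-/tmp/example_service.pid}"

# Return 0 if the pidfile exists and names a live process.
is_running() {
    [ -f "$PIDFILE" ] || return 1
    local pid
    pid=$(cat "$PIDFILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Run the given command in the background only when nothing is running,
# so that repeated calls are harmless.
guarded_start() {
    if is_running; then
        echo "already running"
        return 0
    fi
    # Detach output so the caller is not blocked by the background process.
    "$@" >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
    echo "started"
}
```

    For example, calling `guarded_start sleep 30` twice in a row starts one process on the first call and reports that it is already running on the second. The same check can be placed in front of the stop branch.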

    For more information, see src/control.c in the source code, or wait for an answer from Tildeslash.
