Monit considers a process to be monitored even though it appears to be aware that the process is dead

Issue #349 duplicate
Former user created an issue

We use the 'with pidfile' syntax to check whether the process is dead and to restart it if needed. The problem happens rarely, so it could be some sort of concurrency problem.
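For context, a pidfile-based liveness check generally works by reading the PID stored in the file and asking the kernel whether a process with that PID exists. The sketch below illustrates that general technique in Python, using signal 0 via os.kill (which performs only the existence and permission check without delivering a signal). It is a hypothetical helper written for illustration, not Monit's actual implementation, which may gather process data differently.

```python
import os
from typing import Optional


def pid_from_pidfile(path: str) -> Optional[int]:
    """Read a PID from a pidfile; return None if the file is missing or unparsable."""
    try:
        with open(path, "r", encoding="ascii") as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def is_pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0: nothing is sent, only existence/permission is checked."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


def check_pidfile(path: str) -> str:
    """Classify a service as running or not, based only on its pidfile."""
    pid = pid_from_pidfile(path)
    if pid is None:
        return "not running (no valid pidfile)"
    if is_pid_alive(pid):
        return f"running (pid {pid})"
    return f"not running (stale pidfile, pid {pid})"
```

Usage would look like check_pidfile("/var/run/my_service.pid"), where the path is a hypothetical example. Note that a live PID only proves that some process has that PID; if the original process died and the PID was reused, this simple check would still report "running".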

However, we noticed that even though the actual process is dead, Monit still reports its monitoring status as 'Monitored' (and its status as 'Running'), although it cannot get any other information about it: PID, uptime, memory and CPU usage are all shown as '-'.

Example output

The Monit daemon 5.17.1 uptime: 3d 6h 6m 

Process 'process-0'
  status                            Running
  monitoring status                 Monitored
  pid                               3170
  parent pid                        1
  uid                               500
  effective uid                     500
  gid                               500
  uptime                            54m 
  threads                           22
  children                          0
  memory                            11.0 MB
  memory total                      11.0 MB
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Mon, 18 Apr 2016 20:09:50

Process 'process-31'
  status                            Running
  monitoring status                 Monitored
  pid                               -
  parent pid                        -
  uid                               -
  effective uid                     -
  gid                               -
  uptime                            -
  threads                           -
  children                          -
  memory                            -
  memory total                      -
  memory percent                    -
  memory percent total              -
  cpu percent                       -
  cpu percent total                 -
  data collected                    Mon, 18 Apr 2016 20:09:50
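Until the root cause is found, a monitoring script could flag this inconsistent state (status 'Running', monitoring status 'Monitored', but no PID). The sketch below is a hypothetical helper, not part of Monit; it assumes the plain-text status layout shown above, where a service header looks like Process 'name' and each field name is separated from its value by at least two spaces. That layout may differ between Monit versions.

```python
import re
from typing import Dict, List, Optional


def parse_monit_status(text: str) -> Dict[str, Dict[str, str]]:
    """Split status text into {service_name: {field_name: value}}."""
    services: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        # Service headers look like: Process 'process-0'
        header = re.match(r"^\w+ '([^']+)'\s*$", line)
        if header:
            current = services.setdefault(header.group(1), {})
            continue
        if current is not None and line.startswith("  "):
            # Field name and value are separated by a run of 2+ spaces.
            parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
            if len(parts) == 2:
                current[parts[0]] = parts[1]
    return services


def inconsistent_services(text: str) -> List[str]:
    """Names reported as Running and Monitored but without a PID ('-')."""
    return [
        name
        for name, fields in parse_monit_status(text).items()
        if fields.get("status") == "Running"
        and fields.get("monitoring status") == "Monitored"
        and fields.get("pid") == "-"
    ]


SAMPLE = """\
Process 'process-0'
  status                            Running
  monitoring status                 Monitored
  pid                               3170

Process 'process-31'
  status                            Running
  monitoring status                 Monitored
  pid                               -
"""

print(inconsistent_services(SAMPLE))  # ['process-31']
```

The SAMPLE string is a trimmed excerpt of the output above; in practice the text would come from running the status command and capturing its output.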

Comments (7)

  1. Bariša Obradović

    The process configuration is a simple one:

    check process #{process_name}
      with pidfile #{pidfile}
      start program = 'start_script' with timeout 10 seconds
      stop program = 'stop_script && rm #{pidfile}'
      group my_group
    
  2. Bariša Obradović

    Sorry, I didn't save the Monit logs from the time of the issue. When it happens again, I'll download them and pass them along.
