monit daemon doesn't start if stale PID file exists

Issue #1083 resolved
Austin Payne created an issue

I’m observing an issue (first seen in a Docker image) where the monit daemon does not start after an unclean shutdown. The cause is a stale PID file left in /var/run/monit.pid even though the process itself no longer exists. I have created a simple, Docker-based reproducer here: https://github.com/Austinpayne/monit-pid-reproducer

I’m wondering what the solution is. It seems simple enough for monit, when the PID file exists, to also check whether that PID belongs to a running monit daemon, and then either a) exit (or wake the existing daemon) if it does, or b) go ahead and start the daemon if it does not. However, maybe this is the intended behavior? I did not find any mention of the current behavior in the docs or in the issues here. For my own purposes the workaround in the linked repo is sufficient.

Comments (6)

  1. Tildeslash repo owner

    There are two things that can do this AFAICS,

    In both cases you should be able to see something in the Monit log file if you start Monit in debug mode. (monit -v)

    P.S. I took a quick look at your repo and the problem is, as expected, the first case above. Monit reads pid = 1 from its pid file, tests whether it belongs to a running process and, if so, assumes it is Monit, wakes it up, reports "Monit daemon with PID 1 awakened" and then quits. Unfortunately pid 1 is no longer Monit, but some other process. So your entrypoint workaround is actually what I would recommend. We could inspect the OS’s process table and check that the short command name of the pid is monit, but that is starting to dig a rabbit hole.
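
    For illustration only, the process-table check mentioned above could be sketched in shell roughly as follows. This is not Monit's code: the function name `pid_is_command` is hypothetical, and the sketch assumes a Linux system with procfs mounted, where `/proc/<pid>/comm` holds the short command name. Note that such a check cannot tell an old Monit from a new one that happens to reuse the same PID.

    ```shell
    #!/bin/sh
    # Sketch (Linux only, hypothetical helper): succeed only if PID $1 is alive
    # and its short command name equals $2.
    pid_is_command() {
        pid="$1"
        expected="$2"
        comm_file="/proc/$pid/comm"
        # No readable /proc entry means no such process (or no procfs).
        [ -r "$comm_file" ] || return 1
        actual=$(cat "$comm_file" 2>/dev/null) || return 1
        [ "$actual" = "$expected" ]
    }

    # Hypothetical usage in an entrypoint script: treat the pid file as stale
    # unless it points at a process named "monit".
    #   pid=$(cat /run/monit.pid 2>/dev/null)
    #   pid_is_command "$pid" monit || rm -f /run/monit.pid
    ```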

  2. Austin Payne reporter

    Thanks for the input, @Tildeslash. It does appear to be case 1, like you say. Is this behavior documented in the docs somewhere? For what it’s worth, here is a run with debug logging turned on:

    $ docker start -a monit
     New Monit id: 41b632b21dab57a919a813767e896bb0
     Stored in '/var/lib/monit/id'
    Runtime constants:
     Control file       = /etc/monit/monitrc
     Log file           = /var/log/monit.log
     Pid file           = /run/monit.pid
     Id file            = /var/lib/monit/id
     State file         = /var/lib/monit/state
     Debug              = True
     Log                = True
     Use syslog         = False
     Is Daemon          = True
     Use process engine = True
     Limits             = {
                        =   programOutput:     512 B
                        =   sendExpectBuffer:  256 B
                        =   fileContentBuffer: 512 B
                        =   httpContentBuffer: 1 MB
                        =   networkTimeout:    5 s
                        =   programTimeout:    5 m
                        =   stopTimeout:       30 s
                        =   startTimeout:      30 s
                        =   restartTimeout:    30 s
                        = }
     On reboot          = start
     Poll time          = 15 seconds with start delay 0 seconds
     Event queue        = base directory /var/lib/monit/events with 100 slots
     Start monit httpd  = False
    
    The service list contains the following entries:
    
    System Name           = a4ff9d635b20
     Monitoring mode      = active
     On reboot            = start
    
    -------------------------------------------------------------------------------
    pidfile '/run/monit.pid' does not exist
    Starting Monit 5.31.0 daemon
    'a4ff9d635b20' Monit 5.31.0 started
    Processing postponed events queue
    Cannot read proc file '/proc/1/attr/current' -- Invalid argument
    $ docker kill --signal KILL monit
    $ docker start -a monit
    Runtime constants:
     Control file       = /etc/monit/monitrc
     Log file           = /var/log/monit.log
     Pid file           = /run/monit.pid
     Id file            = /var/lib/monit/id
     State file         = /var/lib/monit/state
     Debug              = True
     Log                = True
     Use syslog         = False
     Is Daemon          = True
     Use process engine = True
     Limits             = {
                        =   programOutput:     512 B
                        =   sendExpectBuffer:  256 B
                        =   fileContentBuffer: 512 B
                        =   httpContentBuffer: 1 MB
                        =   networkTimeout:    5 s
                        =   programTimeout:    5 m
                        =   stopTimeout:       30 s
                        =   startTimeout:      30 s
                        =   restartTimeout:    30 s
                        = }
     On reboot          = start
     Poll time          = 15 seconds with start delay 0 seconds
     Event queue        = base directory /var/lib/monit/events with 100 slots
     Start monit httpd  = False
    
    The service list contains the following entries:
    
    System Name           = a4ff9d635b20
     Monitoring mode      = active
     On reboot            = start
    
    -------------------------------------------------------------------------------
    Monit daemon with PID 1 awakened
    

  3. Tildeslash repo owner

    I don’t think it is documented. It could warrant a FAQ entry though. I'm closing this as resolved, since you found the solution yourself :-) Under normal operation Monit creates its pid file on start and deletes it on exit. But after a hard shutdown or crash followed by a restart, the file can be left behind; this is unlikely but can definitely happen. As a general rule, it is best to just remove Monit's pid file before Monit is started. Thanks for the great testing, and feel free to add an entry to the wiki FAQ.
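
    A minimal sketch of that general rule as a container entrypoint helper is shown below. It is not taken from the linked repo: the function name `clear_pidfile` is hypothetical, the default path is the "Pid file" value from the debug output above, and the commented usage assumes Monit runs in the foreground (`monit -I`) as the container's main process.

    ```shell
    #!/bin/sh
    # Hypothetical entrypoint helper: remove a leftover pid file before Monit starts.
    # The default path matches the "Pid file" line in the debug output; adjust it
    # if your monitrc sets a different pidfile.
    clear_pidfile() {
        pidfile="${1:-/run/monit.pid}"
        # -f makes this a no-op when the file does not exist.
        rm -f -- "$pidfile"
    }

    # Usage in an entrypoint (commented out so the sketch has no side effects):
    #   clear_pidfile /run/monit.pid
    #   exec monit -I
    ```

    Removing the file unconditionally is only safe when nothing else could have started Monit already, which is the case at the top of a container entrypoint.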

  4. Henning Bopp

    What you could actually do in containers to avoid that problem is to store PID files on tmpfs, by passing this flag when the container is created:

    docker run --tmpfs /run
    

    or in a docker-compose.yaml with

    tmpfs:
      - /run
    

    This will simply create a tmpfs that is definitely empty when the container starts.
