monit crashes at stop action
Hi
Monit version = 5.8.1
OS = Red Hat Enterprise Linux Server release 6.4, 64-bit
I tested first with the precompiled version, then with a build from source.
Below is an extract of my config file:
check process stress matching "stress*"
    stop program = "/bin/true"
    if cpu > 24% for 1 cycles then alert
    if cpu > 24% for 2 cycles then restart
The goal is to monitor a process and kill it when it uses more than 24% of the CPU (that is, 100% of one core on a 4-core CPU). The restart is useless in this context, since the process is launched by a third-party process.
Regardless of the stop program, and whether the action is stop or restart, Monit crashes each time it performs the stop action:
Jul 22 11:23:08 pedtanya03 monit[18859]: 'stress' process is running with pid 18861
Jul 22 11:23:18 pedtanya03 monit[18859]: 'stress' cpu usage of 24.9% matches resource limit [cpu usage>24.0%]
Jul 22 11:23:18 pedtanya03 monit[18859]: 'stress' cpu usage of 24.9% matches resource limit [cpu usage>24.0%]
Jul 22 11:23:28 pedtanya03 monit[18859]: 'stress' cpu usage of 24.9% matches resource limit [cpu usage>24.0%]
Jul 22 11:23:28 pedtanya03 monit[18859]: 'stress' trying to restart
Jul 22 11:23:28 pedtanya03 monit[18859]: 'stress' stop: /bin/true
Jul 22 11:23:28 pedtanya03 abrtd: Directory 'ccpp-2014-07-22-11:23:28-18859' creation detected
Jul 22 11:23:28 pedtanya03 abrt[18870]: Saved core dump of pid 18859 (/root/monit-5.8.1/monit) to /var/spool/abrt/ccpp-2014-07-22-11:23:28-18859 (11440128 bytes)
Jul 22 11:23:28 pedtanya03 abrtd: Executable '/root/monit-5.8.1/monit' doesn't belong to any package
Jul 22 11:23:28 pedtanya03 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2014-07-22-11:23:28-18859' exited with 1
Jul 22 11:23:28 pedtanya03 abrtd: Corrupted or bad directory '/var/spool/abrt/ccpp-2014-07-22-11:23:28-18859', deleting
Unfortunately, the core dump is systematically deleted by the abrtd daemon.
Do you have any idea?
Comments (8)
-
repo owner -
Some news. Below is the backtrace:
#0  0x0000003c63c328a5 in raise () from /lib64/libc.so.6
#1  0x0000003c63c34085 in abort () from /lib64/libc.so.6
#2  0x0000003c63c2ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003c63c2bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000418867 in wait_process (s=0x19d0690, expect=Process_Stopped) at src/control.c:92
#5  0x0000000000418ca1 in do_stop (s=0x19d0690, flag=0) at src/control.c:170
#6  0x00000000004196e9 in control_service (S=0x19d0650 "stress", A=2) at src/control.c:417
#7  0x000000000041b266 in handle_action (E=0x19e4fa0, A=0x19d13e0) at src/event.c:709
#8  0x000000000041b01a in handle_event (E=0x19e4fa0) at src/event.c:653
#9  0x000000000041a134 in Event_post (service=0x19d0690, id=2, state=1, action=0x19d13c0, s=0x47bc29 "%s") at src/event.c:223
#10 0x00000000004319cb in check_process_resources (s=0x19d0690, r=0x19d1420) at src/validate.c:400
#11 0x0000000000433d79 in check_process (s=0x19d0690) at src/validate.c:1026
#12 0x0000000000433aba in validate () at src/validate.c:973
#13 0x0000000000417190 in do_default () at src/monit.c:572
#14 0x0000000000416a2f in do_action (args=0x7fff6696cf98) at src/monit.c:412
#15 0x00000000004164fe in main (argc=3, argv=0x7fff6696cf98) at src/monit.c:166
I'll send you the core dump and the binary.
-
repo owner Thanks for the data. Monit stopped because neither a start nor a restart program was defined. To fix the issue, just add "start program" ... On RHEL 6 we also recommend "restart program", as upstart has synchronization problems if start is called before stop has finished (when "restart program" is defined, Monit calls only that instead of stop + start).
The start/restart handling has already been refactored in the development version, and Monit won't stop even if no start/restart program is defined and the restart action is called.
You can get the development snapshot here: https://bitbucket.org/tildeslash/monit/get/master.tar.gz
To compile:
tar -xzf master.tar.gz
cd tildeslash*
./bootstrap
./configure
make
-
repo owner - changed status to resolved
- fixed in development version
- workaround: define start/restart program
-
OK, that works, thanks.
But is there a way to keep Monit from trying to start the process, so that it only checks the process's CPU usage?
-
repo owner If you don't want to restart, then just don't define any stop/start/restart program or use just the alert action - for example:
check process stress matching "stress*"
    if cpu > 24% for 2 cycles then alert
There is also the "mode passive" option, which allows any restart attempts to be ignored.
-
Oops, I misspoke. My need is as follows:
I need to check the CPU usage of a process launched by a third-party application.
If the process's CPU usage is too high, Monit has to stop it with a kill signal.
The process will be launched on demand by the third-party application.
Does "mode passive" meet my need?
-
repo owner Then use the "stop" action instead of restart:
check process stress matching "stress*"
    stop program = "..."
    if cpu > 24% for 1 cycles then alert
    if cpu > 24% for 2 cycles then stop
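For illustration only, here is a minimal sketch of how the check above could be written to a control-file fragment with a concrete stop command. The stop command (`/usr/bin/pkill -x stress`), the function name `write_stress_check`, and the output path are all assumptions, not something given in this thread; the thread leaves the stop program as "...".

```shell
# Sketch: write a Monit check that alerts after one cycle above 24% CPU
# and stops (without restarting) the process after two cycles.
# ASSUMPTION: the stop command is hypothetical; adjust the pkill path and
# the process name to your system.
write_stress_check() {
    out="$1"   # path of the fragment to write (hypothetical, caller's choice)
    cat > "$out" <<'EOF'
check process stress matching "stress*"
    stop program = "/usr/bin/pkill -x stress"
    if cpu > 24% for 1 cycles then alert
    if cpu > 24% for 2 cycles then stop
EOF
}
```

After writing the fragment to a location that your control file includes, `monit -t` can be used to check the syntax before running `monit reload`.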
Hello,
I'm unable to reproduce the issue (on CentOS 6.5) ... using the following configuration:
You can keep the core dump by editing /etc/abrt/abrt-action-save-package-data.conf, changing ProcessUnpackaged=no to ProcessUnpackaged=yes, and then running "service abrtd restart".
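The edit described above can be sketched as a small shell function. This is only a sketch: the function name `enable_unpackaged_dumps` is made up for illustration, it assumes GNU sed (for `-i`), and it tolerates optional spaces around "=" because the exact spacing in the file may differ.

```shell
# Sketch: switch ProcessUnpackaged from "no" to "yes" in an abrt config file.
# The file path is passed as an argument so the change can be tried on a copy first.
enable_unpackaged_dumps() {
    conf="$1"
    # Only a line that sets ProcessUnpackaged to "no" is rewritten;
    # every other line is left untouched.
    sed -i 's/^ProcessUnpackaged[[:space:]]*=[[:space:]]*no[[:space:]]*$/ProcessUnpackaged = yes/' "$conf"
}
```

Run as root, it would be used as `enable_unpackaged_dumps /etc/abrt/abrt-action-save-package-data.conf`, followed by `service abrtd restart`.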
Once you have the core dump, please send it to support@mmonit.com along with the binary /root/monit-5.8.1/monit for which it was created.
Note that you have defined only the stop program but you use the restart action ... in that case the process will only be stopped, as Monit doesn't have the "start program" defined, and you'll see output like this: