MONIT doesn't create multiple alerts in case multiple lines are picked up during a file content test
This report covers 3 issues:
1) During a file content check, if MONIT finds multiple matching lines in a file between checks, it puts all of the data into MONIT_DESCRIPTION and creates a single alert.
2) Along with that, it doesn't take "\n" into account in MONIT_DESCRIPTION.
So if anyone wants to use MONIT_DESCRIPTION in a script to perform a line-wise action, it is hard to tell the individual lines apart in $MONIT_DESCRIPTION.
3) During a file content check, increasing FILECONTENTBUFFER sometimes doesn't help, as more lines may show up during a check than the buffer can hold.
So MONIT should provide a switch or option to create an alert for each line. Just a thought!
Comments (12)
-
-
reporter Hello Lutz Mader,
What matching criteria have you set in monitrc? In my case, the matching criterion is as below.
If content = "(.*?)" then exec '/etc/test.sh'
[Note: Each line of the target file is important, hence I used ".*?"]
This matching criterion captures multiple lines from a (log) file and puts them into MONIT_DESCRIPTION.
If it has captured more than 1 line at a time, the issues below are observed:
- MONIT_DESCRIPTION in test.sh does not show all lines because FILECONTENTBUFFER becomes full (I have already increased it from the default value).
- If MONIT_DESCRIPTION has captured 2-3 lines from the log file, it is hard to tell the different lines apart.
Am I doing anything wrong? Any suggestion would be helpful.
Below is a snippet of the log file:
2020-05-18T03:01:29.523Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T04:15:58.582Z;;XXXX_central_Announcement_Success_Rate_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX central KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T04:02:04.227Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T04:31:08.985Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T05:00:44.189Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T06:16:21.673Z;;XXXX_central_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX central KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T06:00:42.675Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
2020-05-18T06:30:41.536Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts
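For illustration, a script run via exec could split MONIT_DESCRIPTION into lines and then split each line on ";". This is only a minimal sketch: the helper name summarize_line is hypothetical, and the field meanings (timestamp, alert name, severity, state) are guessed from the sample lines above, not taken from any documented format.

```sh
#!/bin/sh
# Hypothetical helper: summarize one semicolon-separated log line.
# Field meanings are inferred from the sample lines, not from a documented format.
summarize_line() {
    printf '%s\n' "$1" | {
        # The second field is empty in the samples, so it is read into "skip".
        IFS=';' read -r ts skip name severity state rest
        printf '%s %s %s %s\n' "$ts" "$severity" "$state" "$name"
    }
}

# MONIT_DESCRIPTION is set by Monit for exec actions; default to empty
# so the script also runs standalone.
printf '%s\n' "${MONIT_DESCRIPTION:-}" |
while IFS= read -r line; do
    case "$line" in
        *';'*) summarize_line "$line" ;;   # only lines that look like log records
    esac
done
```

Lines without a ";" (for example a header line or a blank line) are skipped, so each remaining line can be acted on separately.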
-
repo owner - marked as enhancement
- marked as minor
-
repo owner - changed status to wontfix
Hello,
originally Monit did send one alert per matching line (the way you describe), but that was changed because a log file containing hundreds of matching entries would flood the user with alerts.
We don't plan to change it back to 1:1.
-
reporter Hello,
I can understand your point about the N:1 mapping of lines to an alert. To some extent you are correct.
But when you have multiple lines in a single environment variable (MONIT_DESCRIPTION), using that variable in a script or an alert becomes questionable, as it doesn't provide complete information about a bunch of different problems.
For example, suppose "Error" in a log file is treated as an alert. One log could contain multiple errors (say 10, with different error types). If we can't put all 10 errors into an alert or environment variable, there is a chance of missing a problem.
Hence I request that MONIT provide an option that lets the user switch between multiple lines and 1 line per environment variable or alert, or an option to decide how many lines are considered per alert.
Thanks,
Avijit
-
Hello Avijit Seth,
I never try to flood Monit with messages. Sometimes I get a few messages (fewer than 10) at the same time. This works well. From my point of view, Monit handles a small number of messages well. I get all the captured messages from MONIT_DESCRIPTION.
OK, this is not the answer to your question,
sorry,
Lutz
-
reporter Hello Lutz,
Could you please tell me what the FILECONTENTBUFFER value is in your case?
Thanks,
Avijit
-
Hi All,
Thanks for raising this. I have a similar issue and was looking for a solution, but I have not yet found one that I can rely on. It is nice to know that in the past it worked the way I'd love to have it now, so maybe it is possible to make line-by-line processing an optional setting? That would really help, because now I fear (and I have already experienced it) that I miss alarms in this case, since it is not guaranteed that every line is processed or that it fits within the limited size of the variable.
Is it possible to add this as an optional setting? I'd really appreciate it.
Thank you,
Attila
-
Hello Avijit Seth,
I use the following limits settings only.
    set daemon 60   # check services at 60 seconds intervals
    set limits {
        programOutput:     1024 B,   # check program's output truncate limit
        fileContentBuffer: 1024 B,   # limit for file content test
    }
In general I get up to 9 message lines. I handle the lines, line by line, with the code from my sample above, but I remove the duplicate lines.
This works well for me, but today only some lines are captured and handled (or sent to a central system).
    # Remove duplicate lines.
    echo "$MONIT_DESCRIPTION" | uniq | \
    while read desc; do
        :
    done
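Note that uniq only collapses duplicates that are directly adjacent. A minimal sketch, assuming awk is available, that drops repeated lines anywhere in the captured text while keeping the first-seen order; the helper name dedupe_lines is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print each distinct line of stdin once,
# keeping the order in which lines were first seen.
dedupe_lines() {
    awk '!seen[$0]++'
}

# MONIT_DESCRIPTION is set by Monit for exec actions; default to empty
# so the script also runs standalone.
printf '%s\n' "${MONIT_DESCRIPTION:-}" | dedupe_lines |
while IFS= read -r desc; do
    [ -n "$desc" ] || continue   # skip blank lines
    :                            # handle "$desc" here
done
```

Using IFS= together with read -r keeps leading whitespace and backslashes in each line intact.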
With regards,
Lutz
-
Hello Avijit Seth,
this is a simple test case to do some additional investigation, see below.
I use Monit 5.26.0 on a macOS 10.10 system with KSH 93 or BASH 3.2, but KSH 88 and BASH 4.3 on AIX 7.2 work as well.
With regards,
Lutz
The configuration used to collect some messages:
    check file file.log with path "/Users/lutz/log/file.log"
        if not exist then exec "/usr/bin/touch /Users/lutz/log/file.log"
        if match "error" then exec "/Users/lutz/monit/env.sh"
The script to handle the captured messages.
    #!/bin/ksh
    echo "$MONIT_DESCRIPTION" | \
    while read desc; do
        echo "XX $desc" >> /Users/lutz/log/env.log
    done
    echo "XXXXXXXX" >> /Users/lutz/log/env.log
    exit 0
The content of the file.log file. I use "echo 'error ' `date` >> file.log" to add new lines.
    error Do 28 Mai 2020 21:35:35 CEST
    error Do 28 Mai 2020 21:35:37 CEST
    test Do 28 Mai 2020 21:35:44 CEST
    test Do 28 Mai 2020 21:35:47 CEST
    error Do 28 Mai 2020 21:35:53 CEST
    error Do 28 Mai 2020 21:35:55 CEST
    error Do 28 Mai 2020 21:36:08 CEST
    test Do 28 Mai 2020 21:36:12 CEST
    test Do 28 Mai 2020 21:36:16 CEST
    test Do 28 Mai 2020 21:36:19 CEST
    error Do 28 Mai 2020 21:36:21 CEST
    warn Do 28 Mai 2020 21:36:32 CEST
    warn Do 28 Mai 2020 21:36:34 CEST
    warn Do 28 Mai 2020 21:36:36 CEST
    error Do 28 Mai 2020 21:36:39 CEST
The script handles the messages line by line; from my point of view, there is no problem doing this.
    XX content match:
    XX error Do 28 Mai 2020 21:35:35 CEST
    XX error Do 28 Mai 2020 21:35:37 CEST
    XX error Do 28 Mai 2020 21:35:53 CEST
    XX
    XXXXXXXX
    XX content match:
    XX error Do 28 Mai 2020 21:35:55 CEST
    XX error Do 28 Mai 2020 21:36:08 CEST
    XX error Do 28 Mai 2020 21:36:21 CEST
    XX
    XXXXXXXX
    XX content match:
    XX error Do 28 Mai 2020 21:36:39 CEST
    XX
    XXXXXXXX
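The output shows a "content match:" header and an empty line in every batch. If only the matched log lines are wanted, the loop could filter those out. A minimal sketch, assuming the header always reads exactly "content match:" as in the output above; the helper name matched_lines is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print only the matched log lines from a Monit
# description, dropping the "content match:" header and blank lines.
matched_lines() {
    printf '%s\n' "$1" |
    while IFS= read -r line; do
        case "$line" in
            ''|'content match:') continue ;;
        esac
        printf '%s\n' "$line"
    done
}

# MONIT_DESCRIPTION is set by Monit for exec actions; default to empty
# so the script also runs standalone.
matched_lines "${MONIT_DESCRIPTION:-}" |
while IFS= read -r desc; do
    printf 'XX %s\n' "$desc"
done
```

This version prints to standard output; redirecting into a log file works the same way as in the script above.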
-
reporter Thank you so much, Lutz, for your continuous help on this.
I am now able to read line by line within my script the way you showed, and I am getting all lines in MONIT_DESCRIPTION after increasing the buffer size and reducing the daemon frequency.
My only worries now are:
- If the buffer fills up with a larger number of lines, I may miss something.
- Changing to a lower frequency and increasing the buffer size may cause performance issues on the VM.
Any thoughts are always welcome!
Thanks
Avijit
-
Hello Avijit Seth,
Yes, that is a problem. It took some time, but I decided not to forward some messages, especially because they are mostly duplicates.
On the one hand, the 60-second interval fits well for me and is easy to calculate with, and the buffer of 1024 bytes is large enough to handle up to 5-9 messages. On the other hand, a slow interval saves some CPU and other resources and prevents a system slowdown caused by too many or too fast useless recovery retries.
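One way to at least notice when the buffer may have filled up is to compare the size of MONIT_DESCRIPTION with the configured limit. This is only a heuristic sketch: it assumes a limit of 1024 bytes as in the settings quoted earlier, and it assumes that a description at or above that size hints at cut-off lines, which is not confirmed Monit behavior. The helper name maybe_truncated is hypothetical.

```sh
#!/bin/sh
# Assumed to mirror fileContentBuffer in monitrc; adjust to your own setting.
LIMIT_BYTES=1024

# Hypothetical helper: succeed if the text is at least LIMIT_BYTES long,
# which this sketch treats as a hint that lines may have been cut off.
maybe_truncated() {
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(printf '%s' "$1" | wc -c) ))
    [ "$size" -ge "$LIMIT_BYTES" ]
}

if maybe_truncated "${MONIT_DESCRIPTION:-}"; then
    echo "warning: MONIT_DESCRIPTION may be truncated" >&2
fi
```

A warning like this does not recover the missing lines, but it can tell the receiving side to look at the log file itself.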
Some of the applications I try to handle with Monit take 30 to 50 minutes to become ready to work. Therefore it is not a problem to wait five minutes before Monit tries to restart the application.
You have to make a decision, sorry.
Have a nice weekend,
with regards,
Lutz
-
Hello Avijit Seth,
regarding your 1st and 2nd remarks: I use the following to handle captured messages, see my snippet.
This works well for me, but I remove duplicate lines in general.
With regards,
Lutz