MONIT doesn't create multiple alerts when multiple lines are picked up during a file content test

Issue #896 wontfix
avijit seth created an issue

It actually covers 3 issues:

1) During a file content check, if MONIT finds multiple matching lines in a file between checks, it puts all of them into MONIT_DESCRIPTION and creates a single alert.

2) Along with that, it doesn’t take “\n” into account in MONIT_DESCRIPTION.

So if anyone wants to use MONIT_DESCRIPTION in a script to perform a line-wise action, it is hard to tell the individual lines apart within $MONIT_DESCRIPTION.

3) During a file content check, increasing FILECONTENTBUFFER sometimes doesn’t help, as there may still be more lines than fit during a check.

So MONIT should provide a switch or option to create an alert for each line. Just a thought! 🙂

Comments (12)

  1. Lutz Mader

    Hello Avijit Seth,
    regarding your 1st and 2nd remarks: I use the following to handle captured messages, see my snippet.

    # Handle matched lines.
    if [ -n "$MONIT_DESCRIPTION" ]; then
      if echo "$MONIT_DESCRIPTION" | grep -e "^content match" >/dev/null; then

        # Remove adjacent duplicate lines, then handle each line.
        echo "$MONIT_DESCRIPTION" | uniq | \
        while IFS= read -r desc; do
          # CWWKO0221E: TCP Channel ? initialization did not succeed. The socket bind did not succeed
          #   for host ? and port ?. The port might already be in use.
          # CWWKO0221E: Die Initialisierung des TCP-Kanals ? war nicht erfolgreich. Das Binden des Sockets
          #   für den Host ? und den Port ? war nicht erfolgreich. Der Port ist möglicherweise bereits im Gebrauch.
          if [[ "$desc" = *'CWWKO0221E:'* ]]; then
            # Extract the port from the English message, falling back to the German one.
            port=$(echo "$desc" | sed -n -e 's/^.* and port \(.*\)\. The port .*/\1/p')
            [ -z "$port" ] && port=$(echo "$desc" | sed -n -e 's/^.* und den Port \(.*\) war nicht .*/\1/p')
            # Find the processes of $USER with an established connection on that port.
            pidlist=''
            if [ -n "$port" ]; then
              pidlist=$(lsof -i "tcp:$port" -P | awk '/'"$USER"' .*'"$port"'->.*ESTABLISHED/ { print $2 }')
            fi
            # Kill the processes that use the port.
            if [ -n "$pidlist" ]; then
              kill -KILL $pidlist 2>/dev/null && sleep 1
            fi
          fi
        done
      fi
    fi
    

    This works well for me, but I remove duplicate lines in general.

    With regards,
    Lutz

  2. avijit seth reporter

    Hello Lutz Mader,

    What matching criteria have you set in monitrc? In my case, the matching criterion is as below.

    If content = "(.*?)" then exec '/etc/test.sh'

    [Note: Each line of the target file is important, hence I used “.*?”]

    This matching criterion captures multiple lines from a file (log) and puts them into MONIT_DESCRIPTION.

    When it has captured more than 1 line at a time, the following issues are observed:

    1. MONIT_DESCRIPTION in test.sh does not show all lines because FILECONTENTBUFFER becomes full (I already increased it from the default value).
    2. If MONIT_DESCRIPTION has captured 2-3 lines from the log file, it is hard to tell the different lines apart. 😞

    Am I doing anything wrong? Any suggestion would be helpful.

    Below is a snippet of the log file:

    2020-05-18T03:01:29.523Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T04:15:58.582Z;;XXXX_central_Announcement_Success_Rate_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX central KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T04:02:04.227Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T04:31:08.985Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T05:00:44.189Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T06:16:21.673Z;;XXXX_central_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX central KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T06:00:42.675Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts;sm-XXXX-alerts;XXXX avijit KPI threshold violation - it's at or below 99. See query for details.;config file for this alert is in XXXX directory on testsmkib server. Current value: NaN, threshold value: None;https://sm.test.in.test.com:1234/app/kibana#/discover;;
    2020-05-18T06:30:41.536Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts
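
Once the description has been split into lines, each log line of this shape can be broken into fields on the `;` separator. The following is only a minimal sketch: the function name `parse_alert_line` is hypothetical, and the field positions (1 = timestamp, 2 = empty, 3 = alert name, 4 = severity, 5 = state) are an assumption read off the sample lines above, not something the thread confirms.

```shell
#!/bin/sh
# Hypothetical helper: split one alert line on ';' and print selected fields.
# Assumed positions: 1 = timestamp, 2 = empty, 3 = alert name,
# 4 = severity, 5 = state; everything else lands in "rest".
parse_alert_line() {
  printf '%s\n' "$1" | {
    IFS=';' read -r ts unused name severity state rest
    printf '%s|%s|%s|%s\n' "$ts" "$name" "$severity" "$state"
  }
}

# Example call with the last (truncated) line of the snippet above.
parse_alert_line '2020-05-18T06:30:41.536Z;;XXXX_avijit_SRVCC_Session_Transfer_SR_testsmes1;CRITICAL;OPEN;testsmes1;sm-alerts'
```

Because `;` is not a whitespace character, the empty second field between `;;` is kept as its own field, so the later positions do not shift.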

  3. Tildeslash repo owner

    Hello,

    originally Monit did send one alert for each matching line (the way you describe), but that was changed because a log file containing hundreds of matching entries would flood the user with alerts.

    We don't plan to change it back to 1:1.

  4. avijit seth reporter

    Hello,

    I can understand your point about the N:1 mapping of lines to an alert. To some extent you are correct.

    But when you have multiple lines in a single environment variable (MONIT_DESCRIPTION), using that environment variable in a script or alert becomes questionable, as it doesn’t provide complete information about a bunch of different problems.

    For example, suppose any “Error” in a log file is treated as an alert. One log may contain multiple errors (say 10, each of a different error type). If we can’t put all 10 errors into an alert or environment variable, there is a chance of missing a problem.

    Hence I request that MONIT provide an option that lets the user switch between multiple lines and 1 line within the env. variable or alert, or an option to decide how many lines are considered per alert.

    Thanks,

    Avijit

  5. Lutz Mader

    Hello Avijit Seth,
    I never try to flood Monit with messages. Sometimes I get a few (fewer than 10 messages) at the same time. This works well.

    From my point of view, Monit handles a small number of messages well. I get all the captured messages from MONIT_DESCRIPTION.

    OK, this is not the answer to your question,
    sorry,
    Lutz

  6. avijit seth reporter

    Hello Lutz,

    Could you please tell me what FILECONTENTBUFFER value you use in your case?

    Thanks,

    Avijit

  7. Attila Jászai

    Hi All,

    Thanks for raising this. I have a similar issue and was looking for a solution, but have not yet found a reliable one. It is nice to know that in the past it worked the way I’d love to have it now, so would it be possible to make line-by-line processing an optional setting? That would really help, because now I fear that I sometimes miss alarms (and I have already faced this): it is not guaranteed that every line is processed or that it fits within the limited size of the variable.

    Is it possible to suggest this as an optional feature? I’d really appreciate it.

    Thank you,
    Attila

  8. Lutz Mader

    Hello Avijit Seth,
    I use the following limits settings only.

    set daemon  60              # check services at 60 seconds intervals
    
    set limits {
        programOutput:     1024 B,    # check program's output truncate limit
        fileContentBuffer: 1024 B,    # limit for file content test
    }
    

    In general I get up to 9 message lines. I handle them line by line with the code from my sample above, but I remove the duplicate lines.

    This works well for me, but at present only some lines are captured and handled (or sent to a central system).

    # Remove duplicate lines.
        echo "$MONIT_DESCRIPTION" | uniq | \
        while read desc; do
    :
        done
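
Note that `uniq` only collapses duplicate lines that are adjacent. If repeated messages can be interleaved with other lines, a small `awk` filter can drop every later repeat while keeping the original order. This is just a sketch; the helper name `dedupe_lines` and the sample text are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep a line only the first time it is seen,
# even if the repeats are not adjacent (order is preserved).
dedupe_lines() {
  awk '!seen[$0]++'
}

# Made-up sample standing in for $MONIT_DESCRIPTION.
sample='content match:
error A
error B
error A'

printf '%s\n' "$sample" | dedupe_lines | while IFS= read -r desc; do
  echo "XX $desc"
done
```

In a real handler, `echo "$MONIT_DESCRIPTION" | dedupe_lines` would take the place of `echo "$MONIT_DESCRIPTION" | uniq`.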
    

    With regards,
    Lutz

  9. Lutz Mader

    Hello Avijit Seth,
    this is a simple test case to do some additional investigation, see below.
    I use Monit 5.26.0 on a macOS 10.10 system with KSH 93 or BASH 3.2, but KSH 88 and BASH 4.3 on AIX 7.2 work as well.

    With regards,
    Lutz

    The configuration used to collect some messages.

    check file file.log with path "/Users/lutz/log/file.log"
      if not exist then exec "/usr/bin/touch /Users/lutz/log/file.log"
      if match "error" then exec "/Users/lutz/monit/env.sh"
    

    The script to handle the captured messages.

    #!/bin/ksh
    echo "$MONIT_DESCRIPTION" | \
    while read desc; do
      echo "XX $desc" >> /Users/lutz/log/env.log
    done
    echo "XXXXXXXX" >> /Users/lutz/log/env.log
    exit 0
    

    The content of the file.log file. I use “echo 'error ' `date` >> file.log” to add new lines.

    error  Do 28 Mai 2020 21:35:35 CEST
    error  Do 28 Mai 2020 21:35:37 CEST
    test  Do 28 Mai 2020 21:35:44 CEST
    test  Do 28 Mai 2020 21:35:47 CEST
    error  Do 28 Mai 2020 21:35:53 CEST
    error  Do 28 Mai 2020 21:35:55 CEST
    error  Do 28 Mai 2020 21:36:08 CEST
    test  Do 28 Mai 2020 21:36:12 CEST
    test  Do 28 Mai 2020 21:36:16 CEST
    test  Do 28 Mai 2020 21:36:19 CEST
    error  Do 28 Mai 2020 21:36:21 CEST
    warn  Do 28 Mai 2020 21:36:32 CEST
    warn  Do 28 Mai 2020 21:36:34 CEST
    warn  Do 28 Mai 2020 21:36:36 CEST
    error  Do 28 Mai 2020 21:36:39 CEST
    

    The script handles the messages line by line. From my point of view, there is no problem in doing this.

    XX content match:
    XX error  Do 28 Mai 2020 21:35:35 CEST
    XX error  Do 28 Mai 2020 21:35:37 CEST
    XX error  Do 28 Mai 2020 21:35:53 CEST
    XX
    XXXXXXXX
    XX content match:
    XX error  Do 28 Mai 2020 21:35:55 CEST
    XX error  Do 28 Mai 2020 21:36:08 CEST
    XX error  Do 28 Mai 2020 21:36:21 CEST
    XX
    XXXXXXXX
    XX content match:
    XX error  Do 28 Mai 2020 21:36:39 CEST
    XX
    XXXXXXXX
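
Since one alert per matched line is not planned in Monit itself, a script can approximate it by calling a handler once per captured line. The sketch below assumes the description has the shape seen in the output above (a `content match:` header, the matched lines, then an empty line). The names `handle_line` and `for_each_match` are hypothetical, and the handler body is a placeholder for whatever per-line action is needed.

```shell
#!/bin/sh
# Hypothetical per-line action; replace the body with the real handling,
# for example forwarding the line to a central system.
handle_line() {
  echo "ALERT: $1"
}

# Call handle_line once for every matched line in the description ($1),
# skipping the "content match:" header and empty lines.
for_each_match() {
  printf '%s\n' "$1" | while IFS= read -r line; do
    case "$line" in
      'content match:'*|'') continue ;;
    esac
    handle_line "$line"
  done
}

for_each_match "${MONIT_DESCRIPTION:-}"
```

This does not remove the buffer limit: lines that never reach MONIT_DESCRIPTION cannot be recovered by the script.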
    

  10. avijit seth reporter

    Thank you so much Lutz for your continuous help on this.

    I am now able to read line by line within my script the way you showed, and I am getting all lines in MONIT_DESCRIPTION after increasing the buffer size and reducing the daemon frequency.

    My only worries now are the following:

    1. If the buffer fills up with a larger number of lines, I may miss something.
    2. Changing to a lower frequency and increasing the buffer size may cause performance issues on the VM.

    Any thoughts are always welcome!

    Thanks

    Avijit

  11. Lutz Mader

    Hello Avijit Seth,
    Yes, that is a problem. It took some time, but I decided not to forward some messages, especially because they are mostly duplicates.

    On the one hand, the 60-second interval fits well for me and is easy to calculate with, and the buffer of 1024 bytes is large enough to handle 5-9 messages. On the other hand, a slow interval saves some CPU and other resources and prevents a system slowdown caused by too many or too fast useless recovery retries.

    Some of the applications I try to handle with Monit take 30 to 50 minutes to become ready to work. Therefore it is not a problem to wait five minutes before Monit tries to restart the application.

    You have to make a decision, sorry.

    Have a nice weekend,
    with regards,
    Lutz
