- edited description
SMTP error
Good morning! I get courier-mta SMTP errors on localhost. The load on that VPS is just 0.01
Description: failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP --
SMTP: error receiving data -- Resource temporarily unavailable
Is it a closed port? Is Courier frozen? I've set timeout to 10 seconds. I've searched the C code. Please help me.
http://stackoverflow.com/a/10318268/1606933 Cloud it be about blocking/non-blocking IO?
(actually version is Debian wheezy 5.10-1)
Comments (26)
-
reporter -
repo owner Hello,
is it possible to connect on that machine to 127.0.0.1:25 for example using telnet?:
telnet 127.0.0.1 25
-
reporter I do not know actually :) This is the reason I've opened this issue. Monit is for monitoring.
Basicly port 25 is open and can be connected. Please start me up how to debug it.
-
repo owner What did the "telnet 127.0.0.1 25" showed?
-
reporter It is OK. Now. But sometimes monit alerts.
# telnet 127.0.0.1 25 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. 220 belmanetti.hu ESMTP
-
reporter BTW I know that SMTP is OK. But sometimes monit alert. And I'd like to get the root cause of it.
-
repo owner It seems as some transient network error (maybe some VPS bug - loopback issue). You can run the network trace (using for example tcpdump or wireshark/tshark) to see what happens when monit reports an error.
-
reporter I am monitoring on localhost, no networks are there.
-
reporter How to start a script on monit error? That sounds very interesting.
-
repo owner Even though loopback is pseudo-interface, it is works like normal network and can be traced with tcpdump/wireshark. If it freezes it is strange - as mentioned it could be some VPS issue, tcpdump can show the problem (provided the VPS will be able to run processes at the same time and won't freeze completely).
-
reporter Could monit be configured to start a script on error?
-
repo owner Yes, see the exec action in monit manual ... but in this case you need to catch the whole communication which triggered the error - after the tcp handshake. When you'll start tcpdump when the error occurred, it is too late unless the error will occur again in the given timeframe.
-
reporter -
reporter - changed status to resolved
Thank you!
-
reporter - changed status to open
Sometime courier restarts itself. Now I've cought it:
Dec 1 23:29:22 strass2 courierd: SHUTDOWN: Stopping...
Is there a way not to restart and send email at these self-restarts? -
repo owner If it is restarted by Monit, use "alert" action instead of "restart". If the restart is external, Monit has nothing to do with it.
-
reporter As I've mentioned courier restarts itself periodically. Is there a way to signal monit NOT to alert? Maybe
unmonitor
&monitor
? -
repo owner Monit may send PID change alert in such case, you can filter it out like this:
set alert foo@bar but not on { pid }
See more details in Monit manual: http://mmonit.com/monit/documentation/monit.html#Setting-an-event-filter
If you see different alert, please post the alert message.
-
reporter Thank you! I am still debugging. I plan to report here when it turned out what it really is.
-
reporter Could you please take a look at the status of the system at courier failure. At the end: SMTP localhost/IPv4/IPv6/monit log/syslog
2014-12-03 03:59:45+01:00 root ------------------------- 2014-12-03 03:59:45+01:00 Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 5077/mysqld tcp 0 0 0.0.0.0:** 0.0.0.0:* LISTEN 4828/sshd tcp 0 0 127.0.0.1:2812 0.0.0.0:* LISTEN 5455/monit tcp 0 0 127.0.0.1:36278 127.0.0.1:3306 TIME_WAIT - tcp 0 0 127.0.0.1:37931 127.0.0.1:25 TIME_WAIT - tcp 0 0 127.0.0.1:60884 127.0.0.1:3022 TIME_WAIT - tcp6 0 0 :::3022 :::* LISTEN 4828/sshd tcp6 0 0 :::80 :::* LISTEN 4481/apache2 tcp6 0 0 :::465 :::* LISTEN 4651/couriertcpd tcp6 0 0 :::4949 :::* LISTEN 4335/perl tcp6 0 0 :::25 :::* LISTEN 6297/couriertcpd udp6 0 0 :::42271 :::* 26301/courieresmtp ------------------------- 2014-12-03 03:59:45+01:00 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2 0.0 0.0 0 0 ? S Dec01 0:00 [kthreadd] root 3 0.0 0.0 0 0 ? S Dec01 0:10 \_ [ksoftirqd/0] root 5 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kworker/0:0H] root 7 0.0 0.0 0 0 ? S Dec01 0:00 \_ [migration/0] root 8 0.0 0.0 0 0 ? S Dec01 0:00 \_ [rcu_bh] root 9 0.0 0.0 0 0 ? S Dec01 1:12 \_ [rcu_sched] root 10 0.0 0.0 0 0 ? S Dec01 0:00 \_ [migration/1] root 11 0.0 0.0 0 0 ? S Dec01 0:07 \_ [ksoftirqd/1] root 12 0.0 0.0 0 0 ? S Dec01 0:00 \_ [kworker/1:0] root 13 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kworker/1:0H] root 14 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [khelper] root 15 0.0 0.0 0 0 ? S Dec01 0:00 \_ [kdevtmpfs] root 16 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [netns] root 528 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [writeback] root 530 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kintegrityd] root 531 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [bioset] root 533 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kblockd] root 688 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [ata_sff] root 698 0.0 0.0 0 0 ? S Dec01 0:00 \_ [khubd] root 708 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [md] root 711 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [edac-poller] root 718 0.0 0.0 0 0 ? S Dec01 0:21 \_ [kworker/0:1] root 719 0.0 0.0 0 0 ? S Dec01 0:19 \_ [kworker/1:1] root 817 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [rpciod] root 856 0.0 0.0 0 0 ? S Dec01 0:00 \_ [kswapd0] root 857 0.0 0.0 0 0 ? SN Dec01 0:00 \_ [ksmd] root 977 0.0 0.0 0 0 ? S Dec01 0:00 \_ [fsnotify_mark] root 1017 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [nfsiod] root 1030 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [cifsiod] root 1051 0.0 0.0 0 0 ? S Dec01 0:00 \_ [jfsIO] root 1052 0.0 0.0 0 0 ? S Dec01 0:00 \_ [jfsCommit] root 1053 0.0 0.0 0 0 ? S Dec01 0:00 \_ [jfsCommit] root 1054 0.0 0.0 0 0 ? S Dec01 0:00 \_ [jfsSync] root 1068 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [xfsalloc] root 1070 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [xfs_mru_cache] root 1073 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [xfslogd] root 1080 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [ocfs2_wq] root 1083 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [user_dlm] root 1098 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [bioset] root 1108 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [glock_workqueue] root 1109 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [delete_workqueu] root 1117 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [gfs_recovery] root 1122 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [crypto] root 1154 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kthrotld] root 1832 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [bioset] root 1833 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [drbd-reissue] root 1873 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [iscsi_eh] root 1919 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [bond0] root 1961 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [bnx2x] root 1973 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [mlx4] root 1990 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [mpt_poll_0] root 1991 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [mpt/0] root 1992 0.0 0.0 0 0 ? S Dec01 0:00 \_ [scsi_eh_0] root 2014 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [vfio-irqfd-clea] root 2038 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kpsmoused] root 2040 0.0 0.0 0 0 ? S Dec01 0:00 \_ [kworker/0:2] root 2061 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [dm_bufio_cache] root 2064 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kdelayd] root 2067 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kmpathd] root 2068 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kmpath_handlerd] root 2149 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [deferwq] root 2155 0.0 0.0 0 0 ? S Dec01 0:04 \_ [kjournald] root 15417 0.0 0.0 0 0 ? S< Dec01 0:00 \_ [kworker/0:1H] root 22101 0.0 0.0 0 0 ? S Dec02 0:00 \_ [kworker/u4:0] root 21579 0.0 0.0 0 0 ? S 03:03 0:00 \_ [kworker/u4:2] root 1 0.0 0.0 10656 784 ? Ss Dec01 0:03 init [2] root 2268 0.0 0.0 21436 1380 ? Ss Dec01 0:00 udevd --daemon unscd 4117 0.0 0.0 4108 332 ? Ss Dec01 0:19 /usr/sbin/nscd root 4209 0.0 0.1 119788 2404 ? Ssl Dec01 0:01 /usr/sbin/rsyslogd root 4335 0.0 0.4 42564 9096 ? Ss Dec01 0:11 /usr/bin/perl -wT /usr/sbin/munin-node root 4373 0.0 0.2 267424 5796 ? Ss Dec01 0:07 php-fpm: master process (/etc/php5/fpm/php-fpm.conf) 1001 4379 0.0 2.4 279828 50004 ? S Dec01 1:43 \_ php-fpm: pool broadbandly 1001 4380 0.0 2.5 281480 52876 ? S Dec01 1:41 \_ php-fpm: pool broadbandly 1001 4381 0.0 2.2 279484 45848 ? S Dec01 1:44 \_ php-fpm: pool broadbandly 1001 4382 0.0 2.2 279728 46336 ? S Dec01 1:42 \_ php-fpm: pool broadbandly viktor 4383 0.0 0.2 267260 5360 ? S Dec01 0:00 \_ php-fpm: pool viktor 1001 10912 0.0 2.5 284960 51816 ? S Dec02 1:20 \_ php-fpm: pool broadbandly root 4389 0.0 0.2 92720 4476 ? Sl Dec01 2:25 /usr/sbin/vmtoolsd root 4481 0.0 1.1 151804 23656 ? Ss Dec01 0:06 /usr/sbin/apache2 -k start www-data 13197 0.0 0.1 99676 2636 ? S Dec02 0:00 \_ /usr/sbin/fcgi-pm -k start www-data 13199 0.0 0.2 151852 5128 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 16334 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 6747 0.0 0.2 151852 5132 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 21480 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 25011 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 25026 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 25030 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 25048 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 25068 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start www-data 25072 0.0 0.2 151852 5124 ? S Dec02 0:00 \_ /usr/sbin/apache2 -k start root 4637 0.0 0.0 6360 452 ? S Dec01 0:00 /usr/sbin/courierlogger -pid=/var/run/courier/authdaemon/pid -start /usr/lib/courier/courier-authlib/authdaemond root 4641 0.0 0.0 18856 752 ? S Dec01 0:00 \_ /usr/lib/courier/courier-authlib/authdaemond root 4688 0.0 0.0 18856 160 ? S Dec01 0:00 \_ /usr/lib/courier/courier-authlib/authdaemond root 4689 0.0 0.0 18856 160 ? S Dec01 0:00 \_ /usr/lib/courier/courier-authlib/authdaemond root 4690 0.0 0.0 18856 160 ? S Dec01 0:00 \_ /usr/lib/courier/courier-authlib/authdaemond root 4691 0.0 0.0 18856 160 ? S Dec01 0:00 \_ /usr/lib/courier/courier-authlib/authdaemond root 4692 0.0 0.0 18856 160 ? S Dec01 0:00 \_ /usr/lib/courier/courier-authlib/authdaemond daemon 4651 0.0 0.0 8304 600 ? S Dec01 0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -stderrloggername=esmtpd-ssl -maxprocs=40 -maxperip=5 -maxperc=5 -pid=/var/run/courier/esmtpd-ssl.pid -stderrlogger=/usr/sbin/courierlogger -user=daemon -group=daemon -access=/etc/courier/smtpaccess.dat -address=0 465 /usr/bin/couriertls -server -tcpd /usr/sbin/courieresmtpd daemon 4663 0.0 0.0 6232 332 ? S Dec01 0:00 /usr/sbin/courierlogger esmtpd-ssl root 4696 0.0 0.0 13908 844 ? Ss Dec01 0:00 /usr/sbin/cron 108 4766 0.0 0.0 21400 692 ? Ss Dec01 0:00 /usr/bin/dbus-daemon --system root 4778 0.0 0.0 20416 756 ? Ss Dec01 0:07 /usr/sbin/irqbalance root 4828 0.0 0.0 43668 1180 ? Ss Dec01 0:00 /usr/sbin/sshd root 4869 0.0 0.0 9356 1500 ? S Dec01 0:00 /bin/bash /usr/bin/mysqld_safe mysql 5077 0.1 6.0 944708 122624 ? Sl Dec01 4:11 \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 root 5078 0.0 0.0 4096 628 ? S Dec01 0:00 \_ logger -t mysqld -p daemon.error root 4894 0.0 0.6 263632 14064 ? Sl Dec01 1:33 /usr/bin/python3 /usr/bin/fail2ban-server -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -b root 5455 0.0 0.0 34596 1884 ? Sl Dec01 2:44 /usr/bin/monit -c /etc/monit/monitrc root 5529 0.0 0.0 6168 672 tty1 Ss+ Dec01 0:00 /sbin/getty 38400 tty1 root 6279 0.0 0.0 20432 976 ? S Dec01 0:00 /usr/lib/courier/courier/courierd daemon 23718 0.0 0.0 20708 1068 ? S 03:29 0:00 \_ /usr/lib/courier/courier/courierd uucp 23719 0.0 0.0 6224 336 ? S 03:29 0:00 \_ ./courieruucp root 23720 0.0 0.0 4124 512 ? S 03:29 0:00 \_ ./courierlocal root 23721 0.0 0.0 4152 320 ? S 03:29 0:00 \_ ./courierfax daemon 23722 0.0 0.0 14824 596 ? S 03:29 0:00 \_ ./courieresmtp daemon 26301 0.0 0.0 14820 760 ? S 03:59 0:00 | \_ courieresmtp 0 szepe.net daemon 23723 0.0 0.0 4124 532 ? S 03:29 0:00 \_ ./courierdsn daemon 6284 0.0 0.0 4088 404 ? S Dec01 0:00 /usr/sbin/courierfilter start daemon 6290 0.0 0.0 6232 332 ? S Dec01 0:00 /usr/sbin/courierlogger courierfilter daemon 6297 0.0 0.0 8304 652 ? S Dec01 0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -user=daemon -group=daemon -access=/etc/courier/smtpaccess.dat -maxprocs=40 -maxperc=5 -maxperip=5 -pid=/var/run/courier/esmtpd.pid smtp /usr/sbin/courieresmtpd daemon 6303 0.0 0.0 6232 332 ? S Dec01 0:00 /usr/sbin/courierlogger courieresmtpd root 26296 0.0 0.0 9272 1352 ? S 03:59 0:00 /bin/sh /etc/monit/courier-debug.sh root 26306 0.0 0.0 7036 1112 ? R 03:59 0:00 \_ ps aufxwww ------------------------- 2014-12-03 03:59:45+01:00 220 strass2.szepe.net ESMTP 221 Bye. ------------------------- 2014-12-03 03:59:46+01:00 220 strass2.szepe.net ESMTP 221 Bye. ------------------------- 2014-12-03 03:59:47+01:00 220 strass2.szepe.net ESMTP 221 Bye. ------------------------- 2014-12-03 03:59:48+01:00 [CET Dec 1 23:38:19] info : Monit HTTP server stopped [CET Dec 1 23:38:20] info : Starting Monit HTTP server at [localhost:2812] [CET Dec 1 23:38:20] info : Monit HTTP server started [CET Dec 1 23:38:20] info : 'strass2_szepe_net' Monit reloaded [CET Dec 2 00:37:01] info : Reinitializing monit daemon [CET Dec 2 00:37:01] info : Awakened by the SIGHUP signal [CET Dec 2 00:37:01] info : Reinitializing Monit - Control file '/etc/monit/monitrc' [CET Dec 2 00:37:01] info : Shutting down Monit HTTP server [CET Dec 2 00:37:02] info : Monit HTTP server stopped [CET Dec 2 00:37:02] info : Starting Monit HTTP server at [localhost:2812] [CET Dec 2 00:37:02] info : Monit HTTP server started [CET Dec 2 00:37:02] info : 'strass2_szepe_net' Monit reloaded [CET Dec 2 13:10:13] error : 'courier-mta' failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable [CET Dec 2 13:10:14] info : 'courier-mta' exec: /etc/monit/courier-debug.sh [CET Dec 2 13:12:14] info : 'courier-mta' connection succeeded to INET[127.0.0.1:25] via TCP [CET Dec 2 15:36:32] error : 'courier-mta' failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable [CET Dec 2 15:36:32] info : 'courier-mta' exec: /etc/monit/courier-debug.sh [CET Dec 2 15:38:32] info : 'courier-mta' connection succeeded to INET[127.0.0.1:25] via TCP [CET Dec 3 03:59:44] error : 'courier-mta' failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable [CET Dec 3 03:59:45] info : 'courier-mta' exec: /etc/monit/courier-debug.sh ------------------------- 2014-12-03 03:59:48+01:00 Dec 3 03:49:38 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:50:01 strass2 /USR/SBIN/CRON[25449]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi) Dec 3 03:51:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:53:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:55:01 strass2 /USR/SBIN/CRON[25870]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi) Dec 3 03:55:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:57:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:59:44 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:59:44 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:59:45 strass2 courierd: newmsg,id=000000000003A045.00000000547E7CA1.000066B4: dns; strass2_szepe_net ([::ffff:127.0.0.1]) Dec 3 03:59:45 strass2 courierd: started,id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,module=esmtp,host=szepe.net,addr=<viktor@szepe.net> Dec 3 03:59:45 strass2 courierd: Waiting. shutdown time=none, wakeup time=none, queuedelivering=1, inprogress=1 Dec 3 03:59:45 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1] Dec 3 03:59:46 strass2 courieresmtpd: started,ip=[2001:41d0:52:600::40f] Dec 3 03:59:46 strass2 courieresmtp: id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,addr=<viktor@szepe.net>: 250 Ok. 547E7CA2.00003226 Dec 3 03:59:46 strass2 courieresmtp: id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,addr=<viktor@szepe.net>,size=842,success: delivered: mail.szepe.net [95.140.33.67] Dec 3 03:59:46 strass2 courieresmtp: id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,addr=<viktor@szepe.net>,size=842,status: success Dec 3 03:59:46 strass2 courierd: completed,id=000000000003A045.00000000547E7CA1.000066B4 Dec 3 03:59:46 strass2 courierd: Waiting. shutdown time=Wed Dec 3 04:29:26 2014, wakeup time=Wed Dec 3 04:29:26 2014, queuedelivering=0, inprogress=0 Dec 3 03:59:47 strass2 courieresmtpd: started,ip=[::ffff:37.187.55.1] -------------------------
-
repo owner - changed status to closed
This could be a VPS network issue. Network testing can be flaky from time to time and there are two solutions you could try in Monit. Either retry the connection like this
if failed port 25 with protocol smtp and retry 2 times then alert
or require that the test must fail over several checks before a problem is reported (recommended). Like this:
if failed port 25 with protocol smtp for 3 cycles then alert
-
reporter Oh! Thank you!!
-
reporter Hello! Monit v5.10 still fails to detect SMTP but very rarely
if failed host 127.0.0.1 port 25 type tcp protocol smtp then exec "/usr/local/sbin/courier-missing.sh"
The script is:
{ echo "-----------$(date -R)--------------" echo -e "QUIT\r"|nc web.olmunkaido.hu 25 stat /var/run/courier/esmtpd.pid echo -n "esmtpd.pid:"; cat /var/run/courier/esmtpd.pid ps uwwwp $(cat /var/run/courier/esmtpd.pid) netstat -anp | grep "couriertcpd\|courieresmtp" /etc/init.d/courier-mta restart } &>> /tmp/courier-restart
And the output:
-----------Thu, 21 Apr 2016 19:19:23 +0000-------------- 220 web.olmunkaido.hu ESMTP 221 Bye. File: `/var/run/courier/esmtpd.pid' Size: 6 Blocks: 8 IO Block: 4096 regular file Device: fh/15d Inode: 6758090 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2016-04-21 12:46:49.609887060 +0000 Modify: 2016-04-21 12:46:14.793708539 +0000 Change: 2016-04-21 12:46:14.793708539 +0000 Birth: - esmtpd.pid:21786 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND daemon 21786 0.0 0.0 8312 1656 ? S 12:46 0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -noidentlookup -user=daemon -grou tcp 0 0 127.0.0.1:143 0.0.0.0:* LISTEN 3283/couriertcpd tcp6 0 0 :::465 :::* LISTEN 21729/couriertcpd tcp6 0 0 :::25 :::* LISTEN 21786/couriertcpd tcp6 0 0 :::993 :::* LISTEN 3281/couriertcpd tcp6 0 0 :::587 :::* LISTEN 21795/couriertcpd udp6 0 0 :::40237 :::* 16955/courieresmtp unix 2 [ ] DGRAM 7085497 15108/courieresmtp unix 2 [ ] DGRAM 7095133 16955/courieresmtp Stopping Courier SMTP MSA server: done. Stopping Courier SMTP server: done. Stopping Courier mail filter: done. Stopping Courier mail server: done. Starting Courier mail server: done. Starting Courier mail filter: done. Starting Courier SMTP server: done. Starting Courier SMTP MSA server: done.
You can see that Courier listens on port 25 and also replies: "220 web.olmunkaido.hu ESMTP"
Could you give me a hint how to debug it further?
-
reporter - changed status to open
-
repo owner - changed status to closed
As mentioned earlier, you can use tcpdump to log local network communication - it'll catch the failed test including the TCP handshake and data exchanged between monit and the MTA.
It seems that the sporadic communication issue is either caused by VPS networking or the MTA itself - without the network trace it's not possible to say which side of the connection caused the problem.
You can also suppress transient errors by waiting for e.g. 2 consecutive errors in the protocol test:
if failed host 127.0.0.1 port 25 type tcp protocol smtp for 2 cycles then alert
-
reporter For solution seekers:
It turns out Courier MTA looks up the connecting IP: 127.0.0.1 in DNS
tcpdump -i any -n -v "port 53 or port 25"
By adding
-noidentlookup -nodnslookup
to TCPDOPTS in Courier's configuration solves the monitoring problem. - Log in to comment