SMTP error

Issue #112 closed
SzV created an issue

Good morning! I get courier-mta SMTP errors on localhost. The load on that VPS is just 0.01

Description: failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- 
SMTP: error receiving data -- Resource temporarily unavailable

Is it a closed port? Is Courier frozen? I've set timeout to 10 seconds. I've searched the C code. Please help me.

http://stackoverflow.com/a/10318268/1606933 Cloud it be about blocking/non-blocking IO?

(actually version is Debian wheezy 5.10-1)

Comments (26)

  1. Tildeslash repo owner

    Hello,

    is it possible to connect on that machine to 127.0.0.1:25 for example using telnet?:

    telnet 127.0.0.1 25
    
  2. SzV reporter

    I do not know actually :) This is the reason I've opened this issue. Monit is for monitoring.

    Basicly port 25 is open and can be connected. Please start me up how to debug it.

  3. SzV reporter

    It is OK. Now. But sometimes monit alerts.

    # telnet 127.0.0.1 25
    Trying 127.0.0.1...
    Connected to 127.0.0.1.
    Escape character is '^]'.
    220 belmanetti.hu ESMTP
    
  4. SzV reporter

    BTW I know that SMTP is OK. But sometimes monit alert. And I'd like to get the root cause of it.

  5. Tildeslash repo owner

    It seems as some transient network error (maybe some VPS bug - loopback issue). You can run the network trace (using for example tcpdump or wireshark/tshark) to see what happens when monit reports an error.

  6. Tildeslash repo owner

    Even though loopback is pseudo-interface, it is works like normal network and can be traced with tcpdump/wireshark. If it freezes it is strange - as mentioned it could be some VPS issue, tcpdump can show the problem (provided the VPS will be able to run processes at the same time and won't freeze completely).

  7. Tildeslash repo owner

    Yes, see the exec action in monit manual ... but in this case you need to catch the whole communication which triggered the error - after the tcp handshake. When you'll start tcpdump when the error occurred, it is too late unless the error will occur again in the given timeframe.

  8. SzV reporter
    • changed status to open

    Sometime courier restarts itself. Now I've cought it: Dec 1 23:29:22 strass2 courierd: SHUTDOWN: Stopping... Is there a way not to restart and send email at these self-restarts?

  9. Tildeslash repo owner

    If it is restarted by Monit, use "alert" action instead of "restart". If the restart is external, Monit has nothing to do with it.

  10. SzV reporter

    As I've mentioned courier restarts itself periodically. Is there a way to signal monit NOT to alert? Maybe unmonitor & monitor?

  11. SzV reporter

    Could you please take a look at the status of the system at courier failure. At the end: SMTP localhost/IPv4/IPv6/monit log/syslog

    2014-12-03 03:59:45+01:00
    root
    -------------------------
    2014-12-03 03:59:45+01:00
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      5077/mysqld     
    tcp        0      0 0.0.0.0:**              0.0.0.0:*               LISTEN      4828/sshd       
    tcp        0      0 127.0.0.1:2812          0.0.0.0:*               LISTEN      5455/monit     
    tcp        0      0 127.0.0.1:36278         127.0.0.1:3306          TIME_WAIT   -               
    tcp        0      0 127.0.0.1:37931         127.0.0.1:25            TIME_WAIT   -               
    tcp        0      0 127.0.0.1:60884         127.0.0.1:3022          TIME_WAIT   -               
    tcp6       0      0 :::3022                 :::*                    LISTEN      4828/sshd       
    tcp6       0      0 :::80                   :::*                    LISTEN      4481/apache2   
    tcp6       0      0 :::465                  :::*                    LISTEN      4651/couriertcpd
    tcp6       0      0 :::4949                 :::*                    LISTEN      4335/perl       
    tcp6       0      0 :::25                   :::*                    LISTEN      6297/couriertcpd
    udp6       0      0 :::42271                :::*                                26301/courieresmtp
    -------------------------
    2014-12-03 03:59:45+01:00
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root         2  0.0  0.0      0     0 ?        S    Dec01   0:00 [kthreadd]
    root         3  0.0  0.0      0     0 ?        S    Dec01   0:10  \_ [ksoftirqd/0]
    root         5  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kworker/0:0H]
    root         7  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [migration/0]
    root         8  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [rcu_bh]
    root         9  0.0  0.0      0     0 ?        S    Dec01   1:12  \_ [rcu_sched]
    root        10  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [migration/1]
    root        11  0.0  0.0      0     0 ?        S    Dec01   0:07  \_ [ksoftirqd/1]
    root        12  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [kworker/1:0]
    root        13  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kworker/1:0H]
    root        14  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [khelper]
    root        15  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [kdevtmpfs]
    root        16  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [netns]
    root       528  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [writeback]
    root       530  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kintegrityd]
    root       531  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [bioset]
    root       533  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kblockd]
    root       688  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [ata_sff]
    root       698  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [khubd]
    root       708  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [md]
    root       711  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [edac-poller]
    root       718  0.0  0.0      0     0 ?        S    Dec01   0:21  \_ [kworker/0:1]
    root       719  0.0  0.0      0     0 ?        S    Dec01   0:19  \_ [kworker/1:1]
    root       817  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [rpciod]
    root       856  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [kswapd0]
    root       857  0.0  0.0      0     0 ?        SN   Dec01   0:00  \_ [ksmd]
    root       977  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [fsnotify_mark]
    root      1017  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [nfsiod]
    root      1030  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [cifsiod]
    root      1051  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [jfsIO]
    root      1052  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [jfsCommit]
    root      1053  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [jfsCommit]
    root      1054  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [jfsSync]
    root      1068  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [xfsalloc]
    root      1070  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [xfs_mru_cache]
    root      1073  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [xfslogd]
    root      1080  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [ocfs2_wq]
    root      1083  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [user_dlm]
    root      1098  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [bioset]
    root      1108  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [glock_workqueue]
    root      1109  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [delete_workqueu]
    root      1117  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [gfs_recovery]
    root      1122  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [crypto]
    root      1154  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kthrotld]
    root      1832  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [bioset]
    root      1833  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [drbd-reissue]
    root      1873  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [iscsi_eh]
    root      1919  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [bond0]
    root      1961  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [bnx2x]
    root      1973  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [mlx4]
    root      1990  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [mpt_poll_0]
    root      1991  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [mpt/0]
    root      1992  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [scsi_eh_0]
    root      2014  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [vfio-irqfd-clea]
    root      2038  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kpsmoused]
    root      2040  0.0  0.0      0     0 ?        S    Dec01   0:00  \_ [kworker/0:2]
    root      2061  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [dm_bufio_cache]
    root      2064  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kdelayd]
    root      2067  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kmpathd]
    root      2068  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kmpath_handlerd]
    root      2149  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [deferwq]
    root      2155  0.0  0.0      0     0 ?        S    Dec01   0:04  \_ [kjournald]
    root     15417  0.0  0.0      0     0 ?        S<   Dec01   0:00  \_ [kworker/0:1H]
    root     22101  0.0  0.0      0     0 ?        S    Dec02   0:00  \_ [kworker/u4:0]
    root     21579  0.0  0.0      0     0 ?        S    03:03   0:00  \_ [kworker/u4:2]
    root         1  0.0  0.0  10656   784 ?        Ss   Dec01   0:03 init [2] 
    root      2268  0.0  0.0  21436  1380 ?        Ss   Dec01   0:00 udevd --daemon
    unscd     4117  0.0  0.0   4108   332 ?        Ss   Dec01   0:19 /usr/sbin/nscd
    root      4209  0.0  0.1 119788  2404 ?        Ssl  Dec01   0:01 /usr/sbin/rsyslogd
    root      4335  0.0  0.4  42564  9096 ?        Ss   Dec01   0:11 /usr/bin/perl -wT /usr/sbin/munin-node
    root      4373  0.0  0.2 267424  5796 ?        Ss   Dec01   0:07 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)                 
    1001      4379  0.0  2.4 279828 50004 ?        S    Dec01   1:43  \_ php-fpm: pool broadbandly                                             
    1001      4380  0.0  2.5 281480 52876 ?        S    Dec01   1:41  \_ php-fpm: pool broadbandly                                             
    1001      4381  0.0  2.2 279484 45848 ?        S    Dec01   1:44  \_ php-fpm: pool broadbandly                                             
    1001      4382  0.0  2.2 279728 46336 ?        S    Dec01   1:42  \_ php-fpm: pool broadbandly                                             
    viktor    4383  0.0  0.2 267260  5360 ?        S    Dec01   0:00  \_ php-fpm: pool viktor                                                 
    1001     10912  0.0  2.5 284960 51816 ?        S    Dec02   1:20  \_ php-fpm: pool broadbandly                                             
    root      4389  0.0  0.2  92720  4476 ?        Sl   Dec01   2:25 /usr/sbin/vmtoolsd
    root      4481  0.0  1.1 151804 23656 ?        Ss   Dec01   0:06 /usr/sbin/apache2 -k start
    www-data 13197  0.0  0.1  99676  2636 ?        S    Dec02   0:00  \_ /usr/sbin/fcgi-pm -k start
    www-data 13199  0.0  0.2 151852  5128 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 16334  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data  6747  0.0  0.2 151852  5132 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 21480  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 25011  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 25026  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 25030  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 25048  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 25068  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    www-data 25072  0.0  0.2 151852  5124 ?        S    Dec02   0:00  \_ /usr/sbin/apache2 -k start
    root      4637  0.0  0.0   6360   452 ?        S    Dec01   0:00 /usr/sbin/courierlogger -pid=/var/run/courier/authdaemon/pid -start /usr/lib/courier/courier-authlib/authdaemond
    root      4641  0.0  0.0  18856   752 ?        S    Dec01   0:00  \_ /usr/lib/courier/courier-authlib/authdaemond
    root      4688  0.0  0.0  18856   160 ?        S    Dec01   0:00      \_ /usr/lib/courier/courier-authlib/authdaemond
    root      4689  0.0  0.0  18856   160 ?        S    Dec01   0:00      \_ /usr/lib/courier/courier-authlib/authdaemond
    root      4690  0.0  0.0  18856   160 ?        S    Dec01   0:00      \_ /usr/lib/courier/courier-authlib/authdaemond
    root      4691  0.0  0.0  18856   160 ?        S    Dec01   0:00      \_ /usr/lib/courier/courier-authlib/authdaemond
    root      4692  0.0  0.0  18856   160 ?        S    Dec01   0:00      \_ /usr/lib/courier/courier-authlib/authdaemond
    daemon    4651  0.0  0.0   8304   600 ?        S    Dec01   0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -stderrloggername=esmtpd-ssl -maxprocs=40 -maxperip=5 -maxperc=5 -pid=/var/run/courier/esmtpd-ssl.pid -stderrlogger=/usr/sbin/courierlogger -user=daemon -group=daemon -access=/etc/courier/smtpaccess.dat -address=0 465 /usr/bin/couriertls -server -tcpd /usr/sbin/courieresmtpd
    daemon    4663  0.0  0.0   6232   332 ?        S    Dec01   0:00 /usr/sbin/courierlogger esmtpd-ssl
    root      4696  0.0  0.0  13908   844 ?        Ss   Dec01   0:00 /usr/sbin/cron
    108       4766  0.0  0.0  21400   692 ?        Ss   Dec01   0:00 /usr/bin/dbus-daemon --system
    root      4778  0.0  0.0  20416   756 ?        Ss   Dec01   0:07 /usr/sbin/irqbalance
    root      4828  0.0  0.0  43668  1180 ?        Ss   Dec01   0:00 /usr/sbin/sshd
    root      4869  0.0  0.0   9356  1500 ?        S    Dec01   0:00 /bin/bash /usr/bin/mysqld_safe
    mysql     5077  0.1  6.0 944708 122624 ?       Sl   Dec01   4:11  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
    root      5078  0.0  0.0   4096   628 ?        S    Dec01   0:00  \_ logger -t mysqld -p daemon.error
    root      4894  0.0  0.6 263632 14064 ?        Sl   Dec01   1:33 /usr/bin/python3 /usr/bin/fail2ban-server -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -b
    root      5455  0.0  0.0  34596  1884 ?        Sl   Dec01   2:44 /usr/bin/monit -c /etc/monit/monitrc
    root      5529  0.0  0.0   6168   672 tty1     Ss+  Dec01   0:00 /sbin/getty 38400 tty1
    root      6279  0.0  0.0  20432   976 ?        S    Dec01   0:00 /usr/lib/courier/courier/courierd
    daemon   23718  0.0  0.0  20708  1068 ?        S    03:29   0:00  \_ /usr/lib/courier/courier/courierd
    uucp     23719  0.0  0.0   6224   336 ?        S    03:29   0:00      \_ ./courieruucp
    root     23720  0.0  0.0   4124   512 ?        S    03:29   0:00      \_ ./courierlocal
    root     23721  0.0  0.0   4152   320 ?        S    03:29   0:00      \_ ./courierfax
    daemon   23722  0.0  0.0  14824   596 ?        S    03:29   0:00      \_ ./courieresmtp
    daemon   26301  0.0  0.0  14820   760 ?        S    03:59   0:00      |   \_ courieresmtp 0 szepe.net
    daemon   23723  0.0  0.0   4124   532 ?        S    03:29   0:00      \_ ./courierdsn
    daemon    6284  0.0  0.0   4088   404 ?        S    Dec01   0:00 /usr/sbin/courierfilter start
    daemon    6290  0.0  0.0   6232   332 ?        S    Dec01   0:00 /usr/sbin/courierlogger courierfilter
    daemon    6297  0.0  0.0   8304   652 ?        S    Dec01   0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -user=daemon -group=daemon -access=/etc/courier/smtpaccess.dat -maxprocs=40 -maxperc=5 -maxperip=5 -pid=/var/run/courier/esmtpd.pid smtp /usr/sbin/courieresmtpd
    daemon    6303  0.0  0.0   6232   332 ?        S    Dec01   0:00 /usr/sbin/courierlogger courieresmtpd
    root     26296  0.0  0.0   9272  1352 ?        S    03:59   0:00 /bin/sh /etc/monit/courier-debug.sh
    root     26306  0.0  0.0   7036  1112 ?        R    03:59   0:00  \_ ps aufxwww
    -------------------------
    2014-12-03 03:59:45+01:00
    220 strass2.szepe.net ESMTP
    221 Bye.
    -------------------------
    2014-12-03 03:59:46+01:00
    220 strass2.szepe.net ESMTP
    221 Bye.
    -------------------------
    2014-12-03 03:59:47+01:00
    220 strass2.szepe.net ESMTP
    221 Bye.
    -------------------------
    2014-12-03 03:59:48+01:00
    [CET Dec  1 23:38:19] info     : Monit HTTP server stopped
    [CET Dec  1 23:38:20] info     : Starting Monit HTTP server at [localhost:2812]
    [CET Dec  1 23:38:20] info     : Monit HTTP server started
    [CET Dec  1 23:38:20] info     : 'strass2_szepe_net' Monit reloaded
    [CET Dec  2 00:37:01] info     : Reinitializing monit daemon
    [CET Dec  2 00:37:01] info     : Awakened by the SIGHUP signal
    [CET Dec  2 00:37:01] info     : Reinitializing Monit - Control file '/etc/monit/monitrc'
    [CET Dec  2 00:37:01] info     : Shutting down Monit HTTP server
    [CET Dec  2 00:37:02] info     : Monit HTTP server stopped
    [CET Dec  2 00:37:02] info     : Starting Monit HTTP server at [localhost:2812]
    [CET Dec  2 00:37:02] info     : Monit HTTP server started
    [CET Dec  2 00:37:02] info     : 'strass2_szepe_net' Monit reloaded
    [CET Dec  2 13:10:13] error    : 'courier-mta' failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    [CET Dec  2 13:10:14] info     : 'courier-mta' exec: /etc/monit/courier-debug.sh
    [CET Dec  2 13:12:14] info     : 'courier-mta' connection succeeded to INET[127.0.0.1:25] via TCP
    [CET Dec  2 15:36:32] error    : 'courier-mta' failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    [CET Dec  2 15:36:32] info     : 'courier-mta' exec: /etc/monit/courier-debug.sh
    [CET Dec  2 15:38:32] info     : 'courier-mta' connection succeeded to INET[127.0.0.1:25] via TCP
    [CET Dec  3 03:59:44] error    : 'courier-mta' failed protocol test [SMTP] at INET[127.0.0.1:25] via TCP -- SMTP: error receiving data -- Resource temporarily unavailable
    [CET Dec  3 03:59:45] info     : 'courier-mta' exec: /etc/monit/courier-debug.sh
    -------------------------
    2014-12-03 03:59:48+01:00
    Dec  3 03:49:38 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:50:01 strass2 /USR/SBIN/CRON[25449]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
    Dec  3 03:51:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:53:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:55:01 strass2 /USR/SBIN/CRON[25870]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
    Dec  3 03:55:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:57:39 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:59:44 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:59:44 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:59:45 strass2 courierd: newmsg,id=000000000003A045.00000000547E7CA1.000066B4: dns; strass2_szepe_net ([::ffff:127.0.0.1])
    Dec  3 03:59:45 strass2 courierd: started,id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,module=esmtp,host=szepe.net,addr=<viktor@szepe.net>
    Dec  3 03:59:45 strass2 courierd: Waiting.  shutdown time=none, wakeup time=none, queuedelivering=1, inprogress=1
    Dec  3 03:59:45 strass2 courieresmtpd: started,ip=[::ffff:127.0.0.1]
    Dec  3 03:59:46 strass2 courieresmtpd: started,ip=[2001:41d0:52:600::40f]
    Dec  3 03:59:46 strass2 courieresmtp: id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,addr=<viktor@szepe.net>: 250 Ok. 547E7CA2.00003226
    Dec  3 03:59:46 strass2 courieresmtp: id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,addr=<viktor@szepe.net>,size=842,success: delivered: mail.szepe.net [95.140.33.67]
    Dec  3 03:59:46 strass2 courieresmtp: id=000000000003A045.00000000547E7CA1.000066B4,from=<root@strass2.szepe.net>,addr=<viktor@szepe.net>,size=842,status: success
    Dec  3 03:59:46 strass2 courierd: completed,id=000000000003A045.00000000547E7CA1.000066B4
    Dec  3 03:59:46 strass2 courierd: Waiting.  shutdown time=Wed Dec  3 04:29:26 2014, wakeup time=Wed Dec  3 04:29:26 2014, queuedelivering=0, inprogress=0
    Dec  3 03:59:47 strass2 courieresmtpd: started,ip=[::ffff:37.187.55.1]
    -------------------------
    
  12. Tildeslash repo owner

    This could be a VPS network issue. Network testing can be flaky from time to time and there are two solutions you could try in Monit. Either retry the connection like this

    if failed 
        port 25 with protocol smtp and retry 2 times
    then alert
    

    or require that the test must fail over several checks before a problem is reported (recommended). Like this:

    if failed 
        port 25 with protocol smtp for 3 cycles
    then alert
    
  13. SzV reporter

    Hello! Monit v5.10 still fails to detect SMTP but very rarely

    if failed host 127.0.0.1 port 25 type tcp protocol smtp then exec "/usr/local/sbin/courier-missing.sh"

    The script is:

    {
        echo "-----------$(date -R)--------------"
    
        echo -e "QUIT\r"|nc web.olmunkaido.hu 25
    
        stat /var/run/courier/esmtpd.pid
    
        echo -n "esmtpd.pid:"; cat /var/run/courier/esmtpd.pid
    
        ps uwwwp $(cat /var/run/courier/esmtpd.pid)
    
        netstat -anp | grep "couriertcpd\|courieresmtp"
    
        /etc/init.d/courier-mta restart
    } &>> /tmp/courier-restart
    

    And the output:

    -----------Thu, 21 Apr 2016 19:19:23 +0000--------------
    
    220 web.olmunkaido.hu ESMTP
    221 Bye.
    
      File: `/var/run/courier/esmtpd.pid'
      Size: 6           Blocks: 8          IO Block: 4096   regular file
    Device: fh/15d  Inode: 6758090     Links: 1
    Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2016-04-21 12:46:49.609887060 +0000
    Modify: 2016-04-21 12:46:14.793708539 +0000
    Change: 2016-04-21 12:46:14.793708539 +0000
     Birth: -
    
    esmtpd.pid:21786
    
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    daemon   21786  0.0  0.0   8312  1656 ?        S    12:46   0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -noidentlookup -user=daemon -grou
    
    tcp        0      0 127.0.0.1:143           0.0.0.0:*               LISTEN      3283/couriertcpd
    tcp6       0      0 :::465                  :::*                    LISTEN      21729/couriertcpd
    tcp6       0      0 :::25                   :::*                    LISTEN      21786/couriertcpd
    tcp6       0      0 :::993                  :::*                    LISTEN      3281/couriertcpd
    tcp6       0      0 :::587                  :::*                    LISTEN      21795/couriertcpd
    udp6       0      0 :::40237                :::*                                16955/courieresmtp
    unix  2      [ ]         DGRAM                    7085497  15108/courieresmtp
    unix  2      [ ]         DGRAM                    7095133  16955/courieresmtp
    Stopping Courier SMTP MSA server: done.
    Stopping Courier SMTP server: done.
    Stopping Courier mail filter: done.
    Stopping Courier mail server: done.
    Starting Courier mail server: done.
    Starting Courier mail filter: done.
    Starting Courier SMTP server: done.
    Starting Courier SMTP MSA server: done.
    

    You can see that Courier listens on port 25 and also replies: "220 web.olmunkaido.hu ESMTP"

    Could you give me a hint how to debug it further?

  14. Tildeslash repo owner

    As mentioned earlier, you can use tcpdump to log local network communication - it'll catch the failed test including the TCP handshake and data exchanged between monit and the MTA.

    It seems that the sporadic communication issue is either caused by VPS networking or the MTA itself - without the network trace it's not possible to say which side of the connection caused the problem.

    You can also suppress transient errors by waiting for e.g. 2 consecutive errors in the protocol test:

    if failed host 127.0.0.1 port 25 type tcp protocol smtp for 2 cycles then alert
    
  15. SzV reporter

    For solution seekers:

    It turns out Courier MTA looks up the connecting IP: 127.0.0.1 in DNS

    tcpdump -i any -n -v "port 53 or port 25"

    By adding -noidentlookup -nodnslookup to TCPDOPTS in Courier's configuration solves the monitoring problem.

  16. Log in to comment