Failed to connect to collector

Issue #117 resolved
Former user created an issue

OS: Red Hat Enterprise Linux Server release 5.11 (Tikanga)

KERNEL: Linux javadev2 2.6.18-398.el5xen #1 SMP Tue Aug 12 06:30:31 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Monit Versions: 5.9, 5.10

M/Monit Version: 3.3

Comments: Monit 5.8.1 and earlier connect correctly on RHEL5, but after upgrading to 5.9 or later the agent fails to connect to the collector. The ports are open and can be connected to. The agents work fine locally: I can check summary/status, monitor/unmonitor, etc. The newer agent just won't connect to the collector on RHEL5. The restarts in the log are from me trying configuration changes; the agent wasn't dying on its own. I can confirm that the agents work correctly on RHEL6.

#!
[MST Nov 20 07:44:39] error    : M/Monit: status message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:44:41] error    : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:44:41] error    : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:44:41] error    : M/Monit handler failed, retry scheduled for next cycle
[MST Nov 20 07:45:34] info     : Shutting down Monit HTTP server
[MST Nov 20 07:45:35] info     : Monit HTTP server stopped
[MST Nov 20 07:45:35] info     : M/Monit heartbeat stopped
[MST Nov 20 07:45:35] info     : Monit daemon with pid [2064] killed
[MST Nov 20 07:45:35] info     : 'some_host.domain.tld' Monit stopped
[MST Nov 20 07:46:05] error    : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:46:05] error    : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:46:05] info     : Starting Monit 5.10 daemon with http interface at [*:2812]
[MST Nov 20 07:46:05] info     : Starting Monit HTTP server at [*:2812]
[MST Nov 20 07:46:05] info     : Monit HTTP server started
[MST Nov 20 07:46:05] info     : 'some_host.domain.tld' Monit started
[MST Nov 20 07:46:35] error    : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:46:35] error    : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:46:35] info     : M/Monit heartbeat started
[MST Nov 20 07:47:05] error    : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:47:05] error    : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:47:05] error    : M/Monit handler failed, retry scheduled for next cycle
[MST Nov 20 07:47:05] error    : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:47:05] error    : M/Monit: status message to https://host.domain.tld:443/collector failed
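
A note on the `-- Success` suffix in the log lines above. One possible reading, offered as an assumption that nothing in this thread confirms, is that the text after `--` is the C library's message for the current errno value, and that errno was 0 when the message was formatted. That would mean no system call reported an error at that point, for example because the read failed at the SSL layer or the peer closed the connection. The minimal Python sketch below only shows how errno values map to such messages; `errno_text` is a hypothetical helper name.

```python
import errno
import os


def errno_text(code):
    """Return the C library's message for an errno value, like strerror(3)."""
    return os.strerror(code)


if __name__ == "__main__":
    # errno 0 means "no system-call error recorded".
    # On glibc-based Linux the message for 0 is typically "Success".
    print("errno 0      ->", errno_text(0))
    # Real socket failures carry a more descriptive message.
    print("ECONNREFUSED ->", errno_text(errno.ECONNREFUSED))
    print("ETIMEDOUT    ->", errno_text(errno.ETIMEDOUT))
```

If that reading is right, the suffix says nothing about the real cause, and the M/Monit server-side logs are the more useful place to look.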
#!
[root@javaqa2 bin]# ldd monit
        linux-vdso.so.1 =>  (0x00007fff2c499000)
        libpam.so.0 => /lib64/libpam.so.0 (0x00000036a9000000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000369c200000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00000036af000000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x000000369fa00000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x000000369ea00000)
        libssl.so.6 => /lib64/libssl.so.6 (0x00000032fb800000)
        libcrypto.so.6 => /lib64/libcrypto.so.6 (0x00000032fb400000)
        libc.so.6 => /lib64/libc.so.6 (0x000000369b200000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000000369b600000)
        libaudit.so.0 => /lib64/libaudit.so.0 (0x00000036a5400000)
        /lib64/ld-linux-x86-64.so.2 (0x000000369ae00000)
        libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00000036a7800000)
        libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00000036a5c00000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00000036a6c00000)
        libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00000036a5800000)
        libz.so.1 => /lib64/libz.so.1 (0x000000369be00000)
        libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00000036a7c00000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00000036a8000000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x000000369da00000)
        libsepol.so.1 => /lib64/libsepol.so.1 (0x000000369d600000)

Comments (16)

  1. Tildeslash repo owner

    Hello,

    I'm unable to replicate this issue (using CentOS 5.11). I compiled Monit 5.10 from source and also tested the pre-compiled binary: no problem.

    Note that M/Monit 3.3 disabled SSLv3 ... CentOS 5.x uses OpenSSL 0.9.8 which supports TLSv1, which is used by Monit when the mode is SSLAUTO (default). If you specified the SSL version in the "set mmonit" statement, make sure it doesn't specify "sslv3".

    If the problem persists, can you please try the pre-compiled binary? http://mmonit.com/monit/#download

  2. Former user Account Deleted reporter

    I'm only using the defaults for the connector settings (other than moving the port to tcp/443):

    #!
    set mmonit https://user:password@host.domain.tld/collector
         # and register without credentials     # Don't register credentials
    

    I will check out specifying TLSv1, that may be the issue.
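
As a quick sanity check on which endpoint a `set mmonit` URL points at, the sketch below splits the URL with Python's standard library. It assumes the usual default ports (80 for `http`, 443 for `https`) when the URL names none, which matches the `https://host.domain.tld:443/collector` target in the agent log. `collector_endpoint` and `DEFAULT_PORTS` are hypothetical names, not part of Monit.

```python
from urllib.parse import urlsplit

# Assumption: standard URL default ports apply when none is given.
DEFAULT_PORTS = {"http": 80, "https": 443}


def collector_endpoint(url):
    """Describe the host, port and transport a collector URL resolves to."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return {
        "tls": parts.scheme == "https",
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "has_credentials": parts.username is not None,
    }


if __name__ == "__main__":
    print(collector_endpoint("https://user:password@host.domain.tld/collector"))
```

Running it against the URL above reports a TLS connection to `host.domain.tld` on port 443 with the path `/collector`.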

  3. Former user Account Deleted reporter

    Same issue with both the precompiled and the compiled binary. Monit shows TLSv1:

    #!
    SSL-Session:
        Protocol  : TLSv1
        Cipher    : AES256-SHA
    
  4. Tildeslash repo owner

    Please can you send mmonit.log and error.log from M/Monit's logs directory to support@mmonit.com?

    Please also take a network trace of the communication between Monit and M/Monit from one of the hosts, for example using Wireshark. You'll need to import the server certificate into Wireshark to decrypt the traffic; for more details about the decryption see, for example, http://support.citrix.com/article/CTX116557. We'll need the plaintext dump of the decrypted communication to see what went wrong.

  5. Former user Account Deleted reporter

    Can you run the following and let me know what version of openssl is running on your CentOS install?

    #!
    [root@javadev2 logs]# rpm -qa|grep openssl
    openssl-devel-0.9.8e-31.el5_11
    openssl-0.9.8e-31.el5_11
    openssl-devel-0.9.8e-31.el5_11
    openssl-0.9.8e-31.el5_11
    
  6. Former user Account Deleted reporter

    I've re-enabled SSL on the monit agent and disabled SSL on the set mmonit server directive. It connects correctly now:

    #!
    set mmonit http://user:password@host.domain.tld/collector
    

    I've set the SSL key file as well, and wireshark won't decrypt the stream either. I've never had any luck with that.

    On the mmonit server, I'm seeing the following when connecting over SSL to the collector (xxx.xxx.xxx.xxx represents the problem host):

    #!
    2014-11-24 10:54:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
    2014-11-24 10:55:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
    2014-11-24 10:56:01 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
    2014-11-24 10:56:01 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
    2014-11-24 10:56:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
    2014-11-24 10:57:24 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
    2014-11-24 10:57:32 [client xxx.xxx.xxx.xxx] HTTP 400 Bad Request Invalid query part
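
To see at a glance which clients produce which errors, the server-side log lines above can be tallied per client and message. This is a minimal sketch that assumes every line has the shape `<date> <time> [client <address>] <message>`, as in the excerpt. `tally_errors` is a hypothetical helper, and the sample is a subset of the lines quoted above.

```python
import re
from collections import Counter

# Line shape assumed from the excerpt: "<date> <time> [client <addr>] <message>"
LINE_RE = re.compile(r"^\s*(\S+ \S+) \[client ([^\]]+)\] (.*)$")


def tally_errors(lines):
    """Count (client, message) pairs in M/Monit error log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            _timestamp, client, message = match.groups()
            counts[(client, message.strip())] += 1
    return counts


SAMPLE = """\
2014-11-24 10:54:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
2014-11-24 10:56:01 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
2014-11-24 10:56:01 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
2014-11-24 10:57:24 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
2014-11-24 10:57:32 [client xxx.xxx.xxx.xxx] HTTP 400 Bad Request Invalid query part
""".splitlines()

if __name__ == "__main__":
    # Most frequent (client, message) pairs first.
    for (client, message), n in tally_errors(SAMPLE).most_common():
        print(f"{n:3d}  {client}  {message}")
```

On a full log, this kind of tally makes it easier to check whether the read timeouts come only from the RHEL5 hosts.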
    
  7. Tildeslash repo owner

    We use default openssl-0.9.8e on CentOS 5.11:

    $ rpm -qa|grep openssl
    openssl-devel-0.9.8e-31.el5_11
    openssl-0.9.8e-31.el5_11
    openssl-0.9.8e-31.el5_11
    openssl-devel-0.9.8e-31.el5_11
    
    $ lsb_release -id
    Distributor ID: CentOS
    Description:    CentOS release 5.11 (Final)
    
    $ ldd ./monit | grep ssl
    libssl.so.6 => /lib64/libssl.so.6 (0x00002b92c1a73000)
    
    $ ls -l  /lib64/libssl.so.6
    lrwxrwxrwx 1 root root 16 Nov 20 18:38 /lib64/libssl.so.6 -> libssl.so.0.9.8e
    
    $ ./monit -V
    This is Monit version 5.10
    

    Monit 5.10 is configured to report to a test M/Monit 3.3 instance using SSL with the default SSL-auto mode; it works fine on the test system:

    set mmonit https://monit:monit@x.x.x.x:8443/collector
    

    Does the "xxx.xxx.xxx.xxx" from the log with "read timed out" and "HTTP 400" match the client IP of RHEL 5.x which cannot connect to collector?

    We'll need the decrypted network trace as mentioned to see where the problem is - did you use the M/Monit-side server key? (the same pemfile configured with "certificate" attribute in M/Monit's conf/server.xml file).

  8. Tildeslash repo owner

    Can you please provide more details about the connection between the Monit instance that fails to report to M/Monit and the machine where M/Monit is running? Is it a slower network? (The root cause is most probably related to "could not read request body -- read timed out".)

  9. Former user Account Deleted reporter

    So you have the same setup on CentOS as we do; we are just on RHEL5. Yes, that "xxx.xxx.xxx.xxx" is the system that cannot connect. I'm using the server-side key to decrypt, but it just doesn't work: all the traffic is still encrypted when using it. The network isn't slower, and I have tested all ports; I'm able to connect from all hosts reporting to M/Monit. If I drop back to Monit 5.8.1 everything works fine. The issue starts in Monit 5.9 and is also present in 5.10; neither version works. Something committed between 5.8.2 and 5.9 broke something.

    This issue happens on all RHEL5 systems, some more frequently than others. This does not occur at all on RHEL6 systems.

  10. Tildeslash repo owner

    Hello Aaron,

    We have refactored the SSL implementation in the next Monit release. We were unable to reproduce the problem with either the previous implementation or the new one (CentOS 5.x in our lab reports to M/Monit with no problems). Can you please check whether the re-implementation fixed the problem on your RHEL5 systems?

    You can get the development snapshot here: http://www.mmonit.com/tmp/monit-5.12_devel.tar.gz

    To compile:

    ./configure
    make
    
  11. Former user Account Deleted reporter

    So far, on 5.12.1 I've been unable to reproduce the issue and it seems to be working as intended. If I run into the issue again, I'll post further info. Thanks :)
