Failed to connect to collector
OS: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
KERNEL: Linux javadev2 2.6.18-398.el5xen #1 SMP Tue Aug 12 06:30:31 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Monit Versions: 5.9, 5.10
M/Monit Version: 3.3
Comments: Monit 5.8.1 and earlier connect correctly on RHEL5, but after upgrading to 5.9 or later the agent fails to connect to the collector. Ports are open and can be connected to. Agents work fine locally: I can check summary/status, monitor/unmonitor, etc. Only the connection to the collector fails when using the newer agent on RHEL5. The restart in the log is from me trying configuration changes; the agent wasn't dying on its own. I can confirm that the agents work correctly on RHEL6.
#!
[MST Nov 20 07:44:39] error : M/Monit: status message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:44:41] error : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:44:41] error : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:44:41] error : M/Monit handler failed, retry scheduled for next cycle
[MST Nov 20 07:45:34] info : Shutting down Monit HTTP server
[MST Nov 20 07:45:35] info : Monit HTTP server stopped
[MST Nov 20 07:45:35] info : M/Monit heartbeat stopped
[MST Nov 20 07:45:35] info : Monit daemon with pid [2064] killed
[MST Nov 20 07:45:35] info : 'some_host.domain.tld' Monit stopped
[MST Nov 20 07:46:05] error : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:46:05] error : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:46:05] info : Starting Monit 5.10 daemon with http interface at [*:2812]
[MST Nov 20 07:46:05] info : Starting Monit HTTP server at [*:2812]
[MST Nov 20 07:46:05] info : Monit HTTP server started
[MST Nov 20 07:46:05] info : 'some_host.domain.tld' Monit started
[MST Nov 20 07:46:35] error : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:46:35] error : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:46:35] info : M/Monit heartbeat started
[MST Nov 20 07:47:05] error : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:47:05] error : M/Monit: event message to https://host.domain.tld:443/collector failed
[MST Nov 20 07:47:05] error : M/Monit handler failed, retry scheduled for next cycle
[MST Nov 20 07:47:05] error : M/Monit: error receiving data from https://host.domain.tld:443/collector -- Success
[MST Nov 20 07:47:05] error : M/Monit: status message to https://host.domain.tld:443/collector failed
#!
[root@javaqa2 bin]# ldd monit
linux-vdso.so.1 => (0x00007fff2c499000)
libpam.so.0 => /lib64/libpam.so.0 (0x00000036a9000000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000369c200000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00000036af000000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x000000369fa00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x000000369ea00000)
libssl.so.6 => /lib64/libssl.so.6 (0x00000032fb800000)
libcrypto.so.6 => /lib64/libcrypto.so.6 (0x00000032fb400000)
libc.so.6 => /lib64/libc.so.6 (0x000000369b200000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000369b600000)
libaudit.so.0 => /lib64/libaudit.so.0 (0x00000036a5400000)
/lib64/ld-linux-x86-64.so.2 (0x000000369ae00000)
libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00000036a7800000)
libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00000036a5c00000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00000036a6c00000)
libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00000036a5800000)
libz.so.1 => /lib64/libz.so.1 (0x000000369be00000)
libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00000036a7c00000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00000036a8000000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x000000369da00000)
libsepol.so.1 => /lib64/libsepol.so.1 (0x000000369d600000)
Comments (16)
-
repo owner Hello,
I'm unable to replicate this issue (using CentOS 5.11). I compiled Monit 5.10 from source and also tested the pre-compiled binary: no problem.
Note that M/Monit 3.3 disabled SSLv3 ... CentOS 5.x uses OpenSSL 0.9.8 which supports TLSv1, which is used by Monit when the mode is SSLAUTO (default). If you specified the SSL version in the "set mmonit" statement, make sure it doesn't specify "sslv3".
If the problem persists, can you please try the pre-compiled binary? http://mmonit.com/monit/#download
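One way to check which protocol the collector actually negotiates is to run `openssl s_client` from the affected host and read the `Protocol` line of its session summary. The sketch below is an assumption-laden illustration, not part of Monit: `negotiated_protocol` is a hypothetical helper name, the host is the placeholder used in this report, and the `-tls1` flag may be missing or disabled in newer OpenSSL builds.

```shell
#!/bin/sh
# Hypothetical helper: read `openssl s_client` output on stdin and print
# the negotiated protocol from the "Protocol  : ..." line of the summary.
negotiated_protocol() {
    awk -F: '/^[ \t]*Protocol[ \t]*:/ {
        gsub(/[ \t\r]/, "", $2)   # strip padding around the value
        print $2
        exit
    }'
}

# Usage (needs network access; host is the placeholder from this report):
#   openssl s_client -connect host.domain.tld:443 -tls1 </dev/null 2>/dev/null \
#       | negotiated_protocol
```

If the handshake fails outright, the helper prints nothing, which is itself a useful signal that the problem is at the SSL layer rather than in the HTTP exchange.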
-
Account Deleted reporter I'm only using the defaults for the connector settings (other than moving the port to tcp/443):
#!
set mmonit https://user:password@host.domain.tld/collector
# and register without credentials # Don't register credentials
I will check out specifying TLSv1, that may be the issue.
-
Account Deleted reporter Same issue with precompiled and compiled. monit shows TLSv1...
#!
SSL-Session:
Protocol : TLSv1
Cipher : AES256-SHA
-
repo owner Please can you send mmonit.log and error.log from M/Monit's logs directory to support@mmonit.com?
Please also take the network trace of the communication between Monit and M/Monit from one of the hosts - for example using wireshark. You'll need to import the server certificate to wireshark to decrypt the traffic, see for example here for more details about the decryption: http://support.citrix.com/article/CTX116557. We'll need the plaintext dump of the decrypted communication to see what went wrong.
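A note on the decryption step: Wireshark can only decrypt a capture with the server's private key when the session uses plain RSA key exchange and the capture contains the full handshake (not a resumed session). With ephemeral Diffie-Hellman suites the server key alone is not enough. The sketch below shows one way to take the capture and to check a cipher name; `uses_forward_secrecy` is a hypothetical helper, the prefix list is an assumption based on OpenSSL-style cipher names, and the capture file name is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if an OpenSSL-style cipher name
# uses ephemeral key exchange, in which case the server's private key
# alone cannot decrypt a captured session.
uses_forward_secrecy() {
    case "$1" in
        DHE-*|EDH-*|ECDHE-*|ADH-*|AECDH-*|TLS_*) return 0 ;;
        *) return 1 ;;
    esac
}

# Capture sketch (run as root on the Monit host; restart Monit first so the
# trace contains a full handshake; host is the placeholder from this report):
#   tcpdump -i any -s 0 -w monit-collector.pcap host host.domain.tld and port 443

# Example check against a cipher reported by `openssl s_client`:
if uses_forward_secrecy "AES256-SHA"; then
    echo "server key alone will not decrypt this session"
else
    echo "RSA key exchange: decryptable with the server key"
fi
```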
-
Account Deleted reporter Can you run the following and let me know what version of OpenSSL is running on your CentOS install?
#!
[root@javadev2 logs]# rpm -qa|grep openssl
openssl-devel-0.9.8e-31.el5_11
openssl-0.9.8e-31.el5_11
openssl-devel-0.9.8e-31.el5_11
openssl-0.9.8e-31.el5_11
-
Account Deleted reporter I've re-enabled SSL on the monit agent and disabled SSL on the set mmonit server directive. It connects correctly now:
#! set mmonit http://user:password@host.domain.tld/collector
I've set the SSL key file as well, and wireshark won't decrypt the stream either. I've never had any luck with that.
On the mmonit server, I'm seeing the following when connecting over SSL to the collector (xxx.xxx.xxx.xxx represents the problem host):
#!
2014-11-24 10:54:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
2014-11-24 10:55:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
2014-11-24 10:56:01 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
2014-11-24 10:56:01 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
2014-11-24 10:56:41 [client 127.0.0.1] File does not exist: /usr/local/mmonit-3.3/docroot/server-status
2014-11-24 10:57:24 [client xxx.xxx.xxx.xxx] could not read request body -- read timed out
2014-11-24 10:57:32 [client xxx.xxx.xxx.xxx] HTTP 400 Bad Request Invalid query part
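To see which agents are affected and how often, the timeout entries can be counted per client. This is a minimal sketch that assumes the log line layout shown above (date, time, `[client <ip>]`, message); `timeouts_per_client` is a hypothetical helper name and the log path in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read an M/Monit error log on stdin and print
# "<count> <client>" for every client with "read timed out" errors,
# most affected client first.
timeouts_per_client() {
    awk '/could not read request body -- read timed out/ {
             ip = $4              # fourth field looks like "10.0.0.5]"
             sub(/\]$/, "", ip)   # drop the trailing bracket
             count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Usage (path is an assumption based on the install directory in the log):
#   timeouts_per_client < /usr/local/mmonit-3.3/logs/error.log
```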
-
repo owner We use default openssl-0.9.8e on CentOS 5.11:
$ rpm -qa|grep openssl
openssl-devel-0.9.8e-31.el5_11
openssl-0.9.8e-31.el5_11
openssl-0.9.8e-31.el5_11
openssl-devel-0.9.8e-31.el5_11
$ lsb_release -id
Distributor ID: CentOS
Description: CentOS release 5.11 (Final)
$ ldd ./monit | grep ssl
libssl.so.6 => /lib64/libssl.so.6 (0x00002b92c1a73000)
$ ls -l /lib64/libssl.so.6
lrwxrwxrwx 1 root root 16 Nov 20 18:38 /lib64/libssl.so.6 -> libssl.so.0.9.8e
$ ./monit -V
This is Monit version 5.10
Monit (5.10) is configured to report to testing M/Monit 3.3 using SSL with default SSL-auto mode ... works fine on the testing system:
set mmonit https://monit:monit@x.x.x.x:8443/collector
Does the "xxx.xxx.xxx.xxx" from the log with "read timed out" and "HTTP 400" match the client IP of RHEL 5.x which cannot connect to collector?
We'll need the decrypted network trace as mentioned to see where the problem is - did you use the M/Monit-side server key? (the same pemfile configured with "certificate" attribute in M/Monit's conf/server.xml file).
-
repo owner Can you please also provide more details about the connection between the Monit instance that fails to report and the machine where M/Monit is running? Is it a slower kind of network? (The root cause is most probably related to "could not read request body -- read timed out".)
-
Account Deleted reporter So you have the same setup on CentOS as we do; we are just on RHEL5. Yes, "xxx.xxx.xxx.xxx" is the system that cannot connect. I'm using the server-side key to decrypt, but it doesn't work: all the traffic is still encrypted. The network isn't slow, I have tested all ports, and I'm able to connect from all hosts reporting to M/Monit. If I drop back to Monit 5.8.1 everything works fine; the issue starts in Monit 5.9 and persists in 5.10. Something committed between 5.8.2 and 5.9 broke this.
This issue happens on all RHEL5 systems, some more frequently than others. This does not occur at all on RHEL6 systems.
-
repo owner Hello Aaron,
we have refactored the SSL implementation in the next Monit release ... we were unable to reproduce the problem with the previous implementation nor with the new one (CentOS 5.x in our lab reports to M/Monit with no problems) - please can you check if the re-implementation fixed the problem on your RHEL5 systems?
You can get the development snapshot here: http://www.mmonit.com/tmp/monit-5.12_devel.tar.gz
To compile:
./configure
make
-
Account Deleted reporter So far, on 5.12.1 I've been unable to reproduce the issue and it seems to be working as intended. If I run into the issue again, I'll post further info. Thanks :)
-
repo owner - changed status to resolved
-
repo owner - removed version
Removing version: 5.9 (automated comment)