Sporadic infinite blocking when looking up DNS SRV records

Issue #33 closed
Markus KARG created an issue

Sometimes attempts to connect via TCP hang forever. This happens on about every 25th connection attempt, or even less often, but often enough to be annoying.

So I added "paranoid" tracing to find the problematic line of code, and apparently it is the DNS SRV lookup:

LOGGER.finest("Looking up SRV records");
Attributes attributes = ctx.getAttributes(query, new String[]{"SRV"});

Mar 08, 2015 1:12:38 PM TcpConnection connect(Jid) FINER: ENTRY null
Mar 08, 2015 1:12:38 PM rocks.xmpp.core.session.TcpConnection connect FINEST: Getting hostname
Mar 08, 2015 1:12:38 PM TcpConnection connectWithXmppServiceDomain(String) FINER: ENTRY jabber.de
Mar 08, 2015 1:12:38 PM rocks.xmpp.core.session.TcpConnection connectWithXmppServiceDomain FINEST: Creating JNDI context for DNS lookup
Mar 08, 2015 1:12:38 PM rocks.xmpp.core.session.TcpConnection connectWithXmppServiceDomain FINEST: Looking up SRV records

I noticed that the default timeout in use is ZERO, which apparently means INFINITE, and that is never a good thing. So my proposal would be to change the default timeout to ten seconds or so, so that Babbler at least has a chance to notice that the DNS server is not responding. Another proposal would be to set a retry count as described in http://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-dns.html, so that Babbler does not have to retry on its own but can safely rely on the JNDI provider to do the retries and then fail correctly with an exception.

These ideas are not tested yet and are open for discussion. In any case, what should be fixed ASAP is that "INFINITE" is the default, as that definitely leads to hangs.
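The timeout and retry proposal could look roughly like the following Java sketch. It is not the code from commit e60305a66f68294ae1f2dc06f9cf16c0af834970; it assumes the JDK's built-in DNS provider (`com.sun.jndi.dns.DnsContextFactory`) and the `com.sun.jndi.dns.timeout.initial` / `com.sun.jndi.dns.timeout.retries` environment properties described in the linked JNDI DNS guide. The class `SrvLookup`, its helper methods, and the `_xmpp-client._tcp.jabber.de` query name are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper class, not part of Babbler.
public final class SrvLookup {

    // Environment property names of the JDK DNS provider (see the JNDI DNS guide).
    public static final String TIMEOUT_INITIAL = "com.sun.jndi.dns.timeout.initial";
    public static final String TIMEOUT_RETRIES = "com.sun.jndi.dns.timeout.retries";

    private SrvLookup() {
    }

    /** Builds a JNDI environment with an explicit initial timeout (ms) and retry count. */
    public static Hashtable<String, String> dnsEnvironment(int initialTimeoutMillis, int retries) {
        if (initialTimeoutMillis <= 0) {
            // A non-positive value would defeat the purpose of bounding the lookup.
            throw new IllegalArgumentException("initial timeout must be positive");
        }
        if (retries < 0) {
            throw new IllegalArgumentException("retries must not be negative");
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(TIMEOUT_INITIAL, Integer.toString(initialTimeoutMillis));
        env.put(TIMEOUT_RETRIES, Integer.toString(retries));
        return env;
    }

    /** Returns the raw SRV record strings for the query name, or an empty list if none. */
    public static List<String> lookupSrv(String query, Hashtable<String, String> env)
            throws NamingException {
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attributes = ctx.getAttributes(query, new String[]{"SRV"});
            Attribute srv = attributes.get("SRV");
            List<String> records = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> values = srv.getAll();
                while (values.hasMore()) {
                    records.add(values.next().toString());
                }
            }
            return records;
        } finally {
            // javax.naming.Context is not AutoCloseable, so close it explicitly.
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Example values only; they mirror the defaults quoted in the reply below.
        Hashtable<String, String> env = dnsEnvironment(1000, 4);
        System.out.println(lookupSrv("_xmpp-client._tcp.jabber.de", env));
    }
}
```

Keeping the environment construction in its own method makes the timeout values testable without touching the network; `main` performs a real DNS query and therefore needs network access.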

Comments (4)

  1. Christian Schudt repo owner

    Strangely, I have never seen such kind of hanging, but thanks for your investigation.

    I have fixed it with e60305a66f68294ae1f2dc06f9cf16c0af834970; please check whether your issue still occurs. I couldn't find any documentation for the value "zero", but according to the link, the default timeout is 1000 ms if not set and the default retry count is 4. So I don't think any further configuration is necessary.
