reconnectionStrategy doesn't work with WebSocket

Issue #78 closed
jjoannes created an issue

Using

        XmppSessionConfiguration xmppSessionConfiguration =
            XmppSessionConfiguration.builder()
                .reconnectionStrategy(ReconnectionStrategy
                    .truncatedBinaryExponentialBackoffStrategy(60, 5))
                .build();

works well with a TCP connection, but not with a WebSocket one. When I shut down the XMPP server (after connecting and logging in), all I get is:

Jul 26, 2016 4:21:03 PM org.glassfish.tyrus.container.jdk.client.ClientFilter processError
SEVERE: Connection error has occurred
java.io.IOException: The specified network name is no longer available.

    at sun.nio.ch.Iocp.translateErrorToIOException(Iocp.java:309)
    at sun.nio.ch.Iocp.access$700(Iocp.java:46)
    at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:399)
    at java.lang.Thread.run(Thread.java:745)

Whereas with TCP:

Jul 26, 2016 6:11:08 PM rocks.xmpp.core.session.ReconnectionManager scheduleReconnection
FINE: Disconnect detected. Next reconnection attempt in 5 seconds.
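
The TCP log shows the reconnection manager waiting before its next attempt. For readers unfamiliar with the strategy name, the following is a minimal sketch of what a truncated binary exponential backoff computes: after `c` consecutive failures, wait `k` slots, with `k` drawn uniformly from `[0, 2^min(c, ceiling) - 1]`. The class and parameter names are hypothetical, and reading the two arguments of `truncatedBinaryExponentialBackoffStrategy(60, 5)` as "slot" and "ceiling" is an assumption; this is not the rocks.xmpp implementation, which may scale or randomize differently.

```java
import java.time.Duration;
import java.util.Random;

/**
 * Hypothetical helper (not part of rocks.xmpp) illustrating a
 * truncated binary exponential backoff.
 */
final class TruncatedBackoff {

    private TruncatedBackoff() {
    }

    /**
     * Returns the delay to wait before the next reconnection attempt.
     *
     * @param failures number of consecutive failed attempts so far (0 = none yet)
     * @param slot     length of one slot (assumed unit of the backoff)
     * @param ceiling  maximum exponent; larger failure counts are truncated to it
     * @param random   source of randomness
     */
    static Duration delay(int failures, Duration slot, int ceiling, Random random) {
        if (failures < 0 || ceiling < 0 || ceiling > 30) {
            throw new IllegalArgumentException("failures must be >= 0 and ceiling in [0, 30]");
        }
        // Truncation: the exponent stops growing once it reaches the ceiling.
        int exponent = Math.min(failures, ceiling);
        // Pick k uniformly from [0, 2^exponent - 1] and wait k slots.
        int k = random.nextInt(1 << exponent);
        return slot.multipliedBy(k);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int failures = 0; failures <= 7; failures++) {
            System.out.println("after " + failures + " failure(s): wait "
                    + delay(failures, Duration.ofSeconds(60), 5, random));
        }
    }
}
```

The cap matters for long outages: without the ceiling, the upper bound would keep doubling, whereas here it stops at `2^ceiling - 1` slots.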

Comments (20)

  1. Christian Schudt repo owner

    I've tried to reproduce it, but couldn't (JDK 8u60). With WebSocket, reconnection worked as expected after shutting down and restarting the server.

    I've also seen https://java.net/jira/browse/TYRUS-399, which describes this problem. Which operating system do you use? Are you able to debug this and/or get a more complete stack trace?

  2. jjoannes reporter

    I can reproduce it every time: Windows 7 and JDK 8u91 (32-bit). I will try with the latest JDK (8u102). I'll also test on a target server (Suse, JDK 8u72 32-bit). If it works well there, that's fine. I don't know how to debug this further; the only traces I get are the ones I included in the issue description...

  3. Christian Schudt repo owner

    I've tested on another machine (Windows 10, JDK 8u92) and it works there too: I connected to a remote machine, then shut down and restarted the server (Openfire), and reconnection started normally and succeeded.

    Unfortunately, I cannot tell what is causing your error.

  4. Christian Schudt repo owner

    I've also tested against localhost and then against a remote machine; the result is the same for me.

  5. Christian Schudt repo owner

    Although I still couldn't reproduce it, I've browsed through the code and hopefully fixed it with this change: 6bfce85 (the onError method never got called in my tests).

    Are you able to compile and test this?

  6. jjoannes reporter

    I'm trying right now. It may take some time, as I'm not used to working with Git... I'll let you know.

  7. jjoannes reporter

    Just for your information, I've got one test failure:

        Results :

        Failed tests: unmarshalThumbnail(rocks.xmpp.extensions.jingle.apps.filetransfer.JingleFileTransferTest): expected object to not be null

        Tests run: 524, Failures: 1, Errors: 0, Skipped: 0

  8. jjoannes reporter

    It still doesn't work :-(

    Something puzzles me, though. In WebSocketConnection (line 269), you create a "new Endpoint" without overriding the "onClose" method. And, as far as I understand, it is this "onClose" that gets called on the failure I get. Does that make sense?

  9. jjoannes reporter

    I just added an "onClose" override:

        @Override
        public void onClose(Session session, CloseReason closeReason) {
            // Diagnostic only: print why the WebSocket session was closed.
            System.out.println("Connection closed: " + closeReason);
        }

    and... it is called (Windows test)!

    Connection closed: CloseReason[1006,Closed abnormally.]
    

    So it looks like handling onClose properly would fix my issue?
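
The diagnostic above shows that the WebSocket layer does report the drop through `onClose` (code 1006), even though nothing reacts to it. A minimal sketch of the decision logic such an override could delegate to, assuming the rule "any close the client did not initiate should lead to a reconnection". It uses plain `int` close codes (values from RFC 6455) so it runs without the `javax.websocket` API; the class, its methods and the callback are hypothetical and are not the actual fix made in this thread.

```java
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the rocks.xmpp code) of the logic an
 * Endpoint.onClose(Session, CloseReason) override could delegate to.
 */
final class WebSocketCloseHandler {

    // Close codes defined in RFC 6455, section 7.4.1.
    static final int NORMAL_CLOSURE = 1000;
    static final int CLOSED_ABNORMALLY = 1006; // connection dropped without a close frame

    private final Consumer<String> onUnexpectedClose;
    private volatile boolean closedByClient;

    WebSocketCloseHandler(Consumer<String> onUnexpectedClose) {
        this.onUnexpectedClose = onUnexpectedClose;
    }

    /** Call before the client deliberately closes the connection (e.g. on logout). */
    void markClosedByClient() {
        closedByClient = true;
    }

    /** Receives the code and reason phrase that a CloseReason carries. */
    void onClose(int closeCode, String reasonPhrase) {
        if (closedByClient) {
            return; // expected close: no reconnection needed
        }
        // Any close the client did not ask for, including 1006, is reported so
        // the session can mark itself disconnected and schedule a reconnection.
        onUnexpectedClose.accept(describe(closeCode) + ": " + reasonPhrase);
    }

    static String describe(int closeCode) {
        switch (closeCode) {
            case NORMAL_CLOSURE:
                return "normal closure";
            case CLOSED_ABNORMALLY:
                return "closed abnormally";
            default:
                return "close code " + closeCode;
        }
    }

    public static void main(String[] args) {
        WebSocketCloseHandler handler = new WebSocketCloseHandler(
                msg -> System.out.println("Schedule reconnection (" + msg + ")"));
        handler.onClose(CLOSED_ABNORMALLY, "Closed abnormally.");
    }
}
```

In a real `javax.websocket` endpoint, the override would pass `closeReason.getCloseCode().getCode()` and `closeReason.getReasonPhrase()` to this handler.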

  10. jjoannes reporter

    But I still can't understand why I have this issue and you don't. Maybe it's related to how the server closes (or doesn't close) the connection during shutdown?

  11. jjoannes reporter

    Last update. It looks like I was not waiting long enough:

    IN : <iq xmlns="jabber:client" to="activation@lh6kl662/res" id="86b2343a-26f3-4114-bc80-94aaf2e55c9f" type="result"><query xmlns="jabber:iq:roster" ver=""/></iq>
    Connection closed: CloseReason[1006,Closed abnormally.]
    Jul 28, 2016 8:04:24 PM org.glassfish.tyrus.container.jdk.client.ClientFilter processError
    SEVERE: Connection error has occurred
    java.io.IOException: The specified network name is no longer available.
    
        at sun.nio.ch.Iocp.translateErrorToIOException(Iocp.java:309)
        at sun.nio.ch.Iocp.access$700(Iocp.java:46)
        at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:399)
        at java.lang.Thread.run(Thread.java:745)
    
    Connection OK/NOK: rocks.xmpp.core.session.ConnectionEvent[type=DISCONNECTED, nextReconnectionAttempt=PT0S]
    Jul 28, 2016 8:19:12 PM rocks.xmpp.core.session.ReconnectionManager scheduleReconnection
    FINE: Disconnect detected. Next reconnection attempt in 55 seconds.
    Connection OK/NOK: rocks.xmpp.core.session.ConnectionEvent[type=RECONNECTION_PENDING, nextReconnectionAttempt=PT54.994S]
    

    So reconnection attempts do start, but only about 15 minutes after the failure (8:04:24 PM vs. 8:19:12 PM). Not what I expected...

  12. Christian Schudt repo owner

    I tested with Openfire. It sends:

    <stream:error xmlns:stream="http://etherx.jabber.org/streams"><system-shutdown xmlns="urn:ietf:params:xml:ns:xmpp-streams"/></stream:error>
    

    Then onClose is called with the NORMAL_CLOSURE code. The stream error causes an exception, which eventually leads to a reconnection.

    It seems that Tigase does not send the <system-shutdown/> error (on the XMPP layer), but kills the connection the hard way on the WebSocket layer, which produces the "Closed abnormally" error. (If so, that's also a bug in Tigase.) I'll try to deal with it.
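
On the client side, the difference described above is whether an XMPP-level `<stream:error/>` arrives before the socket closes. A minimal sketch of recognizing the `<system-shutdown/>` condition in the element Openfire sends. It uses the JDK's namespace-aware DOM parser instead of the JAXB model rocks.xmpp is built on, and the class and method names are hypothetical. Per RFC 6120, the defined condition is the child element in the `urn:ietf:params:xml:ns:xmpp-streams` namespace, other than the optional `<text/>` element that shares that namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: reads the defined condition of an XMPP stream error. */
final class StreamErrors {

    static final String STREAMS_NS = "http://etherx.jabber.org/streams";
    static final String CONDITIONS_NS = "urn:ietf:params:xml:ns:xmpp-streams";

    private StreamErrors() {
    }

    /** Returns e.g. "system-shutdown", or null if xml is not a stream error with a condition. */
    static String condition(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on namespace URI, not on the "stream:" prefix
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        if (!STREAMS_NS.equals(root.getNamespaceURI()) || !"error".equals(root.getLocalName())) {
            return null;
        }
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // The optional <text/> element shares the namespace, so skip it.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && CONDITIONS_NS.equals(child.getNamespaceURI())
                    && !"text".equals(child.getLocalName())) {
                return child.getLocalName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String openfire = "<stream:error xmlns:stream=\"http://etherx.jabber.org/streams\">"
                + "<system-shutdown xmlns=\"urn:ietf:params:xml:ns:xmpp-streams\"/></stream:error>";
        System.out.println(condition(openfire)); // system-shutdown
    }
}
```

If the server drops the WebSocket without sending such an element, there is nothing for this parser to see, so the WebSocket close (code 1006) is the only signal left.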

  13. Christian Schudt repo owner

    I don't have the time to set up a Tigase server with WebSocket and test with it, but I've implemented the onClose method based on your comments: 6d96d5a

    I'd be happy to get your feedback.

    Btw, the 15-minute delay could be caused by the next XMPP ping: writing it to the (dead) connection throws an exception, which then triggers the reconnection.
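
The explanation above implies that a silently dead connection is only noticed at the next ping, so the detection delay depends on where in the ping cycle the failure happens. The reporter's log shows a gap from 8:04:24 PM to 8:19:12 PM, i.e. 14 min 48 s. A minimal model of that reasoning follows; the class name is hypothetical, and the 15-minute ping interval used in the example is an assumption, not a value stated in this thread.

```java
import java.time.Duration;

/**
 * Hypothetical model (not rocks.xmpp code): a silently dead connection is only
 * noticed when the next periodic ping is written to it.
 */
final class PingDetection {

    private PingDetection() {
    }

    /**
     * @param failureAfterLastPing how long after the last successful ping the connection died
     * @param pingInterval         time between pings (an assumed value)
     * @return how long the failure goes unnoticed
     */
    static Duration detectionDelay(Duration failureAfterLastPing, Duration pingInterval) {
        long offset = failureAfterLastPing.toMillis();
        long interval = pingInterval.toMillis();
        if (offset < 0 || interval <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and interval > 0");
        }
        // How far into the current ping cycle the failure happened.
        long intoCycle = offset % interval;
        // The failure surfaces when the ping at the end of that cycle cannot be written.
        return Duration.ofMillis(interval - intoCycle);
    }

    public static void main(String[] args) {
        // With an assumed 15-minute interval, a failure 12 seconds after a ping
        // stays unnoticed for 14 min 48 s.
        System.out.println(detectionDelay(Duration.ofSeconds(12), Duration.ofMinutes(15))); // PT14M48S
    }
}
```

Under this model, the worst-case delay approaches one full ping interval, which is why reacting to `onClose` directly detects the drop much sooner.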

  14. Christian Schudt repo owner

    Great!

    With regard to the failed test: I've never had it fail, and the continuous integration on this site (drone.io) runs the test successfully as well.

    I've reviewed the code and saw that the namespace was wrong. It's weird, though, that my JDK/JAXB never complained about it. Fixed it with 102f679.
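
A wrong namespace tends to fail silently rather than with an error: namespace-aware XML binding matches elements by namespace URI plus local name, so an element in a namespace the model does not expect is simply not matched. By default, JAXB generally skips such unknown elements and leaves the field null, which would fit the `expected object to not be null` failure. A minimal illustration with the JDK's DOM parser instead of JAXB; the class name and the `urn:example:...` namespaces are hypothetical placeholders, not the namespaces used by rocks.xmpp.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical demo: namespace-aware lookups silently miss elements in another namespace. */
final class NamespaceMismatch {

    private NamespaceMismatch() {
    }

    /** Counts elements with the given namespace URI and local name. */
    static int count(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder namespaces, not real XEP namespaces.
        String xml = "<description xmlns='urn:example:ft'>"
                + "<thumbnail xmlns='urn:example:thumbs:1' uri='cid:x'/></description>";
        System.out.println(count(xml, "urn:example:thumbs:1", "thumbnail"));     // 1
        System.out.println(count(xml, "urn:example:thumbs:wrong", "thumbnail")); // 0, and no error
    }
}
```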
