Memory leak due to "XMPP Scheduled Ping Thread" accumulation when the network is down

Issue #107 closed
Former user created an issue

Hi Christian,

Using your library (0.7.4) for load testing our architecture (starting 40 threads performing in parallel some XMMP operations) I noticed that: - while the PingManager is disabled (I use the ping at WebSocket level), i.e. "xmppSession.disableFeature(PingManager.class)" in the "SessionCreationListener.accept", I can still see "XMPP Scheduled Ping Thread" threads. - if I cut the network it looks like each connection attempt (which obviously fails) let one of these thread alive, which leads quickly to a OOM (Unable to create new native thread)

Comments (12)

  1. jjoannes

    I'm also wondering if there is not also an issue in WebSocketConnection... It looks like (just an uneducated guess) that in "connect" a ScheduledThreadPoolExecutor is created before the connection is done. And, in the "close", this instance is shutdown only if the connection "isOpen". What happens if the connection failed?

  2. Christian Schudt repo owner

    Hm... The "XMPP Scheduled Ping Thread" is a single thread, which is initialized only once in the PingManager constructor (i.e. once per session only):

    private PingManager(final XmppSession xmppSession) {
            super(xmppSession, true);
            scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(XmppUtils.createNamedThreadFactory("XMPP Scheduled Ping Thread"));
    

    I wonder how there can be more than one thread then?

    Is this issue only with websockets?

  3. Christian Schudt repo owner

    btw.: Not sure, what the issue really is, but I noticed a change in the WebSocketConnection close method in 0.7.4. Can you reproduce your issue with 0.7.3 as well?

  4. jjoannes

    For my tests I create several instances of XmppSession (calling close at the end, in a finally block). What I don't understand first is why PingManager is not really disabled...

    Due to my infrastructure restrictions I can only use WebSocket.

  5. jjoannes

    Actually it's in 0.7.3 that I noticed the issue first, I switched to 0.7.4 but the issue is still there.

  6. Christian Schudt repo owner

    PingManager is enabled by default, because I think it's a useful feature, which should be enabled by default.

    disableFeature is there to disable a feature during a session. But at this point the session is already created and therefore PingManager constructor has already been called. And the constructor creates the ScheduledExecutorService.

    I considering creating it when the feature is enabled, and shut it down, when it's disabled. This would help your use case, bu as is, it's designed a final class member, which is only created once in the constructor.

    To prevent the constructor being called, you should initialize your XmppSession with the Ping extension disabled directly:

     XmppSessionConfiguration.builder()
    .extensions(Extension.of(Ping.NAMESPACE, PingManager.class, false))
    .build();
    

    (this overrides the default, which is enabled).

    In any case, when closing the session, the ExecutorService which was created, is shutdown (and therefore terminate the thread), just debugged it.

    I am still not sure, how the OOM can happen then.

  7. jjoannes

    Did you try when there is a connection failure? The thread leak I noticed happens only in this case.

  8. Christian Schudt repo owner

    I've tried with connection failures, but couldn't truly reproduce the issue. However, I still found that PingManager might leak cancelled tasks.

    I've fixed it with 74f050e. Cancelled tasks are no longer retained in the work queue, but are immdetiately removed.

    I've uploaded version 0.7.5-SNAPSHOT to the snapshot repo. Are you able to test it (better without appling the workaround from my last comment, to know if it really was the issue)?

    The threads probably would get closed after the last (cancelled) tasks were processed, i.e. by default after 15 minutes. Of course this is undesirable nonetheless.

  9. jjoannes

    It turned out that I wasted your time :-(. I've found out that the instrumentation of my code by the test framework I use (Grinder) makes the finally block ignored... Anyway, I'll test the 0.7.5-SNAPSHOT and let you know the results.

  10. Christian Schudt repo owner

    Ok. Please let my know, if there are any other issues with regards to this issue.

    If not, I'd like to close this one.

  11. Log in to comment