Configure SSH client ServerAliveInterval to save build minutes when network interruptions occur

Issue #15164 resolved
Sebastian Cole created an issue

When a pipeline runs long-running remote tasks, a network interruption can break the connection between server and client without either side noticing, so both keep waiting while the pipeline consumes build minutes.

Setting ServerAliveInterval in ~/.ssh/config will force ssh clients to disconnect, failing the pipeline, when the server becomes unreachable.
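A minimal sketch of what such a configuration step could look like, assuming a POSIX shell in the build container. The helper name `prepend_server_alive` and the values 60 and 3 are illustrative, not something the pipelines agent provides. ssh keeps the first value it obtains for each option, so the settings are prepended rather than appended; otherwise a `ServerAliveInterval` already in the file would take precedence.

```shell
# Hypothetical helper: put client keepalive settings at the top of an ssh
# client config file. Options before the first "Host" line apply to every
# host, and ssh keeps the first value it sees for each option.
prepend_server_alive() {
  cfg="$1"
  mkdir -p "$(dirname "$cfg")" || return 1
  touch "$cfg" || return 1
  tmp="$(mktemp)" || return 1
  {
    printf 'ServerAliveInterval 60\n'   # probe after 60s without server data
    printf 'ServerAliveCountMax 3\n'    # disconnect after 3 unanswered probes
    cat "$cfg"
  } > "$tmp" && cat "$tmp" > "$cfg"
  rm -f "$tmp"
  chmod 600 "$cfg"   # ssh requires this file not be writable by others
}

# Usage in a pipeline script step:
#   prepend_server_alive "$HOME/.ssh/config"
```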

Comments (9)

  1. Andrew howden

    This can (seemingly) be worked around by setting ClientAliveInterval et al. in the server's sshd configuration until a fix is rolled out.
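    On the server side, the workaround could look like the sketch below, assuming OpenSSH's sshd and root access on the destination host. ClientAliveInterval and ClientAliveCountMax are the server-side counterparts of the client keepalive options; the helper name `prepend_client_alive` and the values are illustrative. The lines are prepended because sshd generally keeps the first value it obtains, and anything appended after a trailing `Match` block would only apply to that match.

    ```shell
    # Hypothetical helper: put server keepalive settings at the top of an
    # sshd_config file so they apply globally, ahead of any Match blocks.
    prepend_client_alive() {
      cfg="$1"
      tmp="$(mktemp)" || return 1
      {
        printf 'ClientAliveInterval 60\n'   # probe the client after 60s of silence
        printf 'ClientAliveCountMax 3\n'    # drop the session after 3 unanswered probes
        cat "$cfg"
      } > "$tmp" && cat "$tmp" > "$cfg"     # rewrite in place, keeping owner and mode
      rm -f "$tmp"
    }

    # Usage on the destination host, as root; check the config, then reload
    # (the service unit may be named "ssh" rather than "sshd" on Debian/Ubuntu):
    #   prepend_client_alive /etc/ssh/sshd_config
    #   sshd -t && systemctl reload sshd
    ```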

  2. Sebastian Cole (reporter)

    The build container's ssh config shows it contains:

    + cat ~/.ssh/config
    IdentityFile /opt/atlassian/pipelines/agent/data/id_rsa
    ServerAliveInterval 180
    
  3. Andrew howden

    Girish Thanoo

    Is it failing because the TCP connection is dropped? If so, you should see the transfer stop in a tcpdump capture on the server.

  4. Andrew howden

    In this case, the "server" is the destination:

     ServerAliveInterval
            Sets a timeout interval in seconds after which if no data has
            been received from the server, ssh(1) will send a message through
            the encrypted channel to request a response from the server.
    

    If you see the reset on the destination box but the connection is not terminated on the Bitbucket side, it's reasonable to conclude that ServerAliveInterval is not working as anticipated.

    To be clear, you're talking about a protocol that runs over SSH (probably SCP)?

  5. Girish Thanoo

    I am not sure which side is stopping the connection (source or destination). The connection itself definitely succeeds, since the file is partially transferred. We are using SFTP to copy the file from Bitbucket to an SAP Linux box.

  6. Andrew howden

    > I am not sure who is stopping connection (source or destination)

    In our case it was an intermediary; the suspicion is the AWS stateful firewall associated with the VPCs. You should be able to see this with tcpdump: you're looking for the R (RST) flag. If an intermediary is dropping the connection, you'll see the RST on both the server and the client; if one of the endpoints is dropping it, you'll see it on only one end.
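    A sketch of such a capture, assuming Linux (for `-i any`), root access, tcpdump installed, and SSH on port 22 unless told otherwise; the helper names `rst_filter` and `watch_resets` are hypothetical. The filter matches only TCP segments with the RST flag set, which tcpdump prints as `[R]` or `[R.]`. Running it on both ends during the same transfer shows whether the reset appears on one side or both.

    ```shell
    # Hypothetical helper: build a pcap filter matching TCP segments on the
    # given port (default 22) that have the RST flag set.
    rst_filter() {
      printf 'tcp port %s and (tcp[tcpflags] & tcp-rst) != 0' "${1:-22}"
    }

    # Hypothetical helper: print RST segments on all interfaces as they arrive
    # (Linux, needs root). -nn prints raw addresses and port numbers instead
    # of resolving names.
    watch_resets() {
      tcpdump -nn -i any "$(rst_filter "$1")"
    }

    # Usage, on each end while the transfer runs:
    #   watch_resets        # SSH/SFTP on port 22
    #   watch_resets 2222   # SSH on a non-standard port
    ```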

    Now that I see the config, the ServerAliveInterval may be too large. Try making it smaller: 180 seconds is probably fine, but you can go lower (60, for example), which means "probe the connection every 60 seconds". Also make sure it retries several times; that setting is ServerAliveCountMax.
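    The two settings combine: per the ssh_config man page, the client disconnects after roughly ServerAliveInterval multiplied by ServerAliveCountMax seconds without a reply from the server. A small sketch of that arithmetic, assuming 60 seconds and OpenSSH's default count of 3; the helper name `give_up_after`, the host, and the batch file in the usage comment are placeholders.

    ```shell
    # Hypothetical helper: approximate seconds before ssh gives up on an
    # unresponsive server (one probe per interval, disconnect after
    # ServerAliveCountMax consecutive unanswered probes).
    give_up_after() {
      echo $(( $1 * $2 ))
    }

    interval=60     # ServerAliveInterval (illustrative value)
    count_max=3     # ServerAliveCountMax (OpenSSH default)
    echo "ssh gives up after about $(give_up_after "$interval" "$count_max")s"

    # The same settings for a single transfer; -o overrides the config files.
    # Host and batch file names are placeholders:
    #   sftp -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
    #        -b upload.batch user@sap-host
    ```

    With the 180-second interval from the build container's config and the default count, the same formula gives about 540 seconds (9 minutes).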

    Cheers!
