How to fix “com.atlassian.bamboo.repository.RepositoryException: java.net.SocketTimeoutException: connect timed out”

Issue #66 resolved
Theodore Tsatsos
created an issue

The issue is that I get error messages saying: "Bamboo Unable to detect changes"

and here is the chunk from the log file:

com.atlassian.bamboo.repository.RepositoryException:

com.atlassian.bamboo.repository.RepositoryException: java.net.SocketTimeoutException: connect timed out
    at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:404)
    at com.stellarity.bamboo.repository.TfsRepository.collectChangesSinceLastBuild(TfsRepository.java:289)
    at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesWithRetry(DefaultChangeDetectionManager.java:556)
    at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.lambda$createBuildRepositoryChanges$159(DefaultChangeDetectionManager.java:427)
    at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:221)
    at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildInternal(DefaultChangeDetectionManager.java:362)
    at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:310)
    at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:195)
    at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildIfTriggered(DefaultChangeDetectionManager.java:133)
    at com.atlassian.bamboo.v2.trigger.ChangeDetectionListenerAction.testIfBuildShouldStart(ChangeDetectionListenerAction.java:114)
    at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:510)
    at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:493)
    at io.atlassian.util.concurrent.ManagedLocks$ManagedLockImpl.withLock(ManagedLocks.java:293)
    at com.atlassian.bamboo.plan.PlanExecutionLockServiceImpl.lock(PlanExecutionLockServiceImpl.java:85)
    at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.doWithProcessLock(PlanExecutionManagerImpl.java:784)
    at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.startConditionalBuild(PlanExecutionManagerImpl.java:492)
    at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:566)
    at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:583)
    at com.atlassian.bamboo.plan.DelegatingPlanExecutionManager.start(DelegatingPlanExecutionManager.java:95)
    at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.startPlanExecution(NonBlockingPlanExecutionServiceImpl.java:234)
    at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.lambda$call$253(NonBlockingPlanExecutionServiceImpl.java:220)
    at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:159)
    at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:155)
    at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:188)
    at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:154)
    at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:219)
    at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:219)
    at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:202)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:51)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:31)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:20)
    at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:52)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at com.stellarity.bamboo.repository.TfsRepository.validateUrl(TfsRepository.java:627)
    at com.stellarity.bamboo.repository.TfsRepository.getTeamProjectCollection(TfsRepository.java:633)
    at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:322)

Comments (42)

  1. Sergey Podobry

    There is actually only one cause for this exception: the TFS server is not accessible (or does not respond within 3 seconds).

    If you want to confirm network issues

    I'm thinking about a script that will ping your TFS server and write a log. Also special monitoring tools can be used. I can help you with that if you require any assistance.

    If you want to suppress the errors

    It can be done. I can introduce an error counter and trigger the exception only when the counter exceeds its limit.

  2. Theodore Tsatsos reporter

    @Sergey Podobry I ran “netstat -an | findstr 8080” on both my Bamboo master and my TFS server, and the result is:

    TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING
    TCP 10.23.149.161:64771 10.23.149.162:8080 TIME_WAIT
    TCP 10.23.149.161:64773 10.23.149.162:8080 TIME_WAIT
    TCP 10.23.149.161:64775 10.23.149.162:8080 TIME_WAIT
    TCP 10.23.149.161:64777 10.23.149.162:8080 TIME_WAIT
    TCP 10.23.149.161:64779 10.23.149.162:8080 TIME_WAIT
    TCP 10.23.149.161:64781 10.23.149.162:8080 TIME_WAIT
    TCP 10.23.149.161:64783 10.23.149.162:8080 TIME_WAIT
    TCP [::]:8080 [::]:0 LISTENING

    “TIME_WAIT” means a connection is closed (FIN packets have been sent) but we're holding the ports in reserve in case some more packets come through due to delays. It also means we can't reuse that combination until it times out.

    Do you believe that this is something that can affect the connection between the two servers?
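To see whether TIME_WAIT entries are actually accumulating, it can help to count connection states per remote endpoint across several `netstat -an` snapshots. The following is a minimal sketch in Java, assuming the Windows column layout shown above (protocol, local address, foreign address, state). The class and method names are hypothetical and are not part of Bamboo or the TFS plugin.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies TCP states per remote endpoint from `netstat -an` text. */
final class NetstatTally {

    /**
     * Returns a map keyed by "foreignAddress state" with the number of matching rows.
     * Assumes the columns are: Proto, Local Address, Foreign Address, State.
     */
    static Map<String, Integer> tally(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Only TCP rows carry a state column; skip headers, UDP rows and blank lines.
            if (cols.length < 4 || !cols[0].equalsIgnoreCase("TCP")) {
                continue;
            }
            counts.merge(cols[2] + " " + cols[3], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample rows copied from the output quoted in this thread.
        String sample = String.join("\n",
            "TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING",
            "TCP 10.23.149.161:64771 10.23.149.162:8080 TIME_WAIT",
            "TCP 10.23.149.161:64773 10.23.149.162:8080 TIME_WAIT");
        tally(sample).forEach((key, n) -> System.out.println(key + " " + n));
    }
}
```

Comparing the tallies from snapshots taken a few minutes apart would show whether the number of TIME_WAIT entries stays roughly constant or keeps growing.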

  3. Theodore Tsatsos reporter

    Hi @Sergey Podobry

    I would like to confirm network issues and address them.

    What is the best way to do it in your opinion?

    I have already created a script that I run from the Bamboo master. It pings the TFS server every 5 seconds and writes to a log file in case of failure, but so far it has run without failures.

  4. Sergey Podobry

    Ok. Here is the script using wget

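    @echo off
    :loop
    rem Print the current time so each result is timestamped
    echo %time%
    rem Check the URL without downloading it; give up after 2 seconds
    wget http://w7-srg:8080/tfs/DefaultCollection --spider --timeout=2 -nv 2>&1
    rem Pause for about 1 second (waits for a signal that never arrives)
    waitfor aaaaaa /T 1 2>nul
    goto loop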
    

    Save it as monitor.cmd, change the URL to yours, and place wget.exe in the same directory. Then run monitor > log.txt and leave the console window open.

    The good log file:

    21:09:19.05
    Username/Password Authentication Failed.
    21:09:20.17
    Username/Password Authentication Failed.
    21:09:21.29
    Username/Password Authentication Failed.
    

    The HTTP server works and asks for authentication, which is fine.

    The bad log file:

    21:12:13.89
    http://w7-srg:8080/tfs/DefaultCollection:
    Remote file does not exist -- broken link!!!
    21:12:15.01
    failed: Bad file descriptor.
    
  5. Theodore Tsatsos reporter

    My script logs only the timestamp of each failure:

    set IPADDRESS=xxxxxxxxxxxxxx
    set INTERVAL=5
    :PINGINTERVAL
    ping -n 1 %ipaddress% | find "TTL=" > nul
    if errorlevel 1 echo %date% %time% >> failurelog.txt
    timeout %INTERVAL%
    GOTO PINGINTERVAL

  6. Sergey Podobry

    So, we have confirmed that this is not a TFS plugin issue, right?

    As the next step, I suggest pinging other network resources (another server, a router, etc.). It would also be useful to ping in the other direction (from TFS to Bamboo).

  7. Theodore Tsatsos reporter

    No, we haven't confirmed that. I got only one ping failure in days of running the script, whereas I get "unable to detect changes" multiple times per day.

    The interesting thing is that I do not get these errors when the ping script is running.
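An ICMP ping only shows that the host answers. The exception in the log is raised by a TCP connect (`java.net.Socket.connect`) that times out, so a probe that opens a TCP connection to the TFS port, using the 3-second limit mentioned earlier in this thread, may be closer to what the plugin does. The following is a minimal sketch in Java under those assumptions. The class name, the 5-second polling interval and the command-line arguments are hypothetical, and this is not the plugin's actual code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.LocalDateTime;

/** Hypothetical probe: logs a timestamp whenever a TCP connect to host:port fails. */
final class TcpConnectProbe {

    /** Returns true if a TCP connection could be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException, connection refused and unknown host.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: java TcpConnectProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        while (true) {
            if (!canConnect(host, port, 3000)) { // 3 seconds, as mentioned in the thread
                System.out.println(LocalDateTime.now() + " connect to " + host + ":" + port + " failed");
            }
            Thread.sleep(5000); // assumed polling interval
        }
    }
}
```

Run from the Bamboo server against the TFS host and port, it prints one line per failed attempt. Its timestamps could then be compared with the "Unable to detect changes" times shown on the build plans.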

  8. Sergey Podobry

    It appears that our cooperation in investigating network failures is not very effective. So I'll do everything from my side only. I'll implement a threshold that will silence log warnings about short network failures (while keeping them for long network failures). You won't see those warnings anymore and everyone will be happy. Do you agree?

  9. Theodore Tsatsos reporter

    Hi @Sergey Podobry,

    I have already spoken with Atlassian (and sent them logs and system reports) regarding this issue, and I have also asked our netops team to monitor the network connectivity of the two servers. Atlassian said that this has to do with your plugin, and the netops team confirmed that there are no connectivity issues between the two servers, which are in the same subnet with no firewall between them.

    As a result, the issue lies either on your side (the plugin) or on the TFS side (less likely, but I am also investigating it).

    Our cooperation is not very effective as the suggestions that have been made showed nothing and we are still at square one. Apart from pinging the server I received no other proposals.

    My goal is not to hide the warnings but to fix what causes them.

  10. Todd Zarnes

    I checked with IT and they show 100% uptime for our TFS server. On the different build plans I see "Unable to detect changes (14 Jan 2019, 10:30:03 PM, Occurrences: 3)", "Unable to detect changes (15 Jan 2019, 4:00:03 AM, Occurrences: 4)", "Unable to detect changes (14 Jan 2019, 11:30:03 PM)", etc. The different build plans show different times when they are unable to detect changes. If our TFS server were offline, I would expect all of the build plans to report the same time periods.

  11. Todd Zarnes

    Sergey, thank you for your attention. I posted. It was happening before our recent upgrade as well, and I expect it is a socket timeout, but I am not sure how to track it down and correct it. We are currently on Bamboo 6.7.2 with your latest TFS plugin. We poll TFS for changes and everything else works fine; we just get these detection errors. If there is a check-in to a code stream, we get the build as expected from our polling.

    com.atlassian.bamboo.repository.RepositoryException: java.net.SocketTimeoutException: connect timed out
        at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:409)
        at com.stellarity.bamboo.repository.TfsRepository.collectChangesSinceLastBuild(TfsRepository.java:288)
        at com.atlassian.bamboo.vcs.configuration.legacy.LegacyChangeDetector.collectChangesSinceRevision(LegacyChangeDetector.java:54)
        at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectionChangesWithRetry(DefaultChangeDetectionManager.java:580)
        at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.lambda$createBuildRepositoryChanges$2(DefaultChangeDetectionManager.java:479)
        at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:185)
        at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildInternal(DefaultChangeDetectionManager.java:440)
        at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:290)
        at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:215)
        at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildIfTriggered(DefaultChangeDetectionManager.java:137)
        at com.atlassian.bamboo.v2.trigger.ChangeDetectionListenerAction.testIfBuildShouldStart(ChangeDetectionListenerAction.java:104)
        at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:448)
        at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:435)
        at io.atlassian.util.concurrent.ManagedLocks$ManagedLockImpl.withLock(ManagedLocks.java:293)
        at com.atlassian.bamboo.plan.PlanExecutionLockServiceImpl.lock(PlanExecutionLockServiceImpl.java:75)
        at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.doWithProcessLock(PlanExecutionManagerImpl.java:655)
        at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.startConditionalBuild(PlanExecutionManagerImpl.java:435)
        at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:487)
        at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:500)
        at com.atlassian.bamboo.plan.DelegatingPlanExecutionManager.start(DelegatingPlanExecutionManager.java:79)
        at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.startPlanExecution(NonBlockingPlanExecutionServiceImpl.java:197)
        at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.lambda$call$0(NonBlockingPlanExecutionServiceImpl.java:186)
        at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:136)
        at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:133)
        at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:162)
        at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:133)
        at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:187)
        at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:185)
        at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:170)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:48)
        at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:26)
        at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:17)
        at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:41)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at com.stellarity.bamboo.repository.TfsRepository.validateUrl(TfsRepository.java:632)
        at com.stellarity.bamboo.repository.TfsRepository.getTeamProjectCollection(TfsRepository.java:638)
        at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:321)
        ... 36 more

  12. Sergey Podobry

    We've released a new version, 1.1.21, with the fix. @Todd Zarnes please try it. I'm closing the issue for now. Don't hesitate to reopen it if anything does not work as expected.

    Details

    An error is now reported to Bamboo only if we hit at least 3 consecutive errors within 1 hour.
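One way to read the rule above is as a gate that stays quiet until the last 3 polls have all failed and those failures fall within one hour. The following is a minimal sketch in Java under that interpretation. The class name, method names and the exact window semantics are assumptions for illustration, not the plugin's actual implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical gate: reports a failure only after N consecutive failures within a window. */
final class ConsecutiveFailureGate {
    private final int threshold;
    private final Duration window;
    // Timestamps of the most recent consecutive failures, oldest first.
    private final Deque<Instant> recentFailures = new ArrayDeque<>();

    ConsecutiveFailureGate(int threshold, Duration window) {
        this.threshold = threshold;
        this.window = window;
    }

    /** A successful poll ends the current streak. */
    synchronized void recordSuccess() {
        recentFailures.clear();
    }

    /** Records a failed poll; returns true if the failure should be reported. */
    synchronized boolean recordFailure(Instant now) {
        recentFailures.addLast(now);
        if (recentFailures.size() > threshold) {
            recentFailures.removeFirst(); // keep only the latest `threshold` failures
        }
        // Assumption: report when the last `threshold` failures all fall inside the window.
        return recentFailures.size() == threshold
            && Duration.between(recentFailures.peekFirst(), now).compareTo(window) <= 0;
    }

    public static void main(String[] args) {
        ConsecutiveFailureGate gate = new ConsecutiveFailureGate(3, Duration.ofHours(1));
        Instant t = Instant.parse("2019-01-14T22:30:00Z");
        System.out.println(gate.recordFailure(t));                  // false
        System.out.println(gate.recordFailure(t.plusSeconds(60)));  // false
        System.out.println(gate.recordFailure(t.plusSeconds(120))); // true
    }
}
```

With this reading, a single timed-out connect between successful polls is never reported, while a sustained outage is reported from the third failed poll onward.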
