How to fix “com.atlassian.bamboo.repository.RepositoryException: java.net.SocketTimeoutException: connect timed out”
The issue is that I get error messages saying: "Bamboo Unable to detect changes"
Here is the relevant chunk from the log file:
com.atlassian.bamboo.repository.RepositoryException:
com.atlassian.bamboo.repository.RepositoryException: java.net.SocketTimeoutException: connect timed out
at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:404)
at com.stellarity.bamboo.repository.TfsRepository.collectChangesSinceLastBuild(TfsRepository.java:289)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesWithRetry(DefaultChangeDetectionManager.java:556)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.lambda$createBuildRepositoryChanges$159(DefaultChangeDetectionManager.java:427)
at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:221)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildInternal(DefaultChangeDetectionManager.java:362)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:310)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:195)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildIfTriggered(DefaultChangeDetectionManager.java:133)
at com.atlassian.bamboo.v2.trigger.ChangeDetectionListenerAction.testIfBuildShouldStart(ChangeDetectionListenerAction.java:114)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:510)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:493)
at io.atlassian.util.concurrent.ManagedLocks$ManagedLockImpl.withLock(ManagedLocks.java:293)
at com.atlassian.bamboo.plan.PlanExecutionLockServiceImpl.lock(PlanExecutionLockServiceImpl.java:85)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.doWithProcessLock(PlanExecutionManagerImpl.java:784)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.startConditionalBuild(PlanExecutionManagerImpl.java:492)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:566)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:583)
at com.atlassian.bamboo.plan.DelegatingPlanExecutionManager.start(DelegatingPlanExecutionManager.java:95)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.startPlanExecution(NonBlockingPlanExecutionServiceImpl.java:234)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.lambda$call$253(NonBlockingPlanExecutionServiceImpl.java:220)
at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:159)
at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:155)
at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:188)
at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:154)
at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:219)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:219)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:202)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:51)
at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:31)
at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:20)
at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:52)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.stellarity.bamboo.repository.TfsRepository.validateUrl(TfsRepository.java:627)
at com.stellarity.bamboo.repository.TfsRepository.getTeamProjectCollection(TfsRepository.java:633)
at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:322)
Comments (42)
-
reporter -
Hi Theodore,
The error says that there is no connection to the TFS server. You need to check whether the server is accessible at the URL you have specified.
-
reporter Hi Sergey,
I already did it and it is accessible.
Any other suggestions?
-
Oh, I see your question on Atlassian Answers. Could you provide more details (when does it happen, and is it periodic or sporadic)?
-
reporter It is happening constantly. In a day we get 5-6 of these errors.
-
Actually, there is only one reason to get this exception: the TFS server is not accessible (or does not respond within 3 seconds).
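For reference, "connect timed out" is the message of the SocketTimeoutException that java.net.Socket.connect throws when the TCP handshake does not finish within the timeout it was given; the Socket.connect frame under TfsRepository.validateUrl in the trace above is exactly that call. A host that is up but has nothing listening on the port fails quickly with a ConnectException instead. The sketch below is a hypothetical probe (not the plugin's validateUrl code) that tells these cases apart; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical helper: classifies the outcome of a single TCP connect attempt. */
class ConnectProbe {
    enum Outcome { CONNECTED, TIMED_OUT, REFUSED, UNKNOWN_HOST, OTHER_ERROR }

    static Outcome probe(String host, int port, int timeoutMillis) {
        // try-with-resources closes the socket whether or not the connect succeeds
        try (Socket socket = new Socket()) {
            // Blocks until the TCP handshake completes or timeoutMillis elapses
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // The case in the Bamboo log: no answer to the connection attempt in time
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            // The host answered with a reset: reachable, but nothing listening on the port
            return Outcome.REFUSED;
        } catch (UnknownHostException e) {
            // The host name could not be resolved
            return Outcome.UNKNOWN_HOST;
        } catch (IOException e) {
            // Anything else, e.g. no route to host
            return Outcome.OTHER_ERROR;
        }
    }
}
```

For example, ConnectProbe.probe("your-tfs-host", 8080, 3000) run from the Bamboo master should return TIMED_OUT in the situation that produces the Bamboo error, while an ICMP ping can still succeed, since ping never touches port 8080.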
If you want to confirm network issues:
I'm thinking of a script that pings your TFS server and writes a log. Dedicated monitoring tools can also be used. I can help you with that if you need any assistance.
If you want to suppress the errors:
That can be done. I can introduce an error counter and raise the exception only when the counter exceeds its limit.
-
Hi @Theo_Tsatsos,
Have you considered the solutions above?
-
reporter @Sergius I ran “netstat -an | findstr 8080” on both my Bamboo master and my TFS server, and the result is:
TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING
TCP 10.23.149.161:64771 10.23.149.162:8080 TIME_WAIT
TCP 10.23.149.161:64773 10.23.149.162:8080 TIME_WAIT
TCP 10.23.149.161:64775 10.23.149.162:8080 TIME_WAIT
TCP 10.23.149.161:64777 10.23.149.162:8080 TIME_WAIT
TCP 10.23.149.161:64779 10.23.149.162:8080 TIME_WAIT
TCP 10.23.149.161:64781 10.23.149.162:8080 TIME_WAIT
TCP 10.23.149.161:64783 10.23.149.162:8080 TIME_WAIT
TCP [::]:8080 [::]:0 LISTENING
**“TIME_WAIT” means a connection is closed (FIN packets have been sent) but we're holding the ports in reserve in case more packets arrive late.
It also means we can't reuse that address/port combination until it times out.**
Do you believe that this is something that can affect the connection between the two servers?
-
I don't think so, unless there are around 64k connections in TIME_WAIT (enough to use up the available local ports).
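To see how far from that limit you are, you can count the TIME_WAIT entries that point at the TFS endpoint. A minimal sketch, assuming the Windows netstat -an column layout shown above (Proto, Local Address, Foreign Address, State); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts TCP connections per state toward one remote endpoint. */
class NetstatStates {
    static Map<String, Integer> countStates(List<String> netstatLines, String remoteEndpoint) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            // Expected columns: Proto  Local Address  Foreign Address  State
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4 || !cols[0].equals("TCP")) {
                continue; // skip headers, blank lines and UDP lines (which have no state)
            }
            if (cols[2].equals(remoteEndpoint)) {
                counts.merge(cols[3], 1, Integer::sum); // tally by state, e.g. TIME_WAIT
            }
        }
        return counts;
    }
}
```

Fed the output above, countStates(lines, "10.23.149.162:8080") gives {TIME_WAIT=7}, nowhere near 64k.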
-
Please choose how we should proceed: https://bitbucket.org/stellaritysoftware/tfs-repository-plugin/issues/66/how-to-fix#comment-33810223
-
reporter Hi @Sergius
I would like to confirm network issues and address them.
What is the best way to do it in your opinion?
I have already created a script, running on the Bamboo master, that pings the TFS server every 5 seconds and writes to a log file on failure, but so far it has run without any failures.
-
Hi @Theo_Tsatsos,
OK. Monitoring pings is the first thing to do. We also need to monitor the health of the HTTP server that TFS runs on. Which OS does the Bamboo master run on, Linux or Windows?
-
reporter @Sergius It's Windows
-
OK. Here is a script that uses wget:
@echo off
:loop
echo %time%
wget http://w7-srg:8080/tfs/DefaultCollection --spider --timeout=2 -nv 2>&1
waitfor aaaaaa /T 1 2>nul
goto loop
Save it as monitor.cmd, change the URL to yours, and place wget.exe in the same directory. Then run "monitor > log.txt" and leave the console window open.
The good log file:
21:09:19.05 Username/Password Authentication Failed.
21:09:20.17 Username/Password Authentication Failed.
21:09:21.29 Username/Password Authentication Failed.
The HTTP server works and asks for authentication. That's OK.
The bad log file:
21:12:13.89 http://w7-srg:8080/tfs/DefaultCollection: Remote file does not exist -- broken link!!!
21:12:15.01 failed: Bad file descriptor.
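If wget isn't available on the Bamboo master, roughly the same check can be done from Java. This is a sketch, not part of the plugin: it sends a HEAD request with a short connect timeout (comparable to --timeout=2 above) and, as with the wget log, treats 401 Unauthorized as healthy because the server did answer. The class name, the read timeout and the status labels are assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.time.LocalTime;

/** Hypothetical helper: one iteration of a wget-style HTTP health check. */
class HttpHealthCheck {
    /** Maps an HTTP status code to a log label; 401 is the expected "good" answer for TFS. */
    static String classify(int statusCode) {
        if (statusCode == 401) {
            return "UP (authentication required)"; // same meaning as the good wget log
        }
        if (statusCode >= 200 && statusCode < 400) {
            return "UP";
        }
        return "ERROR " + statusCode; // server answered, but not with a usable response
    }

    /** Performs one HEAD request and returns a timestamped log line. */
    static String check(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis); // TCP connect phase
            conn.setReadTimeout(timeoutMillis);    // waiting for the response
            return LocalTime.now() + " " + classify(conn.getResponseCode());
        } catch (SocketTimeoutException e) {
            // Connect timeouts (the symptom in the Bamboo log) and read timeouts land here
            return LocalTime.now() + " TIMEOUT: " + e.getMessage();
        } catch (IOException e) {
            // Refused connections, DNS failures, resets, etc.
            return LocalTime.now() + " FAILED: " + e;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Printing HttpHealthCheck.check("http://w7-srg:8080/tfs/DefaultCollection", 2000) once per second (with your URL) gives a log comparable to monitor.cmd, with TIMEOUT lines standing out from the expected "UP (authentication required)" ones.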
-
reporter Done and I got: "Username/Password Authentication Failed."
How should I proceed?
-
That's OK: the HTTP server works and asks for authentication. Let it run for a day (you're getting the timeout exception every day, right?).
-
reporter okok
I'm actually getting a lot of those every day.....
-
reporter The script that pings TFS had a failure yesterday afternoon. How do you want to proceed?
-
What is in the script log?
-
reporter Only the timestamp of the issue. Here is the script:
set IPADDRESS=xxxxxxxxxxxxxx
set INTERVAL=5
:PINGINTERVAL
ping -n 1 %ipaddress% | find "TTL=" > nul
if errorlevel 1 echo %date% %time% >> failurelog.txt
timeout %INTERVAL%
GOTO PINGINTERVAL
-
So, we have confirmed that this is not a TFS plugin issue, right?
As the next step, I suggest pinging other network resources (another server, a router, etc.). It would also be useful to ping in the other direction (from TFS to Bamboo).
-
reporter No, we haven't confirmed that. I got only one ping failure over several days of running the script, whereas I get "Unable to detect changes" multiple times per day.
The interesting thing is that I do not get these errors when the ping script is running.
-
What about wget script logs?
-
reporter Still getting "Username/Password Authentication Failed.", as expected.
-
Did you catch this error with the ping/wget scripts?
-
reporter no
This error comes from Bamboo.
-
I mean are there any errors in script outputs?
-
reporter The script stopped running over the weekend.
-
reporter So?
Any other suggestions?
-
It appears that our cooperation in investigating network failures is not very effective, so I'll do everything from my side. I'll implement a threshold that silences log warnings about short network failures (while keeping them for long ones). You won't see those warnings anymore, and everyone will be happy. Do you agree?
-
reporter Hi @Sergius,
I have already spoken with Atlassian (sent logs and system reports) regarding this issue, and I have also asked our netops team to monitor the network connectivity between the two servers. Atlassian said that this has to do with your plugin, and the netops team confirmed that there are no connectivity issues between the two servers, which are in the same subnet with no firewall between them.
As a result, the issue lies either on your side (the plugin) or on the TFS side (less likely, but I am investigating that as well).
Our cooperation is not very effective because the suggestions made so far have shown nothing and we are still at square one. Apart from pinging the server, I have received no other proposals.
My goal is not to stop seeing the warnings but to fix what causes them.
-
Could you run my script (https://bitbucket.org/stellaritysoftware/tfs-repository-plugin/issues/66/how-to-fix#comment-34706186) for a day and attach the log file here?
-
reporter sure
-
Was there ever a resolution to this issue? We are having the same problem, with the latest TFS plugin and the latest Bamboo version.
-
Hi @tdzarnes ,
Could you post exception stacktrace for your case?
-
@tdzarnes This error means that the TFS server doesn't respond. Are you sure it was not down for maintenance and there were no network issues? How often do you observe it? At what time of day?
-
I checked with IT and they show 100% uptime for our TFS server. On the different build plans I see "Unable to detect changes (14 Jan 2019, 10:30:03 PM, Occurrences: 3)", "Unable to detect changes (15 Jan 2019, 4:00:03 AM, Occurrences: 4)", "Unable to detect changes (14 Jan 2019, 11:30:03 PM)", etc. The different build plans show different times when they are unable to detect changes. If our TFS server were offline, I would expect all of the build plans to report the same time periods.
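That reasoning can be checked mechanically: a server outage should make failures from different plans cluster at the same moments, while independent connection timeouts should not. A minimal sketch, assuming the timestamps are copied by hand from each plan's "Unable to detect changes" messages; the class name, the plan keys and the 5-minute window are hypothetical choices.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds failures that coincide across build plans. */
class FailureCorrelation {
    /** Returns "PLAN @ time" for every failure that another plan also saw within the window. */
    static List<String> sharedFailures(Map<String, List<LocalDateTime>> failuresByPlan, Duration window) {
        List<String> shared = new ArrayList<>();
        for (Map.Entry<String, List<LocalDateTime>> plan : failuresByPlan.entrySet()) {
            for (LocalDateTime t : plan.getValue()) {
                if (seenByOtherPlan(failuresByPlan, plan.getKey(), t, window)) {
                    shared.add(plan.getKey() + " @ " + t);
                }
            }
        }
        return shared;
    }

    private static boolean seenByOtherPlan(Map<String, List<LocalDateTime>> failuresByPlan,
                                           String planKey, LocalDateTime t, Duration window) {
        for (Map.Entry<String, List<LocalDateTime>> other : failuresByPlan.entrySet()) {
            if (other.getKey().equals(planKey)) {
                continue; // only compare against other plans
            }
            for (LocalDateTime u : other.getValue()) {
                Duration gap = Duration.between(t, u);
                if (gap.isNegative()) {
                    gap = gap.negated(); // absolute distance in time
                }
                if (gap.compareTo(window) <= 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the three times quoted above assigned to three different plans, sharedFailures returns an empty list for a 5-minute window, which fits failures that are not caused by a server-wide outage.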
-
@tdzarnes Ok, thank you for the information. We'll release a fix next week.
-
Sergey, thank you for your attention. I have posted the stack trace below. It was happening before our recent upgrade as well; I expect it is a socket timeout, but I'm not sure how to track it down and correct it. We are currently on Bamboo 6.7.2 with your latest TFS plugin. We poll TFS for changes and everything else works fine, apart from these detection errors. If we get a check-in to a code stream, then we get the build from our polling as expected.
com.atlassian.bamboo.repository.RepositoryException: java.net.SocketTimeoutException: connect timed out
at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:409)
at com.stellarity.bamboo.repository.TfsRepository.collectChangesSinceLastBuild(TfsRepository.java:288)
at com.atlassian.bamboo.vcs.configuration.legacy.LegacyChangeDetector.collectChangesSinceRevision(LegacyChangeDetector.java:54)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectionChangesWithRetry(DefaultChangeDetectionManager.java:580)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.lambda$createBuildRepositoryChanges$2(DefaultChangeDetectionManager.java:479)
at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:185)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildInternal(DefaultChangeDetectionManager.java:440)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:290)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceRevisions(DefaultChangeDetectionManager.java:215)
at com.atlassian.bamboo.v2.trigger.DefaultChangeDetectionManager.collectChangesSinceLastBuildIfTriggered(DefaultChangeDetectionManager.java:137)
at com.atlassian.bamboo.v2.trigger.ChangeDetectionListenerAction.testIfBuildShouldStart(ChangeDetectionListenerAction.java:104)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:448)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl$3.call(PlanExecutionManagerImpl.java:435)
at io.atlassian.util.concurrent.ManagedLocks$ManagedLockImpl.withLock(ManagedLocks.java:293)
at com.atlassian.bamboo.plan.PlanExecutionLockServiceImpl.lock(PlanExecutionLockServiceImpl.java:75)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.doWithProcessLock(PlanExecutionManagerImpl.java:655)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.startConditionalBuild(PlanExecutionManagerImpl.java:435)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:487)
at com.atlassian.bamboo.plan.PlanExecutionManagerImpl.start(PlanExecutionManagerImpl.java:500)
at com.atlassian.bamboo.plan.DelegatingPlanExecutionManager.start(DelegatingPlanExecutionManager.java:79)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.startPlanExecution(NonBlockingPlanExecutionServiceImpl.java:197)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.lambda$call$0(NonBlockingPlanExecutionServiceImpl.java:186)
at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:136)
at com.atlassian.bamboo.util.CacheAwareness$3.call(CacheAwareness.java:133)
at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:162)
at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:133)
at com.atlassian.bamboo.util.CacheAwareness.withValuesOlderThanTimestampReloaded(CacheAwareness.java:187)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:185)
at com.atlassian.bamboo.plan.NonBlockingPlanExecutionServiceImpl$4.call(NonBlockingPlanExecutionServiceImpl.java:170)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:48)
at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:26)
at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:17)
at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:41)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.stellarity.bamboo.repository.TfsRepository.validateUrl(TfsRepository.java:632)
at com.stellarity.bamboo.repository.TfsRepository.getTeamProjectCollection(TfsRepository.java:638)
at com.stellarity.bamboo.repository.TfsRepository.collectChanges(TfsRepository.java:321)
... 36 more
-
We've released a new version, 1.1.21, with the fix. @tdzarnes, please try it. I'm closing the issue for now. Don't hesitate to reopen it if anything doesn't work as expected.
Details
Now an error is reported to Bamboo only if there are at least 3 consecutive errors within 1 hour.
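A minimal sketch of such a policy, for illustration only; it is not the plugin's actual code, and the class name is hypothetical. It assumes "3 consecutive errors within 1 hour" means: any successful poll resets the run, and a failure is reported once the current run holds at least 3 failures that are no more than an hour apart from the newest one.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: decides whether a failed change-detection poll should be surfaced. */
class FailureThreshold {
    private final int threshold;
    private final Duration window;
    // Timestamps of the current run of consecutive failures, oldest first
    private final Deque<Instant> run = new ArrayDeque<>();

    FailureThreshold(int threshold, Duration window) {
        this.threshold = threshold;
        this.window = window;
    }

    /** A successful poll breaks the run of consecutive failures. */
    void recordSuccess() {
        run.clear();
    }

    /** Records a failure and returns true if it should be reported to Bamboo. */
    boolean recordFailure(Instant now) {
        run.addLast(now);
        // Drop failures that are older than the window, measured from the newest one
        while (!run.isEmpty() && Duration.between(run.peekFirst(), now).compareTo(window) > 0) {
            run.removeFirst();
        }
        return run.size() >= threshold;
    }
}
```

An ArrayDeque fits here because failures arrive in time order, so expired entries are always at the head and can be dropped cheaply. With new FailureThreshold(3, Duration.ofHours(1)), one or two isolated timeouts stay silent, while a third failure in a row within the hour is reported.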
-
- changed status to resolved
-
Sergey Podobry, thank you for the quick turnaround. I installed version 1.1.21 this weekend, and so far we have not seen the timeout errors.