Performance Problems when Auto-merging

Issue #134 resolved
Ulrich Kuhnhardt [Izymes]
repo owner created an issue

The PullRequestApproved Event handler does not spawn its own thread - this may cause performance issues.

Comments (7)

  1. Alex Holtz

    Workzone 2.4.5 also frequently causes the load on our Stash 3.8.0 instance to spike from 1-2 to 40-70, and sometimes crashes it. At the time of the crash, the logs are full of automerge events.

    The problem did not occur for us on version 2.4.3.

  2. Ulrich Kuhnhardt [Izymes] reporter

    Hi Alex,

    Thanks for reporting this issue here. We've also received your feedback from Marketplace - I think it's the same issue.

    The only change in Workzone 2.4.5 was to have the Stash 'pull request approved' event spawn a new thread in Workzone to check for and perform a merge operation. This is the strategy recommended by Atlassian. Before Workzone 2.4.5, the auto-merge condition check was performed within the 'pull request approved' event thread.

    I would expect the opposite behaviour: before Workzone 2.4.5, 'pull request approved' events would queue up because the auto-merge check delayed event thread execution, whereas now event processing should be much faster.
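
    As a rough illustration of the two approaches (not the actual Workzone code and not the Stash event API; `ApprovalHandler`, `checkAndMerge` and the pull request id are hypothetical names), a minimal Java sketch might look like this:

```java
import java.util.function.LongConsumer;

// Hypothetical stand-in for an approval event handler; not the Workzone or Stash API.
class ApprovalHandler {
    // The (potentially slow) auto-merge check, supplied by the caller.
    private final LongConsumer checkAndMerge;

    ApprovalHandler(LongConsumer checkAndMerge) {
        this.checkAndMerge = checkAndMerge;
    }

    // Inline style: the check runs on the calling (event) thread and blocks it
    // until the check has finished.
    void onApprovedInline(long pullRequestId) {
        checkAndMerge.accept(pullRequestId);
    }

    // Thread-per-event style: a new thread is started for every event and the
    // calling thread returns immediately.
    Thread onApprovedAsync(long pullRequestId) {
        Thread worker = new Thread(
                () -> checkAndMerge.accept(pullRequestId),
                "auto-merge-" + pullRequestId);
        worker.start();
        return worker;
    }
}
```

    In the inline variant a slow check holds up the event thread; in the asynchronous variant the event thread is freed, but nothing limits how many worker threads exist at once.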

    Alex, you can safely downgrade to Workzone 2.4.4. Before you do, though, could you please run your Stash instance with Workzone 2.4.5 and Workzone debug logging enabled? That would be of great help! To enable Workzone debug logging, please run the following from a terminal:

    curl -u admin-user -v -X PUT -d "" -H "Content-Type: application/json" http://stash-host:7990/rest/api/latest/logs/logger/com.izymes.workzone/debug

    and monitor your instance. Once the spike occurs, create a support zip and send it to ulrich at izymes dot com. Then please downgrade Workzone to version 2.4.4.

    Thanks a lot for your help.

  3. Alex Holtz

    Hi Ulrich,

    We upgraded June 5th at 2pm and virtually immediately started having performance problems. I ended up downgrading to Workzone 2.4.3 (our previous version) on June 6 because the system kept coming dangerously close to crashing, even on a weekend with virtually no user activity.

    It seems like there was only just enough activity to trigger some automerges in our largest repo and each time the entire system would become unresponsive.

    Immediately after reverting to Workzone 2.4.3 the performance problems stopped.

    I'm unwilling to upgrade in production right now in order to capture the debug logs because the symptoms caused by 2.4.5 are quite severe for us.

    I will send you the logs I have though.

    Creating new threads for each auto-merge all at once flooded our system with so many simultaneous requests that it almost crashed. Have you considered whether this could be related to the known bug in Stash 3.8 - https://jira.atlassian.com/browse/STASH-7337?
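
    One common way to cap that kind of burst is to hand the work to a small fixed-size pool instead of starting a thread per event. The following is only a sketch under assumptions: `BoundedMergeRunner` is a hypothetical class, the pool size is an arbitrary illustrative value, and none of this is taken from Workzone.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs merge checks on a fixed number of worker threads.
class BoundedMergeRunner {
    private final ExecutorService pool;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedMergeRunner(int maxConcurrent) {
        // Extra submissions wait in the pool's queue instead of spawning threads.
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
    }

    void submit(Runnable mergeCheck) {
        pool.execute(() -> {
            // Record how many checks run at the same time, for observation only.
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                mergeCheck.run();
            } finally {
                running.decrementAndGet();
            }
        });
    }

    // Highest number of checks observed running simultaneously.
    int peakConcurrency() {
        return peak.get();
    }

    // Stops accepting work and waits for queued checks to finish.
    boolean shutdownAndWait(long timeoutSeconds) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

    With a pool of size 2, twenty near-simultaneous submissions would still all run, but never more than two at a time.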

  4. Ulrich Kuhnhardt [Izymes] reporter

    Hi Alex,

    Thanks for the update. We totally understand that you won't be able to gather more logs with Workzone 2.4.5 given the severity of the effects. I'm glad you were able to roll back to Workzone 2.4.3 without complications. If you do happen to have the logs from June 5 and 6, please send them to my personal email address so we can analyse them.

    I believe https://jira.atlassian.com/browse/STASH-7337 is not directly related to the Workzone behaviour you are reporting. Workzone auto-merge is triggered by one of two actions:

    1. successful builds (if 'watch builds' enabled)
    2. user approves PR

    I imagine there weren't many users around on that particular weekend to approve PRs, which leaves the build result trigger. Do you have 'watch builds' enabled in Workzone? We've seen issues where CI servers send more than one build result for the same successful build to Stash. Bamboo can be (mis)configured like that and send spurious build results in quick succession for some plans. Workzone does not keep track of auto-merge attempts and thinks 'there is a new successful build result, let's auto-merge!' even though it triggered an auto-merge attempt milliseconds earlier based on a previous build result for the same PR. This may clog up core Stash's merge service and may lead to the problem you are observing. The puzzle, though, is that nothing has changed in build result handling between Workzone 2.4.3 and 2.4.5. The only change is that user approval events now start their own thread and attempt an auto-merge.
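
    To illustrate the duplicate-trigger scenario, here is a small sketch of a guard that drops a second trigger for a pull request while an attempt for the same pull request is still in flight. It is an assumption-laden example only: as noted, Workzone does not track attempts, and `AutoMergeGuard` and `tryMerge` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: at most one in-flight auto-merge attempt per pull request.
class AutoMergeGuard {
    // Ids of pull requests that currently have an attempt running.
    private final Set<Long> inFlight = ConcurrentHashMap.newKeySet();

    // Returns true if the attempt ran, false if it was dropped as a duplicate.
    boolean tryMerge(long pullRequestId, Runnable mergeAttempt) {
        // add() returns false when the id is already present, i.e. an attempt is running.
        if (!inFlight.add(pullRequestId)) {
            return false;
        }
        try {
            mergeAttempt.run();
            return true;
        } finally {
            // Allow later triggers (for example a genuinely new build) to try again.
            inFlight.remove(pullRequestId);
        }
    }
}
```

    A guard like this only suppresses overlapping attempts; two build results that arrive after the first attempt has finished would still each trigger a merge check.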

    So far, your problem scenario with Workzone 2.4.5 does not make much sense to me. However, I hope the logs will shed a bit more light on the situation.

    Thanks a lot for all the help, Alex.
