Issue #14814 wontfix
Nattakit Sriburanapitak
created an issue

I did not update my webhooks, but they changed to inactive.

Comments (20)

  1. Daniel Tao staff

    We automatically deactivate hooks that have had many consecutive failures (e.g., 4xx/5xx responses, timeouts). You can always re-activate these hooks if you've resolved whatever issue was causing the failures on the receiving end.

    If anyone is seeing webhooks deactivating that haven't had many consecutive failures, please let us know.
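The rule described here amounts to a consecutive-failure counter. Below is a minimal sketch of that idea, not Bitbucket's implementation. It assumes a hypothetical threshold, since the real value is not stated in this thread, and treats timeouts, connection errors and 4xx/5xx responses as failures. `Hook`, `record_delivery` and the `on_deactivate` callback (the kind of owner notification suggested later in the thread) are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical threshold: the value Bitbucket actually uses is not documented here.
MAX_CONSECUTIVE_FAILURES = 10


@dataclass
class Hook:
    url: str
    active: bool = True
    consecutive_failures: int = 0


def is_failure(status_code: Optional[int], timed_out: bool) -> bool:
    """Timeouts, connection errors (no status code) and 4xx/5xx responses count as failures."""
    return timed_out or status_code is None or status_code >= 400


def record_delivery(
    hook: Hook,
    status_code: Optional[int],
    timed_out: bool = False,
    threshold: int = MAX_CONSECUTIVE_FAILURES,
    on_deactivate: Optional[Callable[[Hook], None]] = None,
) -> bool:
    """Update the hook after one delivery attempt; return True if it is still active."""
    if not hook.active:
        return False
    if is_failure(status_code, timed_out):
        hook.consecutive_failures += 1
        if hook.consecutive_failures >= threshold:
            hook.active = False
            if on_deactivate is not None:
                on_deactivate(hook)  # e.g. notify the repository owner
    else:
        # A single success resets the streak, so only *consecutive* failures count.
        hook.consecutive_failures = 0
    return hook.active
```

Because any successful delivery resets the counter, a receiver that only occasionally times out is never deactivated under this scheme, while one that times out on every push (such as a long-running deployment script) eventually is.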

  2. evgeniagusakova

    Could you please provide a link to documentation that explains this kind of behavior?

    I checked the documentation:

    https://confluence.atlassian.com/bitbucket/troubleshoot-webhooks-757727842.html
    https://confluence.atlassian.com/bitbucket/manage-webhooks-735643732.html

    and do not see any information about webhook deactivation other than manual deactivation.

    It looks like something changed on your side: in my configuration, webhooks always trigger deployment scripts, and deployment takes a long time.

    So the webhook triggers the script and usually gets a timeout as the reply, but in fact all the actions that need to run on a push event complete successfully.

    Now this has changed.

    It is not possible for me to make the deployment fast or run it in the background, so it is absolutely OK for me to see timeouts in the webhook logs, but it is not acceptable for the webhook to be disabled because of too many timeouts.

    Also, I checked the logs and do not understand why the webhook got a net_err status.
    On my side I can see (from the web server log) that the webhook connected and triggered the script, so this status is not clear to me.

    One more thing: could you please explain how it is possible that I got a timeout at 5 seconds but not at 7 seconds?

    [Screenshot attached: Screen Shot 2017-09-23 at 6.08.59 PM.png]

  3. Arnaud Benatsou

    I agree with @evgeniagusakova: some changes happened on BB a few weeks ago that broke previously working setups. Another problem is that currently (2017-09-25 17:33 UTC) BB takes a very long time (around 15 min) just to call a post hook script, and I'm wondering whether this delay is included in the response time :/

    And another problem, just now, while clicking Preview:

    Error There was an error rendering your preview. Cl

    Thanks!

  4. Daniel Tao staff

    @evgeniagusakova You're right to point out that we failed to update our documentation to capture the latest adjustments we've made to the configuration of our webhooks service. That's on me. Fortunately, I can tell you that in the next couple of days we'll be restoring some of our previous behavior, probably most importantly increasing the timeout from 5s back to 10s.

    As for automatic deactivation, we will still keep that in place and I will make sure that we update our docs in the near future to reflect that. In the meantime, judging from the details of your particular case I am optimistic that simply increasing the timeout from 5s to 10s on our end will resolve your issue.

  5. Phil Rittenhouse

    Unacceptable. Any number of temporary network issues between Bitbucket.org and our Bamboo server could cause timeouts. We don't want Atlassian to quietly and mysteriously break our workflow when this happens. At least send an alert to the repo owner so that they can re-activate the webhook. Work with us to fix it. In most cases the issues would resolve themselves with no action on our part or yours, but now you are coming in and permanently breaking things, leaving us to figure out what's wrong, scratching our heads as to why our webhooks are inactive. NOT A GOOD WAY TO TREAT YOUR PAYING CUSTOMERS!

  6. Daniel Tao staff

    All right, I'll do my best to provide context here without making excuses.

    In all honesty, automatic deactivation started out as an experiment: we found that a large volume of webhook requests were consistently failing, putting a ton of extraneous load on our webhooks service, occasionally causing real stability problems. We asked ourselves, "who is benefitting from these requests that fail 100% of the time?" The obvious answer: no one. The majority of these were clearly set up, broken, and abandoned.

    When we initially enabled automated deactivation, the benefits were immediate and significant. We saved considerable resources on a large volume of requests that otherwise would have been sent to dead servers. Furthermore, we received virtually no support tickets from customers who were negatively impacted.

    Naturally, we all high-fived and congratulated ourselves on a job well done.

    That was months ago. Recently, we adjusted the criteria for automatic deactivation to start deactivating hooks more aggressively. Clearly, we went a bit too far, as for the first time we are hearing from users (thank you!) that this is causing issues.

    We can certainly adjust our settings again to make our webhooks service more permissive in allowing a greater number of successive failures before deactivating a hook. We can and should also add notifications to affected repo owners, as some of you have suggested. I am only providing this level of detail to offer some visibility into how we got to this point.

    As you may already know, "Don’t #@!% the customer" is one of Atlassian's company values, and we remind ourselves of that on the Bitbucket team regularly. This is just one of those cases where doing right by most of our users (by addressing a source of instability in a widely used service) unfortunately resulted in some collateral damage.

    We will do the right thing and clean this up. It just might take a week or two.

  7. Sean Jeffrey Vaughan

    Thanks for sharing so much about the process.

    "We asked ourselves, 'who is benefitting from these requests that fail 100% of the time?' The obvious answer: no one. The majority of these were clearly set up, broken, and abandoned." That "obvious" answer is incorrect. While Bitbucket experienced the webhook as "timing out", and that was interpreted as "failure", on our side the webhook handler successfully handled the changes; it simply took longer than the Bitbucket timeout to complete. (We're using https://github.com/acidprime/puppet-r10k/blob/master/templates/webhook.bin.erb, which at least used to be Puppet's recommended webhook handler, FYI.)

    (For what it's worth, I can certainly understand Bitbucket requiring webhook handlers to complete relatively quickly, but perhaps you could close the connection on your end without marking the webhook as inactive.)
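Where the deployment can be decoupled from the request (which, as noted earlier in the thread, is not possible in every setup), a common workaround is to acknowledge the webhook immediately and do the slow work afterwards. A minimal sketch using only the Python standard library, assuming that Bitbucket counts any 2xx reply as a successful delivery; `run_deployment`, `DEPLOY_FINISHED` and `make_server` are hypothetical names, and the two-second sleep merely stands in for a real deployment script.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the background job completes, so callers (or tests) can observe it.
DEPLOY_FINISHED = threading.Event()


def run_deployment(payload: bytes) -> None:
    """Hypothetical stand-in for a slow deployment script."""
    time.sleep(2)  # simulate work that outlasts the webhook timeout
    DEPLOY_FINISHED.set()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        # Start the slow work in a background thread instead of inside the request...
        threading.Thread(target=run_deployment, args=(payload,), daemon=True).start()
        # ...and reply right away, well within the sender's timeout.
        self.send_response(202)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging to stderr


def make_server(port: int = 8000, host: str = "127.0.0.1") -> ThreadingHTTPServer:
    # 127.0.0.1 assumes a reverse proxy in front; use "0.0.0.0" to listen publicly.
    return ThreadingHTTPServer((host, port), WebhookHandler)
```

Start it with `make_server().serve_forever()`. Because the deployment runs in a daemon thread, it is killed if the server process exits; a job queue or a detached process would be more robust for real deployments.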

  8. Arnaud Benatsou

    +1 for closing the webhook connection on the BB side instead of setting it as inactive. In the meantime, could you stop disabling webhooks? Since your changes, we have had to re-enable them almost every day to keep our project workflows working.

  9. Dominique Pijnenburg

    We've been having problems as well for a few weeks now. We've always trusted the webhooks to work as they did, so when there are problems we don't immediately think "hey, the webhook might be deactivated". First you look for other causes, and only eventually discover that the reason is this automatic deactivation.

    I get why this is happening for webhooks that really don't function anymore, but now Bitbucket has clearly taken it a bit too far.

  10. vvden NA

    There's probably a way to do this with the Bitbucket API, but it would be nice to have a UI page that displays all existing webhooks for all repositories in the "team", or at least per "project" grouping of repositories, along with the status of each webhook (active/inactive). Right now, unless I dig into the API, the only way to quickly fix the issue at hand is to go through about 30 repositories and check each webhook's status.
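Such an overview can probably be scripted. A minimal sketch, assuming the Bitbucket Cloud REST API 2.0 exposes paginated `GET /2.0/repositories/{workspace}` and `GET /2.0/repositories/{workspace}/{repo_slug}/hooks` endpoints whose pages carry `values` and `next` fields, that each hook object has an `active` flag, and that Basic auth with an app password is accepted. All of these are assumptions to check against the current API reference, and `fetch_pages`, `inactive_hooks` and `report` are hypothetical helpers.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed base URL of the Bitbucket Cloud REST API 2.0.
API = "https://api.bitbucket.org/2.0"


def fetch_pages(url: str, user: str, app_password: str) -> Iterator[dict]:
    """Yield every JSON page of a paginated endpoint by following its "next" links."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    next_url: Optional[str] = url
    while next_url:
        req = urllib.request.Request(next_url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        yield page
        next_url = page.get("next")


def inactive_hooks(pages: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (url, description) for each hook whose "active" flag is false."""
    return [
        (hook.get("url", ""), hook.get("description", ""))
        for page in pages
        for hook in page.get("values", [])
        if not hook.get("active", True)
    ]


def report(workspace: str, user: str, app_password: str) -> None:
    """Print every inactive webhook in every repository the account can see in `workspace`."""
    for repo_page in fetch_pages(f"{API}/repositories/{workspace}", user, app_password):
        for repo in repo_page.get("values", []):
            hooks_url = f"{API}/repositories/{workspace}/{repo['slug']}/hooks"
            for hook_url, description in inactive_hooks(fetch_pages(hooks_url, user, app_password)):
                print(f"{repo['slug']}: INACTIVE {hook_url} {description}")
```

A call such as `report("my-team", "my-user", "my-app-password")` (placeholder values) would then list every deactivated hook across the team's repositories in one pass, instead of checking each repository by hand.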

  11. Floris Kraak

    It would be extremely useful to at least have some kind of overview of registered webhooks and their status. We have 3 Jenkins servers (soon to be 4) using the Bitbucket Cloud plugin to register new hooks, and right now none of the commit hooks appear to be firing and we have no idea why, or even whether the hook registration succeeded at all :/

  12. Alexandre allexandre

    Same problem here: I'm using webhooks for automatic deployment. Of course it takes more than 10 seconds, and the webhook is now inactive. Please remove this "functionality" or allow us to change the timeout...
