More precise error message

Issue #13763 resolved
Sean Farley created an issue

Originally reported from here:!/results/%7B4515a93e-06a9-4842-b8f5-a461c21175cf%7D

To give a user an actionable response, we should be clear about why a job failed. In this case, the failure was a Pipelines error: the build ran out of disk space (hopefully those 150 minutes weren't counted!).

From the pipelines team, we need:

1) reason a job failed (our fault vs their fault)

If it is Pipelines' fault:

2a) let's not charge the user (!)

2b) attempt a retry, since there is only one action a user would want: re-run the build until the reason it stops is in user land
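
To make the request above concrete, here is a minimal sketch of the policy in Python. It is only an illustration, not how Pipelines is implemented: the names `Fault`, `BuildResult`, `Outcome`, and `run_with_retry` are hypothetical, and the sketch assumes the platform can already tell its own failures from user failures (point 1).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Fault(Enum):
    """Who caused a failed build (hypothetical classification)."""
    PLATFORM = "platform"  # "our fault", e.g. the host ran out of disk space
    USER = "user"          # "their fault", e.g. failing tests


@dataclass
class BuildResult:
    succeeded: bool
    minutes: int
    fault: Optional[Fault] = None  # None when the build succeeded
    reason: str = ""               # message to show the user


@dataclass
class Outcome:
    result: BuildResult
    attempts: int
    billed_minutes: int


def run_with_retry(run_build: Callable[[], BuildResult],
                   max_attempts: int = 3) -> Outcome:
    """Re-run a build until it succeeds or fails for a user-land reason.

    Minutes spent on platform-caused failures are never billed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    billed = 0
    result = None
    for attempt in range(1, max_attempts + 1):
        result = run_build()
        if result.succeeded or result.fault is Fault.USER:
            # Final answer for the user: bill only this attempt.
            billed += result.minutes
            return Outcome(result, attempt, billed)
        # Platform fault: do not bill, fall through and retry.
    # Every attempt hit a platform fault; report the last one, bill nothing.
    return Outcome(result, max_attempts, billed)
```

With this policy, a build that first dies from a platform disk-space error and then fails its tests on the retry would be billed only for the second attempt, and the user would see the test failure as the reason.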

Comments (3)

  1. Matt Ryall

    Sean - did we add some more build minutes for the customer that hit this problem? Support engineers have access to do this, and we can document it for the team if needed.

    Normally running out of disk space is a customer-caused issue, so we would typically report this issue via a disk space error in the pipeline logs.

    In this case, there was an underlying issue with the Docker disk driver used inside our infrastructure, which we can't surface directly in customer builds. When underlying issues like this happen, you'll have to rely on us to identify and fix them promptly. Since the incident on 6 Feb, we've added better monitoring on our side to catch inode exhaustion problems like this before they affect customers in future.

