We've had enough success with retries at the moment that we've decided not to pursue determining if there is an underlying issue. I'm closing this ticket.
Context:
The symptom we are seeing is that we have many more OpsGenie alerts coming from GoCD for the edxapp pipeline now than in the past. Additionally, we have OpsGenie configured to alert the team only if the failure happens twice, and most alerts self-close. This typically means the failure was flaky, and flaky failures are often caused by networking issues.
We've been adding a bunch of retries recently around failures related to failed downloads, as an example.
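For readers unfamiliar with the pattern, here is a minimal sketch of what a retry around a flaky download can look like. It is not the code used in the edxapp pipeline; the function name `download_with_retries`, its parameters, and the backoff values are hypothetical and chosen only for illustration.

```python
import time
import urllib.error
import urllib.request


def download_with_retries(url, attempts=3, base_delay=1.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying network errors with exponential backoff.

    Hypothetical helper for illustration only. `opener` and `sleep` are
    injectable so the retry logic can be exercised without a network.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=30) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            # URLError also covers HTTPError, so this sketch retries HTTP
            # error responses too; a real job may want to retry only on
            # errors known to be transient.
            if attempt == attempts:
                raise  # out of attempts: let the pipeline stage fail
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A retry like this hides a transient failure from alerting, which is useful for noise reduction but also means an underlying networking problem can go unnoticed unless retry counts are logged or graphed.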
We have other transient failures ticketed:
And still others that have yet to be ticketed.
See GoCD alerts graph:
Question: