
ExponentialBackoff (internal/retry/wait/kubernetes_apimachinery_wait.go:112)
breaks the loop when backoff.Steps == 1. This can cause the `gcrane cp` command
to hit 429 errors, because GCRBackoff's 3 configured steps then yield only 2
actual waits and the total wait period is only ~1 minute (sketched below).
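
For reference, a self-contained sketch (with simplified, assumed types, not the
vendored apimachinery source) of how that break turns Steps == 3 into only
about a minute of total waiting:

```go
package main

import (
	"fmt"
	"time"
)

// backoff is a stand-in for the wait package's Backoff; only the fields
// needed for this illustration are modeled, and Jitter/Cap are omitted.
type backoff struct {
	Duration time.Duration
	Factor   float64
	Steps    int
}

// step returns the current wait and advances the backoff, mirroring the
// Duration *= Factor progression of the real implementation.
func (b *backoff) step() time.Duration {
	d := b.Duration
	b.Duration = time.Duration(float64(b.Duration) * b.Factor)
	b.Steps--
	return d
}

func main() {
	// Old GCRBackoff parameters: 6s initial wait, factor 10, 3 steps.
	b := backoff{Duration: 6 * time.Second, Factor: 10.0, Steps: 3}
	total := time.Duration(0)
	for b.Steps > 0 {
		// The real loop checks condition() here; assume it keeps failing with 429s.
		if b.Steps == 1 {
			break // the final step never sleeps
		}
		total += b.step() // the real loop does time.Sleep(backoff.Step())
	}
	fmt.Println("total wait before giving up:", total) // prints 1m6s
}
```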

For some reason I saw 429 errors from GCR even after a wait of 15 minutes.
Make a couple of improvements to remediate this issue:

- Decrease Factor to 5.0 and increase Steps to 6.
  - This way we get more retries, and the maximum single wait is 1 hour, after
  which the run fails (a sketch of the resulting wait schedule follows this list).
- Update GCRBackoff docstring.
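
A minimal sketch (not part of the commit) of the resulting wait schedule under
the new parameters, assuming each wait is the previous one multiplied by Factor
and capped at Cap, with Jitter ignored; the exact step accounting in the
apimachinery wait package may end the sequence once the cap is reached:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// New GCRBackoff parameters (Jitter ignored for clarity).
	duration := 6 * time.Second
	factor := 5.0
	steps := 6
	maxWait := 1 * time.Hour

	total := time.Duration(0)
	for i := 0; i < steps; i++ {
		wait := duration
		if wait > maxWait {
			wait = maxWait // single waits are capped at 1 hour
		}
		total += wait
		fmt.Printf("wait %d: %v (cumulative %v)\n", i+1, wait, total)
		duration = time.Duration(float64(duration) * factor)
	}
	// Prints roughly 6s, 30s, 2m30s, 12m30s, 1h, 1h, matching the updated
	// docstring's 6s / 30s / 2.5m / 12.5m / 1h schedule.
}
```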
klemmari1 committed Oct 24, 2023
1 parent dbcd01c commit 22abbb4
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions pkg/gcrane/copy.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,16 +46,18 @@ var Keychain = authn.NewMultiKeychain(google.Keychain, authn.DefaultKeychain)
//
// On error, we will wait for:
// - 6 seconds (in case of very short term 429s from GCS), then
// - 1 minute (in case of temporary network issues), then
// - 10 minutes (to get around GCR 10 minute quotas), then fail.
// - 30 seconds (in case of very short term 429s from GCS), then
// - 2.5 minutes (in case of temporary network issues), then
// - 12.5 minutes (to get around GCR 10 minute quotas), then
// - 1 hour (in case of longer term network issues), then fail.
//
// TODO: In theory, we could keep retrying until the next day to get around the 1M limit.
func GCRBackoff() retry.Backoff {
return retry.Backoff{
Duration: 6 * time.Second,
Factor: 10.0,
Factor: 5.0,
Jitter: 0.1,
Steps: 3,
Steps: 6,
Cap: 1 * time.Hour,
}
}
Expand Down
