Modified pull image retry with 1100ms * 2^n strategy and rename SimpleBackoffRetry to ExponentialBackoffRetry into a separate retry package #1808
Conversation
Force-pushed from b4fe760 to 6a07251.
 	minimumPullRetryDelay    = 1100 * time.Millisecond
 	maximumPullRetryDelay    = 5 * time.Second
-	pullRetryDelayMultiplier = 1.5
+	pullRetryDelayMultiplier = 2
Are these to make the tests which depend on pulling images from ECR less flaky?
Right, the longer interval makes sure that if the user gets throttled by ECR, the next retry will jump over a throttle bucket (1s). With the current setup starting at 0.25s, the user would be throttled again because the next retry still falls within the same throttle bucket.
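To make the arithmetic concrete, here is a minimal, hypothetical Go sketch (not code from this PR) that prints the cumulative delay of each retry under the new constants, ignoring jitter; each retry lands in a later 1s throttle bucket than the one before:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Constants from the diff above; jitter is ignored for simplicity.
	const (
		minimumPullRetryDelay    = 1100 * time.Millisecond
		maximumPullRetryDelay    = 5 * time.Second
		pullRetryDelayMultiplier = 2
	)

	delay := minimumPullRetryDelay
	elapsed := time.Duration(0)
	for attempt := 1; attempt <= 4; attempt++ {
		elapsed += delay
		// With an 1100ms starting delay, each retry clears at least one
		// full 1s throttle bucket: retries fire at 1.1s, 3.3s, 7.7s, 12.7s.
		fmt.Printf("retry %d at t=%v (throttle bucket %d)\n",
			attempt, elapsed, int(elapsed/time.Second))
		delay *= pullRetryDelayMultiplier
		if delay > maximumPullRetryDelay {
			delay = maximumPullRetryDelay
		}
	}
}
```

By contrast, a 250ms starting delay with a 1.5 multiplier puts the first retry at 0.25s, inside the same 1s bucket as the request that was just throttled.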
 	xPullRetryDelayMultiplier  = 2
 	xPullRetryJitterMultiplier = 0.2
 )
Should these values match the ones in docker_client?
If no, how are these decided?
If yes, should we keep a constant to be reused at both places?
I was deciding whether to increase the unit test timeout or to use a customized backoff for unit tests. I decided to use different backoff values for the unit tests because, if we used the same values as the docker client, the unit tests for this file would take ~60s to run. I noticed that our tests already take quite long to run, so using customized values for unit tests seemed like the better option.
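As an illustration of that trade-off, here is a hypothetical test (not the PR's actual test code) that uses the same backoff shape with millisecond-scale values so several retries complete almost instantly. It assumes the retry package exposes NewExponentialBackoff and RetryWithBackoff with the signatures visible in this PR's diffs, and the import path is an assumption:

```go
package engine_test

import (
	"errors"
	"testing"
	"time"

	"github.com/aws/amazon-ecs-agent/agent/utils/retry" // assumed package path
)

func TestPullRetriesUseShortBackoff(t *testing.T) {
	// Same shape as the production pull backoff, but millisecond-scale
	// values keep the retry path fast enough for unit tests.
	backoff := retry.NewExponentialBackoff(
		10*time.Millisecond,  // min delay (production: 1100ms)
		100*time.Millisecond, // max delay (production: 5s)
		0.2,                  // jitter multiplier
		2,                    // delay multiplier
	)

	attempts := 0
	err := retry.RetryWithBackoff(backoff, func() error {
		attempts++
		if attempts < 3 {
			return errors.New("simulated transient pull failure")
		}
		return nil
	})
	if err != nil || attempts != 3 {
		t.Fatalf("expected success on attempt 3, got err=%v attempts=%d", err, attempts)
	}
}
```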
agent/engine/docker_task_engine.go
@@ -229,8 +230,8 @@ func (engine *DockerTaskEngine) MustInit(ctx context.Context) {
 	defer engine.mustInitLock.Unlock()

 	errorOnce := sync.Once{}
-	taskEngineConnectBackoff := utils.NewSimpleBackoff(200*time.Millisecond, 2*time.Second, 0.20, 1.5)
+	taskEngineConnectBackoff := retry.NewExponentialBackoff(200*time.Millisecond, 2*time.Second, 0.20, 1.5)
 	utils.RetryWithBackoff(taskEngineConnectBackoff, func() error {
Can we move these values into constants as well? That would give a better understanding of what they are.
Sure, makes sense :)
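For illustration, the suggestion amounts to something like the following sketch (constant names are hypothetical, not from the actual commit; the values are the ones from the call above):

```go
// Hypothetical constant names; values match the call in MustInit above.
const (
	minEngineConnectRetryDelay         = 200 * time.Millisecond
	maxEngineConnectRetryDelay         = 2 * time.Second
	engineConnectRetryJitterMultiplier = 0.20
	engineConnectRetryDelayMultiplier  = 1.5
)

taskEngineConnectBackoff := retry.NewExponentialBackoff(
	minEngineConnectRetryDelay,
	maxEngineConnectRetryDelay,
	engineConnectRetryJitterMultiplier,
	engineConnectRetryDelayMultiplier,
)
```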
@@ -13,4 +13,4 @@ install:
 build_script:
 - go build ./agent
 test_script:
-- for /f "" %%G in ('go list github.com/aws/amazon-ecs-agent/agent/... ^| find /i /v "/vendor/"') do ( go test -race -tags unit -timeout 40s %%G & IF ERRORLEVEL == 1 EXIT 1)
+- for /f "" %%G in ('go list github.com/aws/amazon-ecs-agent/agent/... ^| find /i /v "/vendor/"') do ( go test -race -tags unit -timeout 40s %%G & IF ERRORLEVEL == 1 EXIT 1)
Should we increase the timeout to 60 here as well?
I forgot to update the summary after I decided to use a customized backoff to shorten the runtime. I didn't go with the route of increasing the unit test timeout. The summary should be updated now.
Left a few minor comments; looks good to me otherwise.
@@ -0,0 +1,48 @@
+package retry
I think you need to add the copyright header to the new files in this package?
And update the header of the modified files to xxx-2019, I guess.
Good catch! Updated.
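For reference, the Apache 2.0 license header used across this repo looks roughly like the following (the year range shown is illustrative):

```go
// Copyright 2014-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"). You may
// not use this file except in compliance with the License. A copy of the
// License is located at
//
//	http://aws.amazon.com/apache2.0/
//
// or in the "license" file accompanying this file. This file is distributed
// on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
// express or implied. See the License for the specific language governing
// permissions and limitations under the License.
```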
Summary

Implementation details

Testing

- Builds on Linux (make release)
- Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
- Unit tests on Linux (make test) pass
- Unit tests on Windows (go test -timeout=30s ./agent/...) pass
- Integration tests on Linux (make run-integ-tests) pass
- Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
- Functional tests on Linux (make run-functional-tests) pass
- Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.