Introduce jitter for task cleanup wait duration #2969
Merged
Summary
Closes #2968.
Introduce jitter for the task cleanup wait duration, configurable via the ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION_JITTER environment variable. This addresses the case where a large number of tasks are stopped at the same time. Without jitter, cleanup for all of those tasks also happens at roughly the same time, which can generate enough work to impact tasks that are running during the cleanup. With jitter, each task is cleaned up at a different time, which avoids that impact.
Implementation details
Added a new config field for the jitter. The task manager uses it to calculate the cleanup wait duration with jitter.
The default value of the jitter is a zero duration, so existing behavior does not change when the new environment variable is not specified.
Testing
Added a unit test. Manually tested successfully with the following:
(1) ecs.config set with jitter:
Launched 10 tasks that all stop at the same time. Verified that the tasks are cleaned up at different random times between 1m and 3m after they stopped.
(2) ecs.config set without jitter:
Tried the same again. The tasks are cleaned up at roughly the same time.
New tests cover the changes: yes
Description for the changelog
Enhancement - Introduce optional jitter for task cleanup wait duration, configurable via the ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION_JITTER environment variable. When a large number of tasks are stopped at the same time, specifying this jitter helps avoid all task cleanup happening at once, which could add pressure to the instance and, as a result, affect running tasks.
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.