Fix an edge case for container ordering success condition #2404
Summary
Fix an edge case for container ordering success condition.
Currently, if a container that is depended on with a success condition has a non-empty exit code while its status is still RUNNING, the task is marked as failed. However, under sufficient load (100+ containers), when a container exits quickly, Docker can send the container running event together with the exit code, so the agent stores the container's exit code while its status is still RUNNING. This causes the agent to fail the task when other containers depend on that container with a success condition.
This change fixes the case by failing the success condition check only when the container being depended on is both stopped and has a non-successful exit code.
Implementation details
Adjust the logic in the dependency graph to reflect the change described above.
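The adjusted check can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the `ContainerStatus` and `Container` types and the `successConditionFailed` function are hypothetical names made up for this example, and it assumes that a "non-successful" exit code means any non-zero value.

```go
package main

import "fmt"

// ContainerStatus is a simplified, hypothetical stand-in for the agent's
// container status type.
type ContainerStatus int

const (
	ContainerRunning ContainerStatus = iota
	ContainerStopped
)

// Container holds only the fields relevant to this sketch (hypothetical type).
type Container struct {
	KnownStatus ContainerStatus
	ExitCode    *int // nil until an exit code has been recorded
}

// successConditionFailed reports whether a dependency with a success
// condition can no longer be satisfied. The check fails only when the
// container is stopped AND has a non-zero exit code. An exit code that
// arrives while the container is still RUNNING is ignored here.
func successConditionFailed(dep Container) bool {
	if dep.KnownStatus != ContainerStopped {
		return false
	}
	return dep.ExitCode != nil && *dep.ExitCode != 0
}

func main() {
	one := 1
	// Exit code recorded early, status still RUNNING: not treated as a failure.
	early := Container{KnownStatus: ContainerRunning, ExitCode: &one}
	// Stopped with a non-zero exit code: the success condition has failed.
	failed := Container{KnownStatus: ContainerStopped, ExitCode: &one}

	fmt.Println(successConditionFailed(early))
	fmt.Println(successConditionFailed(failed))
}
```

In this sketch the status check comes first, so a fast-exiting container whose exit code is reported together with the running event does not fail the task prematurely.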
Testing
Added a unit test case. Manually reproduced the issue with around 100 tasks, each with a fast-exiting container and a second container that depends on it with a success condition, and verified that this change fixes the issue.
New tests cover the changes: yes
Description for the changelog
Fix an edge case that can cause a task to fail to start when using the container ordering success condition.
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.