-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get container exit code from event if we can't get it from inspect. #2940
Conversation
In case when we received a container die event but was not able to inspect the container (e.g. due to timeout), we will use the exit code from the event, so that the exit code of the container is still reported and available for customer to see from describing task.
@@ -1015,6 +1019,10 @@ func (dg *dockerGoClient) handleContainerEvents(ctx context.Context, | |||
} | |||
|
|||
metadata := dg.containerMetadata(ctx, containerID) | |||
// In case when we received a container die event but was not able to inspect the container (e.g. due to timeout), | |||
// we will use the exit code from the event, so that the exit code of the container is still reported and | |||
// available for customer to see from describing task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if exit code is set here, do we still need the inspect later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, because we need to get the exitAt timestamp https://github.com/aws/amazon-ecs-agent/blob/master/agent/api/container/container.go#L310. this is shown in the metadata, and is only available by inspecting the container (see https://github.com/moby/moby/blob/master/daemon/monitor.go#L48, the exit code is stored in event but exitAt is not). otherwise i would have remove the inspect 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. if inspect fails, do we still show the error in task stopped reason even when container exits fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we will show the error in the reason field. see the Testing section in pr description for an example
Summary
Currently, when we received a container die event from docker, but was not able to inspect the container (e.g. due to inspect timeout), the container's exit code is left as null, because we only get it from the inspect output, and the output is empty when we fail to inspect the container. However, the exit code is also in the container die event itself (as an attribute, ref: https://github.com/moby/moby/blob/master/daemon/monitor.go#L61, didn't trace docker history but i think it has been there for a long time), and we are logging it, but just not using it.
This change sets the exit code from the docker event in case of inspect failure, so that the exit code of the container is still reported in this case and available for customer to see when describing task.
Implementation details
Set the exit code when receiving a container die event. This is done in the docker client at the place where we handle the docker event.
Testing
added unit test. also manually tested by: added some test code to return inspect timeout for exited container https://gist.github.com/fenxiong/9afdeeeedab05177b6d425c5db3eb3b9 to repro the scenario (since the timeout isn't easy to repro), then run a simple task:
Without the fix, the container's exit code is left as null:
With the fix, the container's exit code is correctly reported in describe-tasks:
New tests cover the changes: yes
Description for the changelog
Enhancement: get container's exit code from docker event in case we receive a container die event, but fail to inspect the container. Previously the container's exit code was left as null in this case.
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.