Initialize ENI ack timer if needed upon restart #2219
Merged
Summary
Fixes #2193. Previously, the ack timer of an ENI attachment was not initialized when the agent restarted. If the agent was stopped and started after an ENI attachment had been added to state but before it was acked, one of two things happened: (1) if the attachment had not expired, the agent acked the attachment status and then crashed with a nil pointer dereference (NPE) while trying to stop the uninitialized ack timer (which is likely what happened in #2193 (comment)); (2) if the attachment had expired, it was left in agent state indefinitely, because the task does not start in that case and the attachment is therefore never removed during task cleanup.
This commit fixes both cases by initializing the ack timer upon restart when needed: (1) if the attachment has not expired, the ack timer is initialized, so acking the attachment no longer causes an NPE; (2) if the attachment has expired, it is removed from agent state.
Implementation details
This follows the same approach we use for uninitialized fields of task resources: an Initialize method is added to the ENI attachment to initialize its ack timer, and it is called from engine.synchronizeState, which runs when the agent restarts.
Testing
New tests cover the changes: yes
Added unit tests. Manually tested the following situations:
Situation 1: Stop the agent after the ENI attachment is added to state but before it is acked; restart the agent before the attachment expires.
Before fix: the agent panics on a nil pointer dereference when attempting to ack the ENI attached status. After fix: the agent successfully acks the ENI attached status.
Situation 2: Stop the agent after the ENI attachment is added to state but before it is acked; restart the agent after the attachment expires.
Before fix: the ENI ack timer is not started, and the ENI attachment is left in agent state. After fix: the ENI ack timer is not started, and the ENI attachment is removed from state after the agent starts.
Situation 3: Stop the agent after the ENI attachment is added to state and acked; restart the agent.
Before fix: the ENI ack timer is not started, and the ENI attachment is removed during task cleanup. After fix: unchanged; this case already behaved as expected, and I verified that the behavior is preserved.
Description for the changelog
Fixed a bug where the agent might crash if it's restarted right after launching a task in awsvpc network mode #2219
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.