Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acquire read lock when marshalling container/task. #2299

Merged
merged 2 commits into from
Dec 3, 2019

Conversation

fenxiong
Copy link
Contributor

@fenxiong fenxiong commented Dec 2, 2019

Summary

Acquire read lock when marshalling container/task. Without acquiring the lock the agent can crash due to concurrent access.

For example, when we are modifying the fields in container/task, there can be another goroutine invoking ForceSave to save the state, which calls the marshal method, and if the marshalling doesn't acquire the read lock, the program will crash even if the other goroutine acquires the write lock before modifying the field. See Testing section for an example.

Implementation details

Wraps container and task's MarshalJSON method with read lock.

Testing

For unit test, rely on existing statemanager unit test since this PR just adds the lock.

For manual test, this isn't easily reproducible, but it can be reproduced by intentionally invoking task.PopulateSecrets many times (see this) in the code, and run a task that uses secret, and without the fix the agent will crash like the following:

...
2019-12-02T22:01:58Z [INFO] Task engine [arn:aws:ecs:us-west-2:xxx:task/test-exec/xxx]: creating container: xxx
2019-12-02T22:01:58Z [INFO] Populate secrets 10000 times!
fatal error: concurrent map iteration and map write
...

I was able to verify that the issue is fixed with this commit.

Description for the changelog

Fixed a race condition that might cause the agent to crash during container creation.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@fenxiong fenxiong changed the title Acquire read lock when marshalling container/task. [WIP] Acquire read lock when marshalling container/task. Dec 2, 2019
@fenxiong fenxiong changed the title [WIP] Acquire read lock when marshalling container/task. Acquire read lock when marshalling container/task. Dec 2, 2019
@fenxiong fenxiong requested a review from a team December 2, 2019 23:47
Copy link
Member

@fierlion fierlion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@fenxiong fenxiong merged commit 9e40df4 into aws:dev Dec 3, 2019
@fenxiong fenxiong deleted the fix-crash branch December 3, 2019 17:10
@petderek petderek mentioned this pull request Dec 11, 2019
@sparrc sparrc added this to the 1.35.0 milestone Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants