Fix the deadlock caused by the ImagePullDeleteLock #836
Conversation
These changes should eliminate the deadlock we encountered with #833 |
LGTM
LGTM
seelog.Debug("Attempting to obtain ImagePullDeleteLock for removing images")
ImagePullDeleteLock.Lock()
seelog.Debug("Obtained ImagePullDeleteLock for removing images")
defer seelog.Debug("Released ImagePullDeleteLock after removing images")
Maybe group these statements in an anonymous function so it's easier to see the order of execution?
defer func() { ... }
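The reviewer's suggestion can be sketched as follows. This is a hedged illustration, not the agent's actual code: `removeImages` is a hypothetical wrapper, and `log` stands in for `seelog`.

```go
package main

import (
	"fmt"
	"sync"
)

// ImagePullDeleteLock mirrors the name in the snippet under review.
var ImagePullDeleteLock sync.Mutex

func removeImages() {
	fmt.Println("Attempting to obtain ImagePullDeleteLock for removing images")
	ImagePullDeleteLock.Lock()
	fmt.Println("Obtained ImagePullDeleteLock for removing images")
	// Grouping the unlock and its log line in one deferred closure makes
	// the order of execution explicit: unlock first, then log the release.
	defer func() {
		ImagePullDeleteLock.Unlock()
		fmt.Println("Released ImagePullDeleteLock after removing images")
	}()
	// ... image removal work would happen here ...
}

func main() {
	removeImages()
}
```

Because deferred calls run in LIFO order when the function returns, a single closure also guarantees the unlock and the log message stay adjacent rather than interleaving with other deferred statements.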
// Cause a fake delay when recording container reference so that the
// race condition between ImagePullLock and updateLock gets exercised
// If updateLock precedes ImagePullLock, it can cause a deadlock
client.EXPECT().InspectImage(sleepContainer.Image).Do(func(image string) {
Cool! Did we verify that this fails before the patch?
Yes, this test doesn't pass without the patch.
Summary
Fix a deadlock caused by the interaction between image cleanup and image pull. Issue #833
Implementation details
Based on the explanation here, adjust the lock acquisition order in image cleanup so that both goroutines acquire the locks in the same order, which prevents the deadlock.
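A minimal sketch of the lock-ordering principle behind the fix, assuming two locks analogous to the agent's ImagePullDeleteLock and the task engine's update lock (the names `lockA`/`lockB` and the function bodies are illustrative, not the agent's code):

```go
package main

import (
	"fmt"
	"sync"
)

// Before the patch, one goroutine took lockA then lockB while the other
// took lockB then lockA; with each holding one lock and waiting on the
// other, neither could proceed. The fix imposes one global order.
var (
	lockA sync.Mutex // plays the role of ImagePullDeleteLock
	lockB sync.Mutex // plays the role of the task engine's update lock
)

func pull(done chan<- struct{}) {
	lockA.Lock()
	lockB.Lock()
	lockB.Unlock()
	lockA.Unlock()
	done <- struct{}{}
}

func cleanup(done chan<- struct{}) {
	// Same order as pull: lockA before lockB. Reversing these two
	// Lock calls reintroduces the deadlock described in issue #833.
	lockA.Lock()
	lockB.Lock()
	lockB.Unlock()
	lockA.Unlock()
	done <- struct{}{}
}

func main() {
	done := make(chan struct{}, 2)
	go pull(done)
	go cleanup(done)
	<-done
	<-done
	fmt.Println("both goroutines finished")
}
```

Imposing a single acquisition order is the standard remedy for this class of bug: a cycle in the waits-for graph requires at least two goroutines holding locks in conflicting orders, so a total order over the locks makes the cycle impossible.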
Testing
make release: pass
go build -out amazon-ecs-agent.exe ./agent: pass
make test: pass
go test -timeout=25s ./agent/...: pass
make run-integ-tests: pass
.\scripts\run-integ-tests.ps1: pass
make run-functional-tests: pass
.\scripts\run-functional-tests.ps1: pass
New tests cover the changes: yes
Description for the changelog
Fix an issue that could cause the agent to run into a deadlock.
Licensing
This contribution is under the terms of the Apache 2.0 License:
yes