Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in legacy process stdio code #737

Merged
merged 1 commit into from
Nov 15, 2019

Conversation

kevpar
Copy link
Member

@kevpar kevpar commented Nov 14, 2019

HcsCreateProcess returns a set of stdio handles for the newly created
process. A while ago, we used to cache these handles and return them
the first time stdio handles were requested for the process, and then
get new handles via HcsGetProcessInfo for each subsequent request. At
some point, this code was cleaned up to instead always return the
original set of handles as non-closable (for new callers) and always get
new handles via HcsGetProcessInfo (for legacy callers, who required
closable handles).

However, this change introduced a race condition for legacy callers,
where in the case of a short lived container process, the container
could have terminated between when it was started and when the
orchestrator requested stdio handles. This led to ERROR_NOT_FOUND
being returned from HcsGetProcessInfo.

This change addresses this by returning the original handles the first
time stdio handles are requested, and then calling HcsGetProcessInfo for
every subsequent request (just as it used to work a while ago).

Signed-off-by: Kevin Parsons [email protected]

@kevpar kevpar requested a review from a team November 14, 2019 21:52
internal/hcs/process.go Outdated Show resolved Hide resolved
@kevpar kevpar force-pushed the fix-legacy-stdio branch 3 times, most recently from cd42497 to b432e59 Compare November 14, 2019 22:54
internal/hcs/process.go Outdated Show resolved Hide resolved
Copy link
Member

@jstarks jstarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with some suggestions.

HcsCreateProcess returns a set of stdio handles for the newly created
process. A while ago, we used to cache these handles and return them
the first time stdio handles were requested for the process, and then
get new handles via HcsGetProcessInfo for each subsequent request. At
some point, this code was cleaned up to instead always return the
original set of handles as non-closable (for new callers) and always get
new handles via HcsGetProcessInfo (for legacy callers, who required
closable handles).

However, this change introduced a race condition for legacy callers,
where in the case of a short lived container process, the container
could have terminated between when it was started and when the
orchestrator requested stdio handles. This led to ERROR_NOT_FOUND
being returned from HcsGetProcessInfo.

This change addresses this by returning the original handles the first
time stdio handles are requested, and then calling HcsGetProcessInfo for
every subsequent request (just as it used to work a while ago).

Signed-off-by: Kevin Parsons <[email protected]>
@kevpar kevpar merged commit 5f244ac into microsoft:master Nov 15, 2019
vikramhh pushed a commit to vikramhh/moby that referenced this pull request Nov 25, 2019
Among other things, this is required to pull in
microsoft/hcsshim#718 which should
take care of multiple issues reported to us.

Also fixes microsoft/hcsshim#737
which was caught by checks while attempting to bump
up hcsshim version.

Modifies TestRunAttachFailedNoLeak to do case-insensitive
comparison to fix a failure on RS1.

Signed-off-by: Vikram bir Singh <[email protected]>
vikramhh pushed a commit to vikramhh/moby that referenced this pull request Nov 25, 2019
Among other things, this is required to pull in
microsoft/hcsshim#718

Also fixes microsoft/hcsshim#737
which was caught by checks while attempting to bump
up hcsshim version.

Signed-off-by: Vikram bir Singh <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Nov 26, 2019
Among other things, this is required to pull in
microsoft/hcsshim#718

Also fixes microsoft/hcsshim#737
which was caught by checks while attempting to bump
up hcsshim version.

Signed-off-by: Vikram bir Singh <[email protected]>
Upstream-commit: a7b6c3f0bf5d10c6227a29bac7dd46b9a7a779bc
Component: engine
thaJeztah pushed a commit to thaJeztah/docker that referenced this pull request Dec 3, 2019
Among other things, this is required to pull in
microsoft/hcsshim#718

Also fixes microsoft/hcsshim#737
which was caught by checks while attempting to bump
up hcsshim version.

Signed-off-by: Vikram bir Singh <[email protected]>
(cherry picked from commit a7b6c3f)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
@Iristyle
Copy link

I'm not sure if this PR is responsible for a new error I'm seeing or not -- but something in between nightly docker 2019-09-25 and nightly docker 2019-12-04 is causing errors on container exit under LCOW with a hcsshim::System::waitBackground message like this now:

PS C:\agent\_work\13\s> docker run -it hadolint/hadolint hadolint --version
Haskell Dockerfile Linter v1.17.3-1-g3fab409
time="2019-12-16T16:45:25-08:00" level=error msg="Error waiting for container: failed to shutdown container: container 33acef4c0b8c4edc15192cf81df85189a7b09d961ca893ac0437811c2625967c encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110): subsequent terminate failed container 33acef4c0b8c4edc15192cf81df85189a7b09d961ca893ac0437811c2625967c encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110)"

This is the last change to roll in that modified system.go so I figured I'd start here @kevpar

docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Jan 23, 2020
Among other things, this is required to pull in
microsoft/hcsshim#718

Also fixes microsoft/hcsshim#737
which was caught by checks while attempting to bump
up hcsshim version.

Signed-off-by: Vikram bir Singh <[email protected]>
(cherry picked from commit a7b6c3f0bf5d10c6227a29bac7dd46b9a7a779bc)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: e2f226b5b41c958fa518f677eb213eb1462f90a8
Component: engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants