docker run --isolation=process fails to start #9278

Closed
2 tasks done
TBBle opened this issue Oct 28, 2020 · 6 comments

Comments


TBBle commented Oct 28, 2020

  • I have tried with the latest version of my channel (both Stable and Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: 1F063BD3-1D8C-407B-9C66-5D8ABF827775/20201028034451

Expected behavior

Containers work when run in either Process or Hyper-V isolation modes

Actual behavior

Containers work when run in Hyper-V isolation mode, but fail in Process isolation mode.

The failure looks like

PS C:\Users\paulh> docker run --isolation=process -it --rm mcr.microsoft.com/windows/servercore:10.0.19041.508
Unable to find image 'mcr.microsoft.com/windows/servercore:10.0.19041.508' locally
10.0.19041.508: Pulling from windows/servercore
295f12394c4f: Already exists
fbc68affb523: Pull complete
Digest: sha256:6f4191dabd2f6d058c5baf535d099409d411ca8a4ae099d01016aac62d3d4927
Status: Downloaded newer image for mcr.microsoft.com/windows/servercore:10.0.19041.508
docker: Error response from daemon: container d7c1a68ed49b818e50a0b4dab4ac0becf55117cca64362916bd7854fbd340c03 encountered an error during hcsshim::System::Start: context deadline exceeded.
time="2020-10-28T13:03:25+11:00" level=error msg="Error waiting for container: container d7c1a68ed49b818e50a0b4dab4ac0becf55117cca64362916bd7854fbd340c03: driver \"windowsfilter\" failed to remove root filesystem: failed to detach VHD: The device is not ready.: rename C:\\ProgramData\\Docker\\windowsfilter\\d7c1a68ed49b818e50a0b4dab4ac0becf55117cca64362916bd7854fbd340c03 C:\\ProgramData\\Docker\\windowsfilter\\d7c1a68ed49b818e50a0b4dab4ac0becf55117cca64362916bd7854fbd340c03-removing: Access is denied."

I assume the 'failed to remove root filesystem' is from the --rm, so the core problem is the hcsshim error.
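
The distinction drawn above (the hcsshim start timeout is the primary fault, the root filesystem removal error is follow-on noise from --rm) can be sketched as a small log classifier. This is a hypothetical helper, not part of Docker; the patterns are copied from the two messages quoted in this report and may not match the wording used by other Docker or hcsshim versions.

```python
import re

# Patterns taken from the error text quoted in this report (an assumption:
# other Docker or hcsshim versions may phrase these messages differently).
START_FAILURE = re.compile(
    r"container (?P<id>[0-9a-f]{12,64}) encountered an error during "
    r"(?P<op>hcsshim::\S+): (?P<reason>.+?)\.?$"
)
CLEANUP_FAILURE = re.compile(r"failed to remove root filesystem")


def classify(line: str) -> dict:
    """Label one line of docker output as start failure, cleanup failure, or other."""
    line = line.strip()
    match = START_FAILURE.search(line)
    if match:
        return {
            "kind": "start_failure",
            "container": match.group("id")[:12],  # short ID, as docker prints it
            "operation": match.group("op"),
            "reason": match.group("reason"),
        }
    if CLEANUP_FAILURE.search(line):
        return {"kind": "cleanup_failure"}
    return {"kind": "other"}
```

Run over the two error lines above, the first would be labelled a start failure in hcsshim::System::Start and the second a cleanup failure, which matches the reading that the timeout is the problem to chase.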

Information

  • Windows Version: Observed on both Windows 10 2004 (10.0.19041.508) and Windows 10 1909 (10.0.18363.1082)
  • Docker Desktop Version: 2.4.2.0 (Edge)
  • Are you running inside a virtualized Windows e.g. on a cloud server or on a mac VM: No

docker version details:

Client: Docker Engine - Community
 Cloud integration: 0.1.22
 Version:           20.10.0-beta1
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        ac365d7
 Built:             Tue Oct 13 18:13:24 2020
 OS/Arch:           windows/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.0-beta1
  API version:      1.41 (minimum version 1.24)
  Go version:       go1.13.15
  Git commit:       9c15e82
  Built:            Tue Oct 13 18:17:06 2020
  OS/Arch:          windows/amd64
  Experimental:     true

This problem newly appeared with the Edge install, so I assume it's related to the Docker Engine 20.10.0-beta1. I haven't rolled back to the Stable release to verify that though.

In the Windows 10 1909 case, I tried rebooting and clearing the Windows Containers state, and the problem still reproduced.

Steps to reproduce the behavior

  1. On a Windows 10 2004 (10.0.19041.508) host,
  2. in Windows Container mode,
  3. execute docker run --isolation=process -it --rm mcr.microsoft.com/windows/servercore:10.0.19041.508,
  4. and wait for hcsshim to time out.
  5. Observe that the container is now stuck in "Removal In Progress", and cannot be removed.
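
The steps above can be turned into a rough side-by-side check of the two isolation modes. This is only a sketch: it drops -it so that it can run without a terminal, and `cmd /c ver` is an assumed trivial workload that was not part of the original report. The `runner` parameter is a hypothetical hook so the comparison logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable, Dict, List

# Image tag taken from the report; it should match the host build.
IMAGE = "mcr.microsoft.com/windows/servercore:10.0.19041.508"


def build_command(isolation: str, image: str = IMAGE) -> List[str]:
    """Build a non-interactive variant of the reported docker run command."""
    if isolation not in ("process", "hyperv"):
        raise ValueError(f"unsupported isolation mode: {isolation}")
    # "cmd /c ver" is an assumption: a short command that exits on its own.
    return ["docker", "run", "--rm", f"--isolation={isolation}", image, "cmd", "/c", "ver"]


def run_docker(argv: List[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def compare_isolation(runner: Callable[[List[str]], int] = run_docker) -> Dict[str, bool]:
    """Return, per isolation mode, whether the container ran to completion."""
    return {mode: runner(build_command(mode)) == 0 for mode in ("hyperv", "process")}
```

On a host showing the behavior described here, the expectation would be that the hyperv entry is True and the process entry is False, after the hcsshim timeout has elapsed.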

TBBle commented Nov 2, 2020

I just tried to reproduce this at home, using nanoserver, and it didn't reproduce, on a Windows 10 20H2 19042.572 system.

docker run --rm -it --isolation=process mcr.microsoft.com/windows/nanoserver:10.0.19042.572

> docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)
  scan: Docker Scan (Docker Inc., v0.3.4)

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 20.10.0-beta1
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 19042 (19041.1.amd64fre.vb_release.191206-1406)
 Operating System: Windows 10 Pro Version 2009 (OS Build 19042.572)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.96GiB
 Name: KEITARO
 ID: NDQJ:UAVJ:AHVY:OAXE:FYZU:4JGK:FV2M:PCET:ZEBX:AGSH:7L7H:GHCF
 Docker Root Dir: C:\ProgramData\Docker
 Debug Mode: true
  File Descriptors: -1
  Goroutines: 30
  System Time: 2020-11-03T02:48:55.1019934+11:00
  EventsListeners: 2
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

I tried with servercore, and that worked too:
docker run --rm -it --isolation=process mcr.microsoft.com/windows/servercore:10.0.19042.572

I'm wondering if there was an issue in the 508 build that was fixed in the 572 build. The "Windows 10 2004 (10.0.19041.508)" machine in the original report has been upgraded to Windows 10 2004 (10.0.19041.572), but is currently busy so I can't repeat the test there to validate this idea right now.
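
The 508 versus 572 question is about the last field of the version string. A minimal sketch for comparing a host version against an image tag follows. It assumes the commonly cited rule that process isolation needs major.minor.build to match while the revision may differ; that rule should be confirmed against Microsoft's version compatibility documentation. The function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int, int]:
    """Parse 'major.minor.build.revision', for example '10.0.19041.508'."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {text!r}")
    major, minor, build, revision = (int(p) for p in parts)
    return major, minor, build, revision


def image_tag_version(image: str) -> Tuple[int, int, int, int]:
    """Read the version from an image reference whose tag is a Windows version."""
    return parse_version(image.rsplit(":", 1)[1])


def compare(host: str, image: str) -> str:
    """Describe how the host version relates to the image tag."""
    host_v, image_v = parse_version(host), image_tag_version(image)
    if host_v[:3] != image_v[:3]:
        return "build mismatch"      # assumed to block process isolation
    if host_v[3] != image_v[3]:
        return "revision differs"    # assumed to be tolerated
    return "exact match"
```

For the original report, compare("10.0.19041.508", "mcr.microsoft.com/windows/servercore:10.0.19041.508") gives "exact match", so a version mismatch alone would not explain the failure there.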

Of course, it could be any of the dozens of other differences between the two work-related, AD-joined machines I replicated this on and my home games machine. -_-

The "Windows 10 1909 (10.0.18363.1082)" machine in the original report has since been upgraded to Windows 10 2004 (10.0.19041.264) and it still reproduces the issue:
docker run -it --isolation=process mcr.microsoft.com/windows/nanoserver:10.0.19041.264


TBBle commented Nov 4, 2020

It still happens on my (work) Windows 10 2004 (10.0.19041.572) box, so it's not to do with the kernel.

It's possible there's an issue relating to account privileges... Our company recently enacted a policy where privileged operations must be done by a different high-privilege user account, and that change overlaps with my installation of Docker Desktop 2.4.2.0. Edit: I tested this with a Group Policy override, and it didn't make a difference.

When I have a chance, I will downgrade to Docker Desktop 2.4.0.0, and see if the problem still occurs.

I did try stopping Docker Desktop and running both dockerd.exe and docker.exe under the high-privilege account, and the problem replicated. The logs just showed that the vmcompute.dll HcsStartContainer call is timing out somewhere inside Windows. I couldn't find any useful logs for that action to suggest why it is timing out.

I do note that until rebooting, the container cannot be removed, logging an error like

Handler for DELETE /v1.41/containers/58a2cbbe657e returned error: container 58a2cbbe657e2a7efd5298475bcdaaf69e6195b27c91950c9c7075b28f3e46e5: driver \"windowsfilter\" failed to remove root filesystem: failed to detach VHD: The device is not ready.: rename C:\\ProgramData\\Docker\\windowsfilter\\58a2cbbe657e2a7efd5298475bcdaaf69e6195b27c91950c9c7075b28f3e46e5 C:\\ProgramData\\Docker\\windowsfilter\\58a2cbbe657e2a7efd5298475bcdaaf69e6195b27c91950c9c7075b28f3e46e5-removing: Access is denied.

which suggests that the container tried to start. hcsdiag lists the container as 'Unknown', and hcsdiag kill doesn't seem to have any effect on it; I need to reboot to unstick it. (After reboot, Docker's status for the container goes from 'Removal in Progress' or 'Creating' (depending on whether I tried to docker rm it) to 'Dead'.)
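
To spot containers left in this state, one option is to filter the output of docker ps -a --format "{{.ID}}\t{{.Status}}". The sketch below is a hypothetical helper that works on that text; the status prefixes are the ones mentioned in this thread, plus "Created" as an assumption about how Docker may report a container that never started.

```python
from typing import List, Tuple

# Status prefixes mentioned in this thread; "Created" is an added assumption.
STUCK_PREFIXES = ("removal in progress", "dead", "creating", "created")


def stuck_containers(ps_output: str) -> List[Tuple[str, str]]:
    """Return (id, status) pairs whose status starts with a stuck-state prefix."""
    # Expected input: output of docker ps -a --format "{{.ID}}\t{{.Status}}"
    found = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        container_id, status = line.split("\t", 1)
        # Compare case-insensitively, since the capitalization varies in reports.
        if status.strip().lower().startswith(STUCK_PREFIXES):
            found.append((container_id.strip(), status.strip()))
    return found
```

Anything this returns would, going by the behavior described here, probably need a reboot before docker rm succeeds.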


TBBle commented Nov 5, 2020

Hmm. I've just switched my laptop (on Windows 10 10.0.19041.572) to the Stable channel, with Docker Desktop for Windows 2.5.0.0, and the problem is still occurring. So it's not something to do with Docker Engine 20.10.0-beta1 after all.

Client: Docker Engine - Community
 Cloud integration: 1.0.1
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:00:27 2020
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:14:20 2020
  OS/Arch:          windows/amd64
  Experimental:     true

TBBle changed the title from "Docker 2.4.2.0 (Engine 20.10.0-beta1) Process Isolation fails to start" to "docker run --isolation=process fails to start" Nov 5, 2020

TBBle commented Nov 10, 2020

As an update on this: it appears to have been a conflict with Symantec Endpoint Protection, as uninstalling SEP and rebooting has fixed this on one of the above machines.

The SEP version on my other machine (where I haven't uninstalled it) is 14.3 MP1 build 1148: 14.3.1148.0100. I assume that's what was installed on the now-working machine too.

Since it affected both Windows 10 1909 and Windows 10 2004, and both Docker Engine 19.03 and 20.10 beta, I assume it's a problem specific to SEP, not a conflict between SEP and particular system or Docker versions. I'll try to follow up for posterity if we work out what we can do about this, but for now I'll close the ticket as "not a Docker Desktop for Windows issue".

TBBle closed this as completed Nov 10, 2020

TBBle commented Nov 12, 2020

Turns out this is a known issue with SEP. Instructions on the exceptions to add to SEP are in "Endpoint Protection interfering with Docker containers on Windows Server 2016".

docker-robott (Collaborator) commented

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

docker locked and limited conversation to collaborators Dec 12, 2020