Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) #27588

Closed
rayterrill opened this issue Oct 20, 2016 · 46 comments
Labels
area/builder kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. platform/windows version/1.12

Comments

@rayterrill
Copy link

rayterrill commented Oct 20, 2016

Description

Repeatedly receiving "hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4)" when trying to build images using "docker build" on Server 2016 (docker client/server version 1.12.2-cs2-ws-beta). Rerunning the build (sometimes multiple times) eventually allows the build to complete, but this is annoying. Note that this does not happen every time, but it does occur frequently.

Steps to reproduce the issue:

  1. Build a dockerfile to make a custom image from microsoft/windowsservercore
  2. docker build -t mymyachine .

Describe the results you received:

hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) layerId=34268
b729aba561b87438591920c78363a72ed061a53c33da51bb810849df367 flavour=1

Describe the results you expected:

Build should complete succesfully without errors.

Additional information you deem important (e.g. issue happens only occasionally):

Issue happens frequently, but not every time. I have been able to successfully build images by running the build over and over again until it succeeds.

Output of docker version:

PS C:\wip\jenkins> docker version
Client:
 Version:      1.12.2-cs2-ws-beta
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   050b611
 Built:        Tue Oct 11 02:35:40 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.2-cs2-ws-beta
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   050b611
 Built:        Tue Oct 11 02:35:40 2016
 OS/Arch:      windows/amd64

Output of docker info:

Containers: 22
 Running: 0
 Paused: 0
 Stopped: 22
Images: 53
Server Version: 1.12.2-cs2-ws-beta
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: nat null overlay
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.321.amd64fre.rs1_release_inmarket.161004-2338)
Operating System: Windows Server 2016 Standard
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 6 GiB
Name: POPDOCKER1
ID: KS2Q:ORLC:HZR3:UAI5:YMOB:NYND:JMM6:CSBG:E5YO:GPJ5:4E42:4GWU
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

VMWare VM running Server2016 GA on-prem.

https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/windows-server-container-tools/Debug-ContainerHost/Debug-ContainerHost.ps1 Output

Checking for common problems
Describing Windows Version and Prerequisites
 [+] Is Windows 10 Anniversary Update or Windows Server 2016 1.38s
 [+] Has KB3192366, KB3194496, or later installed if running Windows build 14393 231ms
 [+] Is not a build with blocking issues 42ms
Describing Docker is installed
 [+] A Docker service is installed - 'Docker' or 'com.Docker.Service'  143ms
 [+] Service is running 36ms
 [+] Docker.exe is in path 2.11s
Describing User has permissions to use Docker daemon
 [+] docker.exe should not return access denied 60ms
Describing Windows container settings are correct
 [+] Do not have DisableVSmbOplock set to 1 52ms
 [+] Do not have zz values set 66ms
Describing The right container base images are installed
 [+] At least one of 'microsoft/windowsservercore' or 'microsoft/nanoserver' should be installed 338ms
Showing output from: docker version
Client:
 Version:      1.12.2-cs2-ws-beta
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   050b611
 Built:        Tue Oct 11 02:35:40 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.2-cs2-ws-beta
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   050b611
 Built:        Tue Oct 11 02:35:40 2016
 OS/Arch:      windows/amd64

Showing output from: docker info
Containers: 22
 Running: 0
 Paused: 0
 Stopped: 22
Images: 53
Server Version: 1.12.2-cs2-ws-beta
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: nat null overlay
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.321.amd64fre.rs1_release_inmarket.161004-2338)
Operating System: Windows Server 2016 Standard
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 6 GiB
Name: POPDOCKER1
ID: KS2Q:ORLC:HZR3:UAI5:YMOB:NYND:JMM6:CSBG:E5YO:GPJ5:4E42:4GWU
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Warnings & errors from the last 24 hours

   ProviderName: docker

TimeCreated                     Id LevelDisplayName Message

---

10/20/2016 9:21:53 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/20/2016 9:17:16 AM            1 Error            process 7748 in container 1f3d75a838e31d1665e6b2677a38d33433b2ad...
10/20/2016 9:10:26 AM            1 Error            process 6044 in container 74d0590cc6c2734ee51ee1e8949b683019c032...
10/20/2016 9:08:44 AM            1 Error            process 4040 in container 1a3c8fa0800a072f5492e6e91ca59d0d8a095e...
10/20/2016 8:58:23 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/20/2016 8:58:11 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/20/2016 8:58:01 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/20/2016 8:56:02 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/20/2016 8:12:46 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/20/2016 8:05:25 AM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 3:40:51 PM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 2:53:39 PM            1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 2:42:26 PM            1 Error            process 7544 in container 8a20d3b61900357e7925e29b477d98108706eb...
10/19/2016 2:21:24 PM            1 Error            process 6392 in container 0a25b29217978d189097bcc8c4153451c85f4f...
10/19/2016 12:04:23 PM           1 Error            process 5628 in container 7c2b88ece7596f0b505eca2368a3907a6cd947...
10/19/2016 12:03:45 PM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 12:03:37 PM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 12:03:28 PM           1 Error            process 4656 in container 0a680dffed25581a4b9ce91bc22c2d0aa68c75...
10/19/2016 12:03:00 PM           1 Error            Handler for PUT /v1.25/containers/0a680dffed25/archive returned ...
10/19/2016 12:03:00 PM           1 Error            Handler for HEAD /v1.25/containers/0a680dffed25/archive returned...
10/19/2016 12:02:48 PM           1 Error            Handler for PUT /v1.25/containers/0a680dffed25/archive returned ...
10/19/2016 12:02:48 PM           1 Error            Handler for HEAD /v1.25/containers/0a680dffed25/archive returned...
10/19/2016 12:02:32 PM           1 Error            Handler for PUT /v1.25/containers/0a680dffed25/archive returned ...
10/19/2016 12:02:32 PM           1 Error            Handler for HEAD /v1.25/containers/0a680dffed25/archive returned...
10/19/2016 11:58:23 AM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 11:55:51 AM           1 Error            process 7416 in container e69f7b27bc93fd1ad261dd42a5ecd24ef295ee...
10/19/2016 11:52:36 AM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 11:52:28 AM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 11:52:19 AM           1 Error            process 1936 in container dc03e05a62881912a28cd866b2e4fd7e9a98ac...
10/19/2016 11:50:36 AM           1 Error            Handler for POST /v1.25/containers/create returned error: invali...
10/19/2016 11:49:55 AM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 11:41:02 AM           1 Error            Handler for POST /v1.25/containers/05285227ee7b417e1a5f8db2fdf77...
10/19/2016 11:41:02 AM           1 Error            Create container failed with error: container 05285227ee7b417e1a...
10/19/2016 11:40:49 AM           1 Error            Handler for POST /v1.25/containers/90efeb70698d3b4f64343c852b70e...
10/19/2016 11:40:49 AM           1 Error            Create container failed with error: container 90efeb70698d3b4f64...
10/19/2016 11:38:48 AM           1 Error            Handler for POST /v1.25/containers/f00c887df3da781e62143a6f233b5...
10/19/2016 11:38:48 AM           1 Error            Create container failed with error: container f00c887df3da781e62...
10/19/2016 11:38:19 AM           1 Error            Handler for POST /v1.25/containers/aa85962ab4b4252b996cac4818efa...
10/19/2016 11:38:18 AM           1 Error            Create container failed with error: container aa85962ab4b4252b99...
10/19/2016 11:35:00 AM           1 Error            process 1496 in container c4775ab20e633247d10b9afd63a6d7710d4684...
10/19/2016 10:26:08 AM           1 Error            hcsshim::PrepareLayer failed in Win32: This operation returned b...
10/19/2016 10:25:40 AM           1 Error            process 7584 in container 6da43493a2df1076f3bd796a11b8ac6bba8016...
10/19/2016 10:20:49 AM           1 Error            process 4732 in container d0836fdd0aba667f75183edcb040765f9c8905...
10/19/2016 10:08:39 AM           1 Error            Handler for POST /v1.25/build returned error: Unknown instructio...
10/19/2016 10:07:55 AM           1 Error            process 3552 in container ca46f85e4f1f3d415df72af80f9cce3055ea25...
10/19/2016 9:57:56 AM            1 Error            process 4440 in container 14cec561a858b537d7a76222b2a2275b6c5ac0...
10/19/2016 9:48:29 AM            1 Error            process 3216 in container 9013f6d0aaf6ed42951bf077d308ce5663c527...
@PatrickLang
Copy link

Do you have an example output from when it returned the error? What type of step was it on (ADD/RUN/...) ?

@rayterrill
Copy link
Author

Here's the latest one I have from a COPY step:
---> 997ff512f70b
Step 5/6 : COPY *.xml C:/Program Files (x86)/Jenkins/
hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) layerId=9f4bf
27ac6591304a75fa4618cd42c57da195f75286afb5d317c661ae9c0fb0b flavour=1

I know it's also failed in a RUN and ADD steps as well, but that layer is currently cached so it's just blasting past that piece.

@PatrickLang
Copy link

ping @jhowardmsft @jstarks any ideas?

Thanks for the info, I pinged some devs here at Microsoft to see if anyone has seen this problem. Your Windows & Docker versions look ok, and the script didn't find any problems other than the one you're reporting

@lowenna
Copy link
Member

lowenna commented Oct 21, 2016

Pretty sure this was one @darrenstahlmsft was looking at a while back. We see it very occasionally, but haven't had a consistent repro to my knowledge.

@darstahl
Copy link
Contributor

Correct. I've seen this a few times in CI, but was never able to get a local or consistent repro. Typically it was with many simultaneous container starts rather than just 1 though.

@rayterrill What are the specs of your CPU? Is the C: drive an HDD or SSD? I'm wondering if this might be IO or CPU bound so I can attempt to repro.

@rayterrill
Copy link
Author

This is a pretty lightly spec'd TEST VM - 2vCPU, 6GB RAM.

I can pretty much repeat this at will by just running a bunch of builds (not every time, but a fair amount of the time).

@darstahl
Copy link
Contributor

darstahl commented Oct 25, 2016

@rayterrill Thanks for the info. With it, we were able to get a repro of this and think we know the cause. This is a Windows bug, I don't have an ETA on a fix yet.

@rayterrill
Copy link
Author

@darrenstahlmsft Cool - Glad you guys tracked it down.

I've got another issue that I can't find a matching issue in the open Docker issues - Pasting longer PowerShell contents into a container in interactive mode truncates the text - have to copy/paste in small increments on Server 2016. Really annoying. Works fine in normal PowerShell - just does this in the context of a container in interactive mode. Open issue?

@PatrickLang
Copy link

@rayterrill sure - feel free to open a new issue

@rayterrill
Copy link
Author

@PatrickLang Will do. Trying not to clutter y'alls space unnecessarily. :)

@thaJeztah thaJeztah added the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label Oct 29, 2016
@charlessolar
Copy link

charlessolar commented Jan 13, 2017

Any idea what causes this issue? Slow disk?
I've started to get this a lot in the last week or so for seemingly no reason. What kinds of things can I do to mitigate the cause of the issue?

Edit I get this while starting containers

@georgyturevich
Copy link

georgyturevich commented Jan 20, 2017

@darrenstahlmsft Hi Darren,

Could you share an information how did you reproduce it? Do you have any workaraounds now? May be we need to limit about of simultaneous builds?

We are facing with the same errors.

docker: Error response from daemon: hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) layerId=2781d0999a73e7545a1108cfbe1c1d6d30fbbfaaa87fbb7a094573a724b36608 flavour=1.

We hosts Docker 1.13-rc4 on AWS x1.x32large instances.

docker version

PS C:\Users\georgy> docker -H 10.69.11.120 version
Client:
 Version:      1.13.0-rc7
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   48a9e53
 Built:        Fri Jan 13 21:41:57 2017
 OS/Arch:      windows/amd64

Server:
 Version:      1.13.0-rc4
 API version:  1.25 (minimum version 1.24)
 Go version:   go1.7.3
 Git commit:   88862e7
 Built:        Sat Dec 17 01:34:17 2016
 OS/Arch:      windows/amd64
 Experimental: false

docker info

PS C:\Users\georgy> docker -H 10.69.11.120 info                                 
Containers: 79                                                                  
 Running: 67                                                                    
 Paused: 0                                                                      
 Stopped: 12                                                                    
Images: 176                                                                     
Server Version: 1.13.0-rc4                                                      
Storage Driver: windowsfilter                                                   
 Windows:                                                                       
Logging Driver: json-file                                                       
Plugins:                                                                        
 Volume: local                                                                  
 Network: l2bridge l2tunnel nat null overlay transparent                        
Swarm: inactive                                                                 
Default Isolation: process                                                      
Kernel Version: 10.0 14393 (14393.576.amd64fre.rs1_release_inmarket.161208-2252)
Operating System: Windows Server 2016 Datacenter                                
OSType: windows                                                                 
Architecture: x86_64                                                            
CPUs: 64                                                                        
Total Memory: 1.906 TiB                                                         
Name: WIN-B737EJJT7B8                                                           
ID: 62EZ:P3MP:SQ3U:3UET:3JAP:QLAU:X2O4:E5NA:7WAH:BUGN:CAQT:VZ5V                 
Docker Root Dir: E:\docker_storage_1_13                                         
Debug Mode (client): false                                                      
Debug Mode (server): false                                                      
Registry: https://index.docker.io/v1/                                           
Experimental: false                                                             
Insecure Registries:                                                            
 127.0.0.0/8                                                                    
Live Restore Enabled: false                                                     

Maybe I can provide you some additional information?

@amitkumar-wankawala
Copy link

Hi @darrenstahlmsft

Facing the same issue intermittently on different containers.

docker: Error response from daemon: hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) layerId=2781d0999a73e7545a1108cfbe1c1d6d30fbbfaaa87fbb7a094573a724b36608 flavour=1.

docker version

Client:
 Version:      1.12.2-cs2-ws-beta
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   050b611
 Built:        Tue Oct 11 02:35:40 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.2-cs2-ws-beta
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   050b611
 Built:        Tue Oct 11 02:35:40 2016
 OS/Arch:      windows/amd64

@charlessolar
Copy link

charlessolar commented Jan 29, 2017

Upgraded to 1.13 windows build - still getting this error.
I can start 10 containers normally then ~75% of docker run end in failure.

I've started to quadrupedal the number of containers I start just so I get the 1/4 I was requesting...

System:
2 Xeon E5-2670 v3 2.3ghz
256 GB Ram
Local SSD

All updates installed

@derage
Copy link

derage commented Mar 23, 2017

I'm having a very similar issue to this one, but it seems to be consistent and I have been able to figure out when it actually happens. Not sure if I should open a new issue or just tag onto this one. Let me know if I should open a new one and I can do so.

Issue

When Building an image, if the cached layer of the image needs to be removed because the Dockerfile was changed for that layer, the error message below occurs and the build command will fail.

If you rebuild with the same command, it will pass. This is because the removal did in fact work, it just took the OS longer to remove the layer then docker expected it to take.

Removing intermediate container c612e411531c
hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. 
(0x5b4) layerId=c612e411531caf0f6d2cd34dedbacf666c85e9be78fe11697174140daac2c72b flavour=1

Steps to reproduce

  1. Setup docker on windows 2016, then enable remote api calls on the system so another system can use the daemon
Stop-Service docker
dockerd --unregister-service
dockerd -H npipe:// -H 0.0.0.0:2375 --register-service
Start-Service docker
  1. Now setup another windows 2016 server with docker, but this time set it up to use the other systems daemon by setting dockerhost env variable
$env:DOCKER_HOST="tcp://{IPADDRESS}:2375";
  1. Now in the second environment, build a Dockerfile, then change a line in the Dockerfile and rebuild

If this doesnt work above. Slow the drive read/write speeds down a bit so the remove container step takes longer then the command expects.

Initial thoughts

I think this might be related to our read/write speeds on our system. I will need to confirm what they actually are, but I believe we are going to a SAN for storage. I tried to look up how I could extend the timeout set for the build command, but there doesnt seem to be a way to configure it. Any thought on how I might be able to extend the timeout?

PS C:\Windows\system32> docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 79
Server Version: 17.03.0-ee-1
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: l2bridge l2tunnel nat null overlay transparent
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100)
Operating System: Windows Server 2016 Standard
OSType: windows
Architecture: x86_64
CPUs: 4
Total Memory: 16 GiB
Docker Root Dir: E:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
PS C:\Windows\system32>

PS C:\Windows\system32> docker version
Client:
 Version:      17.03.0-ee-1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   9094a76
 Built:        Wed Mar  1 00:49:51 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.03.0-ee-1
 API version:  1.26 (minimum version 1.24)
 Go version:   go1.7.5
 Git commit:   9094a76
 Built:        Wed Mar  1 00:49:51 2017
 OS/Arch:      windows/amd64
 Experimental: false
PS C:\Windows\system32>

CPU: AMD Opteron(TM) Processor 6272 2.10 GHz (2)
RAM: 16.00GB
Operating System: Microsoft Windows Server 2016 Standard, Service Pack: 0

@darstahl
Copy link
Contributor

@derage no need to open a new issue. It looks like the same underlying cause as this one. In both cases, Docker is attempting to mount the container filesystem on the host, and even though the mount succeeded, the mount does not surface on the host for a long delay. The most likely cause is either overloaded disk, or another application intercepting the mount and not allowing Docker access to it for some time.

Unfortunately the timeout is in the platform and not configurable.

This seems like it might be related to running a docker build with storage on a SAN, at least, that seems to be one of the culprits for this error. Can you confirm that using the same steps above, but starting the build on the container host instead of from an external machine repros or not? It might just be timing, but I don't think the remote API calls should be necessary to repro.

Confirming the read/write speeds, as well as checking for latency in the SAN would be the next steps here.

So far, the mitigation steps I would suggest are:

  • avoiding creating many containers in parallel if possible (a small delay between container starts helps)
  • disabling other applications which may scan new disks arriving on the system such as antivirus and monitoring software
  • Disabling any IO QoS or throttling which may cause delays while mounting the container layers

@dtma007
Copy link

dtma007 commented Apr 12, 2017

I'm encountering the same error repeatedly
hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired.

This is on Windows 2016 Server. I'm only building one container at a time. I'm not using SAN. I don't think there are any disk scans running.

It's consistently happening so making it impossible to progress with development work on creating Windows Docker containers. Any suggestions for how to progress? Thanks!

This is happening when I'm trying to run the MusicStore example from lab:
https://github.com/docker/labs/blob/master/windows/windows-containers/MultiContainerApp.md

Step 7/15 : COPY version.props .
ERROR: Service 'web' failed to build: hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) layerId=8d12b9fd1f72066c6118fa911a65
752a7d18adddf4cc9265812c9cf5f81a087d flavour=1
docker version
PS C:\dev\dotnet-musicstore> docker version
Client:
 Version:      17.03.1-ee-3
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   3fcee33
 Built:        Thu Mar 30 19:31:22 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.03.1-ee-3
 API version:  1.27 (minimum version 1.24)
 Go version:   go1.7.5
 Git commit:   3fcee33
 Built:        Thu Mar 30 19:31:22 2017
 OS/Arch:      windows/amd64
 Experimental: false

@olljanat
Copy link
Contributor

olljanat commented May 9, 2017

I had this same issue with just created Win 2016. After installing all Windows updates and excluding docker.exe and dockerd.exe from Defender it started working. I also tried to remove these excludes now and it still works but it can be that they was needed on first run.

EDIT: Problem started again and even excluding these processes from Defender did not helped now.

Has anyone more ideas what to test/check?

@andyliddle
Copy link

I am getting this issue when attempting to implement docker builds as part of my Automated Builds.
Build Server is virtual (Windows Server 2016 4cores and 6GB Ram) it is using a SAN for storage but that is out of my control.

The builds are very hit and miss, the timeout never happens at the exact same step in the build.

I've ran some performance counters during a build and did not see any spikes on CPU, Memory and disk utilization.

I wonder if Microsoft do something to make timeout configurable.

@fusionx86
Copy link

I'm seeing this error much too often across multiple Windows 2016 docker hosts. It gets to the point where I cannot docker build anything. I restart the Docker Engine service and things work fine for a little while, but the problem comes back. I'm looking into moving C# services to containers for my company, but running into this type of error so early in the process is concerning.

The strange thing is, it only takes about a second to throw the timeout error so I wonder if it's really timing out or dying due to something else.

@thaJeztah
Copy link
Member

ping @darrenstahlmsft @PatrickLang since this look like a platform (Windows) issue, do you know if a fix is planned in a future Windows update?

@olljanat
Copy link
Contributor

@fusionx86, Which platform you are running these docker hosts? I have noticed that timeouts happens only on our VMware platform. It does not happen on Azure.

It would be interesting to hear if someone can try to run Win 2016 docker hosts on VMware where virtual hardware version is at least 11 (requires ESXi 6.0 version), enable nested Hyper-V support based on this guide: https://www.derekseaman.com/2014/06/nesting-hyper-v-2012-r2-esxi-5-5.html and then run Docker with Hyper-V isolation.

@fusionx86
Copy link

Hello @olljanat. For my proof of concept, I'm running docker hosts on a local workstation. The stack is Linux -> VirtualBox -> Win2016 -> Container. If we roll containers out to production, Linux/VirtualBox would be replaced with ESXi. We're not at that point yet however.

I figured that if my Win2016 server met the requirements for docker, everything would be fine, but your mention of EXSi makes me wonder if there are underlying hypervisor compatibility requirements or problems. Just for reference, here are versions for the relevant software I'm using in my PoC:

  • Virtualbox: 5.1.22r115126 amd64

  • Windows Server: 2016 Standard (Fully patched)

  • Base Image: microsoft/windowsservercore

  • Docker Client:

    • Version: 17.03.1-ee-3
    • API version: 1.27
    • Go version: go1.7.5
    • Git commit: 3fcee33
    • Built: Thu Mar 30 19:31:22 2017
    • OS/Arch: windows/amd64
  • Docker Server:

    • Version: 17.03.1-ee-3
    • API version: 1.27 (minimum version 1.24)
    • Go version: go1.7.5
    • Git commit: 3fcee33
    • Built: Thu Mar 30 19:31:22 2017
    • OS/Arch: windows/amd64
    • Experimental: false

@olljanat
Copy link
Contributor

FYI, I re-created my Docker build server as Core version of Win 2016 (earlier it was with full GUI) and I have not seen this issue anymore after that even when running it on VMware platform where I had issues earlier.

@fusionx86
Copy link

@olljanat Thanks for sharing. Based on your post, I moved all my docker building to a core server two days ago and haven't had any problems since. 👍

@drnybble
Copy link

drnybble commented Jul 25, 2017

Any updates on this issue?
If a docker-compose up brings up a lot of containers this timeout seems almost sure to be hit.
In that case certain containers fail to be created. You have to keep doing docker-compose up until eventually all of the containers are created.
It is nice if one has the option of using Core to avoid this issue but I am sure Microsoft wants it to work on all supported editions.

@olljanat
Copy link
Contributor

olljanat commented Jul 25, 2017

@drnybble, I have not seen this issue after I re-created all my Docker hosts using Core version of Windows and to be honest I don't know any good reason to use full GUI version for that purpose.

Anyway, I think that @kallie-b is right person to say what is status on Microsoft side to investigating this issue?

@derage
Copy link

derage commented Jul 25, 2017

I confirmed what olljanat has said. This issue only seems to be reproducible when using windows 2016 GUI. switching to windows core 2016 does work

@dtma007
Copy link

dtma007 commented Jul 25, 2017

@olljanat and @derage I am using a machine provided by a cloud computing provider that has Windows Server 2016 Standard Edition pre-installed. Do you know of any way of switching to Server Core in Windows 2016? I've been looking for a way, but haven't found it yet. Just wondering what approach you used. Thanks!

@github11223344
Copy link

Any updates on this error and solution? Is the only solution to switch to Core version? Thanks.

@kallie-b
Copy link

kallie-b commented Dec 14, 2017

Apologies, it's difficult to parse this issue without a clear mapping between Server versions and Docker versions being used. Would anyone still experiencing this issue please comment with the Server and Docker versions that they're using? (I'm assuming that if the issue is still coming up the Server/Docker versions being used are not the same as the original comment on this issue...)

Also, if you're experiencing this issue can you confirm that you've installed all updates on your Server build, and the issue is persisting?

@hanxinimm
Copy link

@kallie-b
image

@github11223344
Copy link

I am seeing it with Windows 2016 Build 1607 and Docker 17.09.1-ce

@olljanat
Copy link
Contributor

@github11223344 FYI, we have been running dozens of builds every day since of June on core server without issues so that is working solution.

@darstahl
Copy link
Contributor

@hanxinimm does the docker restart command that fails hang for a long time before returning the timeout error? Does it happen every time? How many layers are in the image that is failing?

The build you posted should have a fix for the initial issue in this thread. In order to check if this is a different issue, can you collect some logs for me? Instructions are as follows:

Download HcsTraceProfile.wprp, found in https://gist.githubusercontent.com/jhowardmsft/71b37956df0b4248087c3849b97d8a71 then run the following:

Wpr.exe -start HcsTraceProfile.wprp!HcsArgon -filemode

<repro issue>

Wpr.exe -stop HcsTrace.etl "PrepareLayer timeout"

Note that the resulting HcsTrace.etl may contain personal information, such as path names accessed on the host, current running processes, etc, so I suggest sending it to me directly at [email protected].

@pradley
Copy link

pradley commented Dec 21, 2017

Im also having this same problem, but mine is when I say Docker-compose up on one of my services it says this :/

@georgyturevich
Copy link

@darrenstahlmsft Hi Darren,

I also see this error in EventLog when Dockerd tries to start 200 containers which have --restart always option on big-core machines. This is actual for one our prod server which hosts about 200 container so we had to disable this option for all of this containers. And I was able to reproduce this error by runing following snippet which runs 200 containers with --restart=always option, then stop and start docker service .

docker pull georgyturevich/tomcat:choco_install; `
docker version; `
docker info; `
mkdir C:\test_volume; `
$CountContainers = 200; `
docker stop $(docker ps -af name=gturevich-test --format "{{.Names}}"); `
docker rm $(docker ps -af name=gturevich-test --format "{{.Names}}");`
$MaintStartTime = Get-Date; `
for ($I=1; $I -le $CountContainers; $I++)  { `
  $Start = Get-Date; echo "$Start`: Runing #$I (of $CountContainers) ..."; `
  $Ports = "$(11000+$I)`:80"; `
  docker run --name "gturevich-test-$I" -p $Ports --restart=always -e "JAVA_OPTS=-Xms512M -Xmx512M" -v C:\test_volume:C:\volume -d georgyturevich/tomcat:choco_install; `
  $Stop = Get-Date; echo "$stop`: #$I (of $CountContainers) started in $(($Stop - $Start).TotalSeconds) s. Full time is: $(($Stop - $MaintStartTime).TotalMinutes) m."; `
}`
echo "$(date): Containers count before restart: $((docker ps -q).count). Stopping Docker service ...";`
Stop-Service docker; `
echo "$(date): Docker stopped. Dismounting ...";`
date; Get-Disk | ? {$_.Location -like "*docker*"} | foreach { Write-Host "Dismounting" $_.Location; Dismount-VHD -DiskNumber $_.Number}; date; `
Sleep 3; `
echo "$(date): Starting docker ...";`
Start-Service docker;`
echo "$(date): Started.";
$K=0; while($K -lt 20) { `
    $K++;`
	echo "$(date): Requesting containers list ..."; `
	$Start = Get-Date; echo " Containers count after restart: $((docker ps -q).count); Current date: $(date)";`
	$Stop = Get-Date;  echo "$(date): Result was received in $(($Stop - $Start).TotalSeconds) s.  Full time is: $(($Stop - $MaintStartTime).TotalMinutes) m."; Sleep 5; `
}

The bug has intermittent behavior. Sometimes when I run first time Dockerd was able to all 200 hundred containers, but on second or third was not.

Also I discovered that this bug is much easier to reproduce on big core instances. For example AWS - x1.16xl/c5.18xl/m5.24xl which have 64/72/96 CPUs. And it is harder to reproduce on instances with lower amount of cores. For example AWS x1e.xlarge with 122.0 GiB RAM and 4 vCPUs starts containers much slower but most times without Timeout errors.

I will try to collect HcsTrace log when I have free minute.

Thanks!

@MikeChristensen
Copy link

Please, please fix this! This bug is basically ruining Docker on Windows. A few thoughts on the subject:

  • This definitely seems to happen a lot more on my Windows 10 laptop (32 gigs of RAM) than Windows Server 2016 (with 8 gigs)
  • This happens on random steps, it's never consistent. Building again will usually finish.
  • The longer the Dockerfile (the more layers), the more likely it is to happen. I have Dockerfiles I've had to break apart since I just can't build them in one go.
  • If this is a Windows bug due to some sort of timeout, can that error be trapped and Docker retry a few times until it succeeds? Unless Microsoft is aware of the bug and actively working on a fix, I feel Docker should attempt some sort of work around even if hacky.

@lowenna
Copy link
Member

lowenna commented Feb 23, 2019

Closing. If this still happens on latest Windows RS5 (Server 2019) and latest moby, please open a new issue.

@ShemTovYosef
Copy link

I have the same issue when I upgraded Windows from 1809 to 1903

@olljanat
Copy link
Contributor

olljanat commented Jun 6, 2019

@hinews on server or Win 10 and with core or full UI version?

@MikeChristensen
Copy link

Curious if it's a difference between hyperv and process isolation (since if you're building 1809 containers, the upgrade would make you use hyperv now)

@ShemTovYosef
Copy link

ShemTovYosef commented Jun 6, 2019

Host: Windows 10 1903, Docker Image based on Windows Image 1809 server, Docker 1809

After that I tried to update Windows image to 1903 server and the same issue

I rechecked my dockerfile on other host: AWS Windows Server 2019 and it’s working

So, the bug related to Windows Host: Windows 10 1903

The related bug was created by @mikeparker in docker/for-win#3884

@ShemTovYosef
Copy link

In my case it's simple Dockerfile
FROM winamd64/python:2.7.16-windowsservercore-1809 MAINTAINER [email protected]
On the second step I'm getting error:
hcsshim::PrepareLayer - failed failed in Win32: Incorrect function. (0x1)

@mikeparker
Copy link

To be clear the new error is slightly different than the one reported here, but seems quite possibly related.
hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4)
vs
hcsshim::PrepareLayer failed in Win32: Incorrect function. (0x1)

@olljanat
Copy link
Contributor

olljanat commented Jun 6, 2019

@hinews @mikeparker IMO that is totally different error. Only thing which is common is that it comes from Win32 (most probably from Windows kernel) and because configuration is also very different (new is version, Hyper-V isolation instead of process, etc..) I assume that root cause is also different so plz continue on docker/for-win#3884 instead of this old already resolved issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/builder kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. platform/windows version/1.12
Projects
None yet
Development

No branches or pull requests