hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4) #27588
Comments
Do you have an example output from when it returned the error? What type of step was it on (ADD/RUN/...) ? |
Here's the latest one I have from a COPY step: I know it's also failed in a RUN and ADD steps as well, but that layer is currently cached so it's just blasting past that piece. |
ping @jhowardmsft @jstarks any ideas? Thanks for the info, I pinged some devs here at Microsoft to see if anyone has seen this problem. Your Windows & Docker versions look ok, and the script didn't find any problems other than the one you're reporting |
Pretty sure this was one @darrenstahlmsft was looking at a while back. We see it very occasionally, but haven't had a consistent repro to my knowledge. |
Correct. I've seen this a few times in CI, but was never able to get a local or consistent repro. Typically it was with many simultaneous container starts rather than just 1 though. @rayterrill What are the specs of your CPU? Is the C: drive an HDD or SSD? I'm wondering if this might be IO or CPU bound so I can attempt to repro. |
This is a pretty lightly spec'd TEST VM - 2vCPU, 6GB RAM. I can pretty much repeat this at will by just running a bunch of builds (not every time, but a fair amount of the time). |
@rayterrill Thanks for the info. With it, we were able to get a repro of this and think we know the cause. This is a Windows bug, I don't have an ETA on a fix yet. |
@darrenstahlmsft Cool - Glad you guys tracked it down. I've got another issue that I can't find a matching issue in the open Docker issues - Pasting longer PowerShell contents into a container in interactive mode truncates the text - have to copy/paste in small increments on Server 2016. Really annoying. Works fine in normal PowerShell - just does this in the context of a container in interactive mode. Open issue? |
@rayterrill sure - feel free to open a new issue |
@PatrickLang Will do. Trying not to clutter y'alls space unnecessarily. :) |
Any idea what causes this issue? Slow disk? Edit: I get this while starting containers. |
@darrenstahlmsft Hi Darren, could you share how you reproduced it? Do you have any workarounds now? Maybe we need to limit the number of simultaneous builds? We are facing the same errors.
We host Docker 1.13-rc4 on AWS x1.32xlarge instances.
Maybe I can provide you some additional information? |
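The question above about limiting simultaneous builds can be sketched as a small bounded worker pool. This is only an illustration, not a confirmed fix: the thread reports the timeout more often under many parallel starts, but nothing here establishes that a given limit avoids it. The function name `run_builds` and the `max_parallel` default are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def run_builds(commands, max_parallel=2, run=subprocess.run):
    """Run each command with at most max_parallel running at the same time.

    commands is a list of argument lists, e.g. ["docker", "build", "."].
    Returns the exit codes in the same order as commands.
    The run parameter exists only so the runner can be swapped out in tests.
    """
    def run_one(cmd):
        return run(cmd).returncode

    # The pool size is the concurrency cap: extra builds wait for a free worker.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

For example, `run_builds([["docker", "build", "-t", "a", "./a"], ["docker", "build", "-t", "b", "./b"]], max_parallel=1)` would run the two (hypothetical) builds one after the other.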
Hi @darrenstahlmsft Facing the same issue intermittently on different containers.
|
Upgraded to 1.13 windows build - still getting this error. I've started to quadruple the number of containers I start just so I get the 1/4 I was requesting... System: All updates installed |
I'm having a very similar issue to this one, but it seems to be consistent and I have been able to figure out when it actually happens. Not sure if I should open a new issue or just tag onto this one. Let me know if I should open a new one and I can do so.
Issue
When building an image, if the cached layer of the image needs to be removed because the Dockerfile was changed for that layer, the error message below occurs and the build command fails. If you rebuild with the same command, it passes. This is because the removal did in fact work; it just took the OS longer to remove the layer than Docker expected.
Steps to reproduce
If the steps above don't reproduce it, slow the drive read/write speeds down a bit so the remove container step takes longer than the command expects.
Initial thoughts
I think this might be related to the read/write speeds on our system. I will need to confirm what they actually are, but I believe we are going to a SAN for storage. I tried to look up how I could extend the timeout set for the build command, but there doesn't seem to be a way to configure it. Any thoughts on how I might be able to extend the timeout?
|
@derage no need to open a new issue. It looks like the same underlying cause as this one. In both cases, Docker is attempting to mount the container filesystem on the host, and even though the mount succeeded, the mount does not surface on the host for a long delay. The most likely cause is either overloaded disk, or another application intercepting the mount and not allowing Docker access to it for some time. Unfortunately the timeout is in the platform and not configurable. This seems like it might be related to running a docker build with storage on a SAN, at least, that seems to be one of the culprits for this error. Can you confirm that using the same steps above, but starting the build on the container host instead of from an external machine repros or not? It might just be timing, but I don't think the remote API calls should be necessary to repro. Confirming the read/write speeds, as well as checking for latency in the SAN would be the next steps here. So far, the mitigation steps I would suggest are:
|
I'm encountering the same error repeatedly. This is on Windows 2016 Server. I'm only building one container at a time. I'm not using SAN. I don't think there are any disk scans running. It happens consistently, which makes it impossible to progress with development work on creating Windows Docker containers. Any suggestions for how to progress? Thanks! This is happening when I'm trying to run the MusicStore example from lab:
|
I had this same issue on a freshly created Win 2016 server. After installing all Windows updates and excluding docker.exe and dockerd.exe from Defender it started working. I have since removed these exclusions and it still works, but it may be that they were needed on the first run. EDIT: The problem started again, and this time even excluding these processes from Defender did not help. Does anyone have more ideas about what to test/check? |
I am getting this issue when attempting to implement docker builds as part of my Automated Builds. The builds are very hit and miss, and the timeout never happens at the exact same step in the build. I ran some performance counters during a build and did not see any spikes in CPU, memory or disk utilization. I wonder if Microsoft could do something to make the timeout configurable. |
I'm seeing this error much too often across multiple Windows 2016 docker hosts. It gets to the point where I cannot The strange thing is, it only takes about a second to throw the timeout error so I wonder if it's really timing out or dying due to something else. |
ping @darrenstahlmsft @PatrickLang since this look like a platform (Windows) issue, do you know if a fix is planned in a future Windows update? |
@fusionx86, which platform are you running these docker hosts on? I have noticed that the timeouts happen only on our VMware platform. They do not happen on Azure. It would be interesting to hear if someone can try to run Win 2016 docker hosts on VMware where the virtual hardware version is at least 11 (requires ESXi 6.0), enable nested Hyper-V support based on this guide: https://www.derekseaman.com/2014/06/nesting-hyper-v-2012-r2-esxi-5-5.html and then run Docker with Hyper-V isolation. |
Hello @olljanat. For my proof of concept, I'm running docker hosts on a local workstation. The stack is Linux -> VirtualBox -> Win2016 -> Container. If we roll containers out to production, Linux/VirtualBox would be replaced with ESXi. We're not at that point yet however. I figured that if my Win2016 server met the requirements for docker, everything would be fine, but your mention of ESXi makes me wonder if there are underlying hypervisor compatibility requirements or problems. Just for reference, here are versions for the relevant software I'm using in my PoC:
|
FYI, I re-created my Docker build server as Core version of Win 2016 (earlier it was with full GUI) and I have not seen this issue anymore after that even when running it on VMware platform where I had issues earlier. |
@olljanat Thanks for sharing. Based on your post, I moved all my docker building to a core server two days ago and haven't had any problems since. 👍 |
Any updates on this issue? |
@drnybble, I have not seen this issue since I re-created all my Docker hosts using the Core version of Windows, and to be honest I don't know any good reason to use the full GUI version for that purpose. Anyway, I think that @kallie-b is the right person to say what the status is on the Microsoft side of investigating this issue. |
I confirmed what olljanat has said. This issue only seems to be reproducible when using Windows 2016 with the GUI. Switching to Windows Server Core 2016 does work. |
@olljanat and @derage I am using a machine provided by a cloud computing provider that has Windows Server 2016 Standard Edition pre-installed. Do you know of any way of switching to Server Core in Windows 2016? I've been looking for a way, but haven't found it yet. Just wondering what approach you used. Thanks! |
Any updates on this error and solution? Is the only solution to switch to Core version? Thanks. |
Apologies, it's difficult to parse this issue without a clear mapping between Server versions and Docker versions being used. Would anyone still experiencing this issue please comment with the Server and Docker versions that they're using? (I'm assuming that if the issue is still coming up the Server/Docker versions being used are not the same as the original comment on this issue...) Also, if you're experiencing this issue can you confirm that you've installed all updates on your Server build, and the issue is persisting? |
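The Server and Docker versions requested above could be gathered with a short script like the following. It is a sketch only: the helper names are made up for this example, and it assumes the `docker` CLI is on the PATH and that `docker version --format "{{.Server.Version}}"` prints the daemon version.

```python
import platform
import subprocess


def docker_server_version(run=subprocess.run):
    """Return the Docker daemon version string, or None if it cannot be queried.

    The run parameter exists only so the runner can be swapped out in tests.
    """
    try:
        result = run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True,
            text=True,
        )
    except OSError:
        # The docker CLI is missing or could not be started.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def version_report(run=subprocess.run):
    """Build a two-line summary suitable for pasting into a comment."""
    docker = docker_server_version(run) or "unknown"
    return f"Host OS: {platform.platform()}\nDocker server: {docker}"
```

Calling `print(version_report())` on the affected host gives the OS build string reported by Python next to the Docker server version.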
I am seeing it with Windows 2016 Build 1607 and Docker 17.09.1-ce |
@github11223344 FYI, we have been running dozens of builds every day since June on a Core server without issues, so that is a working solution. |
@hanxinimm does the The build you posted should have a fix for the initial issue in this thread. In order to check if this is a different issue, can you collect some logs for me? Instructions are as follows: Download HcsTraceProfile.wprp, found in https://gist.githubusercontent.com/jhowardmsft/71b37956df0b4248087c3849b97d8a71 then run the following:
Note that the resulting HcsTrace.etl may contain personal information, such as path names accessed on the host, current running processes, etc, so I suggest sending it to me directly at [email protected]. |
I'm also having this same problem, but in my case it happens when I run docker-compose up on one of my services :/ |
@darrenstahlmsft Hi Darren, I also see this error in EventLog when Dockerd tries to start 200 containers which have
The bug is intermittent. Sometimes on the first run Dockerd was able to start all 200 containers, but on the second or third run it was not. I also discovered that this bug is much easier to reproduce on instances with many cores, for example AWS x1.16xl/c5.18xl/m5.24xl, which have 64/72/96 CPUs. It is harder to reproduce on instances with fewer cores. For example, AWS x1e.xlarge with 122.0 GiB RAM and 4 vCPUs starts containers much slower but most times without I will try to collect the HcsTrace log when I have a free minute. Thanks! |
Please, please fix this! This bug is basically ruining Docker on Windows. A few thoughts on the subject:
|
Closing. If this still happens on latest Windows RS5 (Server 2019) and latest moby, please open a new issue. |
I have had the same issue since I upgraded Windows from 1809 to 1903 |
@hinews on server or Win 10 and with core or full UI version? |
Curious if it's a difference between hyperv and process isolation (since if you're building 1809 containers, the upgrade would make you use hyperv now) |
Host: Windows 10 1903, Docker image based on the Windows Server 1809 image, Docker 1809. After that I tried updating the Windows image to Server 1903 and hit the same issue. I rechecked my Dockerfile on another host, AWS Windows Server 2019, and it works there. So the bug is related to the Windows host: Windows 10 1903. The related bug was created by @mikeparker in docker/for-win#3884 |
In my case it's simple Dockerfile |
To be clear the new error is slightly different than the one reported here, but seems quite possibly related. |
@hinews @mikeparker IMO that is a totally different error. The only thing in common is that it comes from Win32 (most probably from the Windows kernel), and because the configuration is also very different (new OS version, Hyper-V isolation instead of process, etc.) I assume that the root cause is also different, so please continue on docker/for-win#3884 instead of this old, already resolved issue. |
Description
Repeatedly receiving "hcsshim::PrepareLayer failed in Win32: This operation returned because the timeout period expired. (0x5b4)" when trying to build images using "docker build" on Server 2016 (docker client/server version 1.12.2-cs2-ws-beta). Rerunning the build (sometimes multiple times) eventually allows the build to complete, but this is annoying. Note that this does not happen every time, but it does occur frequently.
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Build should complete successfully without errors.
Additional information you deem important (e.g. issue happens only occasionally):
Issue happens frequently, but not every time. I have been able to successfully build images by running the build over and over again until it succeeds.
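The "run the build over and over again until it succeeds" workaround described above can be automated with a small retry wrapper. This is a sketch under stated assumptions, not an official mitigation: the function name, attempt count and delay are made up for this example, and matching on the error text from the issue title assumes the failure appears that way in the build output.

```python
import subprocess
import time

# Error text quoted from the issue title; matching on it is an assumption
# about how the failure shows up in the build output.
TIMEOUT_MARKER = "hcsshim::PrepareLayer failed in Win32"


def build_with_retry(cmd, max_attempts=5, delay_seconds=10.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only when the output contains TIMEOUT_MARKER.

    Returns (exit_code, attempts_used). max_attempts must be at least 1.
    The run and sleep parameters exist only so they can be swapped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return 0, attempt
        output = (result.stdout or "") + (result.stderr or "")
        if TIMEOUT_MARKER not in output:
            # A different failure: retrying would only hide a real build error.
            return result.returncode, attempt
        if attempt < max_attempts:
            sleep(delay_seconds)
    return result.returncode, max_attempts
```

A call such as `build_with_retry(["docker", "build", "-t", "example:latest", "."])` (hypothetical tag and context) retries only on the PrepareLayer timeout, so genuine Dockerfile errors still fail on the first attempt.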
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
VMware VM running Server 2016 GA on-prem.
https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/windows-server-container-tools/Debug-ContainerHost/Debug-ContainerHost.ps1 Output