ue4-docker build 4.22.2 fails with 'out of space' with new-enough Docker EE #44
Comments
On examination, 200gB is probably simply not enough:
So it's possible that The command I'm using is
and it's happily resuming at the problematic I've tried changing the daemon.json to have a 400gB limit, and executing
One thing I did notice, is that |
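For anyone following along, the daemon.json change mentioned above can be sketched roughly as follows. This is only an illustration, assuming the limit is set through the `size` entry of `storage-opts` (the option the Windows storage driver reads); the helper name `with_size_limit` is made up for this example and is not part of ue4-docker.

```python
import json


def with_size_limit(config_text: str, size: str) -> str:
    """Return daemon.json text with the 'size' storage option set to `size`.

    Hypothetical helper for illustration only; not part of ue4-docker.
    """
    # Treat an empty or missing file as an empty configuration
    config = json.loads(config_text) if config_text.strip() else {}

    # Drop any existing size= entry but leave other storage options untouched
    opts = [o for o in config.get("storage-opts", []) if not o.startswith("size=")]
    opts.append(f"size={size}")
    config["storage-opts"] = opts

    return json.dumps(config, indent=2)


# Example: raise an existing 200GB limit to 400GB
print(with_size_limit('{"storage-opts": ["size=200GB"]}', "400GB"))
```

The Docker daemon only reads daemon.json at startup, so it needs a restart before a new limit applies.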
Hmmm, that's a bit puzzling. The 8GB limit fix should definitely be included in those versions of Docker. Regarding the |
To get the real image-size limit, perhaps the script could fire up a small, known image, and query the free disk space, as I did above. After a reboot, the problem persists. I'll try and debug it some more later, but in the meantime, I'm going to try and put the Linux containers (which built fine) into CI first, as that's a larger pain-point in our CI system. |
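A rough sketch of that "start a small image and ask it how much free space it has" idea, in Python. It assumes a Windows container image that ships `cmd` and an English-locale `dir` whose summary line ends in "bytes free"; both function names are hypothetical, and this is not how ue4-docker actually does it.

```python
import re
import subprocess


def parse_free_bytes(dir_output: str) -> int:
    """Extract the free-space figure from the last line of `dir` output."""
    # Assumes English-locale output such as "2 Dir(s)  21,339,840,512 bytes free"
    match = re.search(r"([\d,]+) bytes free", dir_output)
    if match is None:
        raise ValueError("no 'bytes free' line found in dir output")
    return int(match.group(1).replace(",", ""))


def container_free_bytes(image: str) -> int:
    """Run `dir C:\\` in a throwaway container and return the reported free bytes."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, "cmd", "/C", "dir", "C:\\"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_free_bytes(result.stdout)


# The parser can be checked without Docker, using a sample summary line
sample = "               2 Dir(s)  21,339,840,512 bytes free"
print(parse_free_bytes(sample))
```

Comparing the value reported inside the container against the limit configured in daemon.json would show whether the configured size is actually taking effect.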
I tried to create a simple reproduction of the "8gB problem", but didn't succeed. A null-byte file ( I tried switching from MS's 18.09.6 (from I've installed 19.03.2-rc2 from
and am now rebuilding to see if that passes. Interestingly, step 6 (the first RUN in ue4-minimal) didn't pull from the cache after the Docker update, I wonder if something has changed upstream that would cause that to not match. I actually had to bypass the SHA256 verification of the downloaded .zip of 19.03.0-rc2. I downloaded it from a couple of places, and always got the same SHA256, but it never matched the manifest SHA256. |
Issue reproduced with 19.03.0-rc2 as well. |
Damn, that's a bit worrying, the 8GB fix should well and truly be integrated in versions of Docker that new. Out of curiosity, does the build succeed if you use the |
I'll try I've just been playing with debug logging for Docker (still 19.03.0-rc2), but it doesn't really tell me much:
I'm not sure if the 8gB problem gives those logs. I was working with an abbreviated Dockerfile which started from the final state of the build container before the second
docker image tag 40ea5414bef9 thebuilder:4.22.2-ltsc2019
I also confirmed that
mkdir copydata
docker create thebuilder:4.22.2-ltsc2019
# 8517e8bcc6764b25f0eadf7cfdee2f7506f69701f8dae1a3647112e7e6ce2f62
docker cp 8517e8bcc676:/UnrealEngine/LocalBuilds/Engine/Windows .\copydata\ |
I can confirm that with If I remember correctly, the last set of images I built were under a version of ue4-docker where the PDBs were automatically excluded, which was under an AWS-hosted but otherwise similar build environment (ltsc2019), for 4.21.1. So I'm wondering if somehow this problem is exclusive to Docker EE, and the people who've fixed #37 by upgrading to 18.09.6 were actually unintentionally switching to Docker CE or something. If I find some time, I might try and finish my effort to put together a small repro case, as all the repro-cases I found for this issue were for Linux containers. Then I could also check it against Docker CE easily. |
Okay, so it's definitely a size issue then if the version with truncated debug symbols works. To the best of my knowledge, it's not possible to install Docker CE under Windows Server (unless using Docker Desktop under a version of Windows Server with the Desktop Experience), but I'd be genuinely interested to see if Docker CE 19.03.0-beta3 suffers from the same issue when running the edge release of Docker Desktop under Windows 10. (The latest stable release of Docker Desktop currently includes Docker CE 18.09.2, which definitely does suffer from this issue.) A reliable repro Dockerfile for Windows containers would be very useful, since we could then systematically test various versions of both Docker CE and Docker EE to determine exactly which release(s) do and do not include the 8GB limit fix. |
Although I know you were encountering this problem under Windows Server 1809, it's interesting to note that a related bug has just been positively identified in Windows Server 1903: docker/for-win#3884 (comment) |
Docker Enterprise for Windows Server (DockerMsftProvider) now has a release of 19.03.01, so sometime in the next couple of weeks I expect to be trying this out again with UE 4.22, same as the previous attempt, but slightly patched this time for Server and Client builds. |
Yeah, I noticed yesterday that the latest version of Docker Desktop now includes 19.03.1, so it'll be interesting to see how it goes. I might actually spin up a Windows Server 1809 VM and test it out too. |
@TBBle I've just added functionality to ue4-docker that allows users to test for the presence of the 8GiB filesystem layer bug: #63 (comment) |
Awesome. I gave the new version a go at home (as a baseline) on Docker Desktop 2.2.0.0 (44247). I'll have a run on the system at work about which I originally logged this ticket. It's probably due to have its Docker installation upgraded anyway.
Linux containers (VM) mode
I think it's trying to
Annoyingly, without
Windows Containers mode:
Same results so far as described in #63 (comment). I turned on 'experimental mode' to get access to LCOW
So yeah, LCOW mode passed, and (if I understand correctly) that's the same daemon as WCOW, so it's not a missing daemon patch in the Windows build, as surmised. Without experimental mode enabled in the daemon, --linux failed similarly to how lack of --linux failed in Linux Containers mode, with a complaint that the requested base image did not have a matching platform. I'm not sure if this is usefully detectable... Perhaps there's some reliable combination of server OS/arch and available storage drivers that can report the right failure earlier.
Thought I'd give process isolation mode a chance while I was in Windows Containers mode. I had to do it by hand, because ue4docker sees the kernel version (via Anyway, looks like docker/for-win#3884 affects Windows 1909 as well.
Also, that issue got hijacked by discussions of a different error, caused by filesystem filters. Probably worth changing the link in the docs to docker/for-win#4100, where it's clearly visible that there'll be a workaround for this in Docker Engine 19.03.6. A Edit: Opened UnrealContainers/unrealcontainers.github.io#2 for the doc updates in that last paragraph. |
I captured the relevant log snippet in case I or someone goes looking into this later, with "log-level debug" and "debug true" configured, and the process-isolation command (it's faster) from the end of my last comment.
The parent layer's own logs:
I disabled re-exec (by setting the
I confirmed that the file in |
Confirmed a slightly simpler test-case, which I'll post upstream.
the problem reproduces. |
Thanks for the thorough testing and investigation! I've now added code to detect when Docker Desktop is in Linux container mode and provided more helpful error messages (in both |
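For reference, one way such a check could look is sketched below. It is not ue4-docker's actual implementation: it only relies on `docker info --format "{{.OSType}}"`, which reports whether the daemon is running Linux or Windows containers, and the function names and message wording are invented for the example.

```python
import subprocess
from typing import Optional


def daemon_os_type() -> str:
    """Ask the Docker daemon which container OS it is currently serving."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.OSType}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().lower()


def mode_error(os_type: str, want_linux: bool) -> Optional[str]:
    """Return a human-readable error if the daemon mode does not match the request."""
    if not want_linux and os_type == "linux":
        return (
            "The Docker daemon is in Linux containers mode; "
            "switch to Windows containers mode to build Windows images."
        )
    if want_linux and os_type == "windows":
        # Caveat: with experimental LCOW enabled this combination can still work
        return (
            "The Docker daemon is in Windows containers mode; "
            "Linux images need LCOW (experimental) or Linux containers mode."
        )
    return None


# The decision logic can be exercised without a running daemon
print(mode_error("linux", want_linux=False))
```

Keeping the daemon query separate from the decision makes the error messages easy to test without Docker installed.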
It's a bit worrying that the diagnostic passes when running in LCOW mode, since as you've noted that suggests something a bit subtler than Windows builds of the daemon not including the relevant bugfix. Thanks for opening the issue on the Moby issue tracker, hopefully someone can identify what's actually going on there. |
No worries. I'm pretty sure (having spent some time with hcsshim and containerd recently while trying to get buildkit working for WCOW) that the problem is either an issue in the system call hcsshim is making ( For example, I haven't ruled out that Docker is somehow getting out of sync and cleaning up a directory that I'm leaving it alone for now, hoping someone upstream can track this down, but if I get more time, I might try and work out if I can put together a repro using just hcsshim calls, and/or add a ton of debugging to Docker to try and track down all the things going on. I'm still having trouble working out exactly how Docker ends up calling I also need to finish some games so I can clear the disk space to keep experimenting... Or fix |
BuildKit support for Windows containers would be excellent, I'll be keen to see how that progresses. Combined with a fix for this problem, it'd finally provide enough parity between Linux and Windows containers to make some serious improvements to how ue4-docker builds images. |
So it's visible, the BuildKit tracking bug for Windows Container support is moby/buildkit#616 |
Hi! While I understand that this is caused by an upstream issue, is there any known configuration of Windows/Docker versions that work around this issue? I am currently hitting it with the following configuration:
The diagnostic run looks like that:
If there is any configuration known to work at this point, I'd love to hear about it! |
@tynril I'm not currently aware of any combination of Windows/Docker version that does not exhibit this behaviour, and judging by the comments on the upstream issue that @TBBle opened, it looks like it's still an unsolved problem. In the meantime, you can successfully build Windows container images by excluding debug symbols with the |
The fix has gone into https://github.com/moby/moby/ (upstream for Docker Engine) and will be part of the 20.03.0 release, whenever that is. |
Thanks to @Agendum for the prompt, it appears that the fix is in the Docker Engine 20.10.0 beta1 release included in Docker Desktop for Windows Community 2.4.2.0 (the Edge channel). However, I haven't tested the 8gig diagnostic yet to confirm that. |
Per a discussion on #99, it's possible there's still a different 8gig issue in Docker Engine (relating to COPY from a different target in a multi-target Dockerfile) which is going to hit in the same circumstances as this ticket, but might be a problem in Docker Engine itself, rather than in the underlying container support. |
I've confirmed that with Docker Desktop for Windows Community Edge (2.4.2.0), the This was using the updated version of the test from #97. Test machines: For
For
|
I can confirm that the 8gig diagnostic passes when using process isolation mode with Docker CE 20.10.0 under Windows 10 version 20H2. I'm going to close this issue now and we can continue to track discussions around the potential new bug in #99. |
Output of the ue4-docker info command: See below.
I reproduced the same failure as #37 (and the troubleshooting guide) on Windows Server 2019 with Docker 18.09.5, and upgrading to Docker 18.09.6 did not fix it.
So I don't think this is #37, unless somehow the 8GB file copy fix (moby/moby#37771) was completely excluded from the Docker EE 18.09 series, despite the release notes.
Before:
Upgrade per Microsoft's documentation
After: