Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ue4-docker build 4.22.2 fails with 'out of space' with new-enough Docker EE #44

Closed
TBBle opened this issue Jun 19, 2019 · 28 comments
Closed
Labels
awaiting-upstream Awaiting bugfixes or functionality from upstream dependencies

Comments

@TBBle
Copy link
Collaborator

TBBle commented Jun 19, 2019

Output of the ue4-docker info command: See below.

I reproduced the same failure as #37 (and the troubleshooting guide) on Windows Server 2019 with Docker 18.09.5, and upgrading to Docker 18.09.6 did not fix it.

So I don't think this is #37, unless somehow the 8GB file copy fix (moby/moby#37771) was completely excluded from the Docker EE 18.09 series, despite the release notes.

Before:

(venv) PS C:\Users\localadmin> python -m ue4docker info
ue4-docker version:         0.0.34 (latest available version is 0.0.35)
Operating system:           Windows Server 2019 Standard Version 1809 (OS Build 17763.557)
Docker daemon version:      18.09.5
NVIDIA Docker supported:    No
Maximum image size:         200GB
Available disk space:       677.09 GiB
Total system memory:        63.89 GiB physical, 73.39 GiB virtual
Number of processors:       20 physical, 40 logical
(venv) PS C:\Users\localadmin> docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 49
Server Version: 18.09.5
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows Server 2019 Standard Version 1809 (OS Build 17763.557)
Architecture: x86_64
CPUs: 40
Total Memory: 63.89GiB
Name: au1-bldwin-22
ID: IN3O:ZULX:EYEV:VGNW:EE33:WOFD:FUF3:NRTO:I2NK:6WEP:CPAL:HSPL
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Upgrade per Microsoft's documentation

(venv) PS C:\Users\localadmin> Install-Package -Name Docker -ProviderName DockerMsftProvider -Update -Force

Name                           Version          Source           Summary
----                           -------          ------           -------
Docker                         18.09.6          DockerDefault    Contains Docker EE for use with Windows Server.


(venv) PS C:\Users\localadmin> Start-Service Docker

After:

(venv) PS C:\Users\localadmin> python -m ue4docker info
ue4-docker version:         0.0.34 (latest available version is 0.0.35)
Operating system:           Windows Server 2019 Standard Version 1809 (OS Build 17763.557)
Docker daemon version:      18.09.6
NVIDIA Docker supported:    No
Maximum image size:         200GB
Available disk space:       676.86 GiB
Total system memory:        63.89 GiB physical, 73.39 GiB virtual
Number of processors:       20 physical, 40 logical
(venv) PS C:\Users\localadmin> docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 49
Server Version: 18.09.6
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows Server 2019 Standard Version 1809 (OS Build 17763.557)
OSType: windows
Architecture: x86_64
CPUs: 40
Total Memory: 63.89GiB
Name: au1-bldwin-22
ID: IN3O:ZULX:EYEV:VGNW:EE33:WOFD:FUF3:NRTO:I2NK:6WEP:CPAL:HSPL
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
@TBBle
Copy link
Collaborator Author

TBBle commented Jun 19, 2019

On examination, 200gB is probably simply not enough:

(venv) PS C:\Users\localadmin> docker image ls
REPOSITORY                                           TAG                                IMAGE ID            CREATED             SIZE
<none>                                               <none>                             d6d0454b25e0        12 hours ago        196GB
ue4-docker.artifactory.bigworldtech.com/ue4-source   4.22.2                             ec9a2db4fcbf        15 hours ago        66.5GB
ue4-docker.artifactory.bigworldtech.com/ue4-source   4.22.2-ltsc2019                    ec9a2db4fcbf        15 hours ago        66.5GB
adamrehn/ue4-build-prerequisites                     ltsc2019                           18a01ab8dea2        16 hours ago        24.5GB
<none>                                               <none>                             c749f1ae825b        16 hours ago        9.97GB
mcr.microsoft.com/dotnet/framework/sdk               4.7.2-windowsservercore-ltsc2019   d84a79dbfc8c        8 days ago          8.55GB

d6d0454b25e0 is the final state of the 'builder' image:

Step 14/22 : RUN C:\copy.py C:\legacy-toolchain-fix.py C:\UnrealEngine\LocalBuilds\Engine\Windows\ && echo. && echo.RUN directive complete. Docker will now commit the filesystem layer to disk. && echo.Note that for large filesystem layers this can take quite some time. && echo.Performing filesystem layer commit... && echo.
 ---> Using cache
 ---> d6d0454b25e0

So it's possible that COPY --from requires the "image size limit" to be large enough to hold both input and output.

The command I'm using is

python -m ue4docker build 4.22.2 --no-engine

and it's happily resuming at the problematic COPY step.

I've tried changing the daemon.json to have a 400gB limit, and executing Stop-Service Docker + Start-Service Docker, but it still failed. I'm going to reboot the host, and try again. But a quick check suggests that the 400gB limit has taken effect:
docker run --rm -it mcr.microsoft.com/dotnet/framework/sdk:4.7.2-windowsservercore-ltsc2019 powershell:

PS C:\> Get-WmiObject -Class Win32_logicaldisk


DeviceID     : C:
DriveType    : 3
ProviderName :
FreeSpace    : 429185159168
Size         : 429361438720
VolumeName   :

One thing I did notice, is that ue4docker info saw the 400gB limit change before I restarted Docker. I didn't check, but it's not documented as a dynamic configuration parameter, so without looking I suspect ue4-docker could be fooled by a daemon.json change that hasn't had the requisite service restart.

@adamrehn
Copy link
Owner

Hmmm, that's a bit puzzling. The 8GB limit fix should definitely be included in those versions of Docker.

Regarding the daemon.json change: yes, ue4-docker will be fooled by this. To my great annoyance, there's no way to query the Docker daemon about those configuration settings, so my only option is to read the JSON file and assume that the Docker daemon was restarted after the last modification.

@TBBle
Copy link
Collaborator Author

TBBle commented Jun 20, 2019

To get the real image-size limit, perhaps the script could fire up a small, known image, and query the free disk space, as I did above.

After a reboot, the problem persists. I'll try and debug it some more later, but in the meantime, I'm going to try and put the Linux containers (which built fine) into CI first, as that's a larger pain-point in our CI system.

@TBBle
Copy link
Collaborator Author

TBBle commented Jun 20, 2019

I tried to create a simple reproduction of the "8gB problem", but didn't succeed. A null-byte file (fsutil file createNew) happily copied, and the random powershell scripts I grabbed ran out of RAM trying to create the file.

I tried switching from MS's 18.09.6 (from DockerMSFTProvider, installed above) to Docker's 18.09.6 (from DockerProvider), as the packages have different SHA256, but that didn't change anything.

I've installed 19.03.2-rc2 from DockerProvider just in case there's a problem that persists in the 18.09 series:

Install-Package -Name Docker -ProviderName DockerProvider -Update -RequiredVersion 19.03.0-rc2 -Force -Verbose

and am now rebuilding to see if that passes. Interestingly, step 6 (the first RUN in ue4-minimal) didn't pull from the cache after the Docker update, I wonder if something has changed upstream that would cause that to not match.

I actually had to bypass the SHA256 verification of the downloaded .zip of 19.03.0-rc2. I downloaded it from a couple of places, and always got the same SHA256, but it never matched the manifest SHA256.

@TBBle
Copy link
Collaborator Author

TBBle commented Jun 20, 2019

Issue reproduced with 19.03.0-rc2 as well.

@adamrehn
Copy link
Owner

Damn, that's a bit worrying, the 8GB fix should well and truly be integrated in versions of Docker that new. Out of curiosity, does the build succeed if you use the --exclude debug flag?

@TBBle
Copy link
Collaborator Author

TBBle commented Jun 21, 2019

I'll try --exclude debug next.

I've just been playing with debug logging for Docker (still 19.03.0-rc2), but it doesn't really tell me much:

time="2019-06-21T12:13:33.121345200+10:00" level=debug msg="hcsshim::PrepareLayer - succeeded" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:13:33.121345200+10:00" level=debug msg="hcsshim::GetLayerMountPath" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:13:33.121345200+10:00" level=debug msg="Calling proc (1)" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:13:33.135897100+10:00" level=debug msg="Calling proc (2)" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:13:33.137928900+10:00" level=debug msg="hcsshim::GetLayerMountPath - succeeded" mountPath="\\\\?\\Volume{ef411550-933c-11e9-9d05-807b504cd9e9}" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:13:33.138898100+10:00" level=debug msg="Calling TarUntar(\\\\?\\Volume{6c704b8a-0410-43d1-998e-de7fc99cdbbd}\\UnrealEngine\\LocalBuilds\\Engine\\Windows, \\\\?\\Volume{ef411550-933c-11e9-9d05-807b504cd9e9}\\UnrealEngine)"
time="2019-06-21T12:13:33.138898100+10:00" level=debug msg="TarUntar(\\\\?\\Volume{6c704b8a-0410-43d1-998e-de7fc99cdbbd}\\UnrealEngine\\LocalBuilds\\Engine\\Windows \\\\?\\Volume{ef411550-933c-11e9-9d05-807b504cd9e9}\\UnrealEngine)"
time="2019-06-21T12:14:52.491420100+10:00" level=error msg="Can't add file \\\\?\\Volume{6c704b8a-0410-43d1-998e-de7fc99cdbbd}\\UnrealEngine\\LocalBuilds\\Engine\\Windows\\Engine\\Plugins\\MovieScene\\ActorSequence\\Binaries\\Win64\\UE4Editor-ActorSequence.pdb to tar: io: read/write on closed pipe"
time="2019-06-21T12:14:52.491420100+10:00" level=error msg="Can't close tar writer: io: read/write on closed pipe"
time="2019-06-21T12:14:52.491420100+10:00" level=debug msg="WindowsGraphDriver Put() id 34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:14:52.491420100+10:00" level=debug msg="hcsshim::UnprepareLayer" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:14:52.496420000+10:00" level=debug msg="hcsshim::UnprepareLayer - succeeded" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:14:52.496420000+10:00" level=debug msg="hcsshim::DeactivateLayer" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"
time="2019-06-21T12:15:00.577704900+10:00" level=debug msg="hcsshim::DeactivateLayer - succeeded" path="C:\\ProgramData\\docker\\windowsfilter\\34b18648e202d2677ba9642057ba054bc592736856932b1ffec936bb6a0c8977"

I'm not sure if the 8gB problem gives those logs. I was working with an abbreviated Dockerfile which started from the final state of the build container before the second FROM:

docker image tag 40ea5414bef9 thebuilder:4.22.2-ltsc2019
FROM thebuilder:4.22.2-ltsc2019 AS builder
FROM adamrehn/ue4-build-prerequisites:ltsc2019
COPY --from=builder /UnrealEngine/LocalBuilds/Engine/Windows /UnrealEngine/

I also confirmed that docker cp works, which I think also relies on the tar system:

mkdir copydata
docker create thebuilder:4.22.2-ltsc2019
# 8517e8bcc6764b25f0eadf7cfdee2f7506f69701f8dae1a3647112e7e6ce2f62
docker cp 8517e8bcc676:/UnrealEngine/LocalBuilds/Engine/Windows .\copydata\

@TBBle
Copy link
Collaborator Author

TBBle commented Jun 21, 2019

I can confirm that with --exclude debug, it passes this step fine.

If I remember correctly, the last set of images I built were under a version of ue4-docker where the PDBs were automatically excluded, which was under an AWS-hosted by otherwise similar build environment (ltsc2019), for 4.21.1. So I'm wondering if somehow this problem is exclusive to Docker EE, and the people who've fixed #37 by upgrading to 18.09.6 were actually unintentionally switching to Docker CE or something.

If I find some time, I might try and finish my effort to put together a small repro case, as all the repro-cases I found for this issue were for Linux containers. Then I could also check it against Docker CE easily.

@adamrehn
Copy link
Owner

adamrehn commented Jun 21, 2019

Okay, so it's definitely a size issue then if the version with truncated debug symbols works.

To the best of my knowledge, it's not possible to install Docker CE under Windows Server (unless using Docker Desktop under a version of Windows Server with the Desktop Experience), but I'd be genuinely interested to see if Docker CE 19.03.0-beta3 suffers from the same issue when running the edge release of Docker Desktop under Windows 10. (The latest stable release of Docker Desktop currently includes Docker CE 18.09.2, which definitely does suffer from this issue.)

A reliable repro Dockerfile for Windows containers would be very useful, since we could then systematically test various versions of both Docker CE and Docker EE to determine exactly which release(s) do and do not include the 8GB limit fix.

@adamrehn
Copy link
Owner

Although I know you were encountering this problem under Windows Server 1809, it's interesting to note that a related bug has just been positively identified in Windows Server 1903: docker/for-win#3884 (comment)

@TBBle
Copy link
Collaborator Author

TBBle commented Aug 6, 2019

Docker Enterprise for Windows Server (DockerMsftProvider) now has a release of 19.03.01, so sometime in the next couple of weeks I expect to be trying this out again with UE 4.22, same as the previous attempt, but slightly patched this time for Server and Client builds.

@adamrehn
Copy link
Owner

adamrehn commented Aug 7, 2019

Yeah, I noticed yesterday that the latest version of Docker Desktop now includes 19.03.1, so it'll be interesting to see how it goes. I might actually spin up a Windows Server 1809 VM and test it out too.

@adamrehn
Copy link
Owner

@TBBle I've just added functionality to ue4-docker that allows users to test for the presence of the 8GiB filesystem layer bug: #63 (comment)

@TBBle
Copy link
Collaborator Author

TBBle commented Feb 1, 2020

Awesome. I gave the new version a go at home (as a baseline) on Docker Desktop 2.2.0.0 (44247)

I'll have a run on the system at work about which I originally logged this ticket. It's probably due to have its Docker installation upgraded anyway.

Linux containers (VM) mode

(ue4ven) PS C:\Users\paulh> docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:22:37 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:29:19 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
(ue4ven) PS C:\Users\paulh> ue4-docker info
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\paulh\ue4ven\Scripts\ue4-docker.exe\__main__.py", line 7, in <module>
  File "c:\users\paulh\ue4ven\lib\site-packages\ue4docker\main.py", line 81, in main
    COMMANDS[command]['function']()
  File "c:\users\paulh\ue4ven\lib\site-packages\ue4docker\info.py", line 30, in info
    diskSpace = shutil.disk_usage(rootDir).free
  File "C:\Program Files\Python36\lib\shutil.py", line 1012, in disk_usage
    total, free = nt._getdiskusage(path)
FileNotFoundError: [WinError 3] The system cannot find the path specified

I think it's trying to _getdiskusage on /var/lib/docker. Not a supported build-mode, as I recall.

(ue4ven) PS C:\Users\paulh> docker info
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 20
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.76-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.943GiB
 Name: docker-desktop
 ID: QWUO:X5PT:RPSC:XCQ2:7XQG:TRYH:L727:44LX:KBV4:RFSE:ODXD:G4ZW
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 34
  Goroutines: 51
  System Time: 2020-02-01T09:35:39.95664034Z
  EventsListeners: 3
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
(ue4ven) PS C:\Users\paulh> ue4-docker diagnostics 8gig --linux
[ue4-docker diagnostics] Running diagnostic: "Check for Docker 8GiB filesystem layer bug"
[ue4-docker diagnostics] [8gig] Attempting to build an image with an 8GiB filesystem layer...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'build', '-t', 'adamrehn/ue4-docker/diagnostics:8gig', 'c:\\users\\paulh\\ue4ven\\lib\\site-packages\\ue4docker\\dockerfiles\\diagnostics\\8gig\\linux']
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM alpine:latest
latest: Pulling from library/alpine
c9b1b535fdd9: Pull complete
Digest: sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Status: Downloaded newer image for alpine:latest
 ---> e7d92cdc71fe
Step 2/3 : LABEL com.adamrehn.ue4-docker.sentinel="1"
 ---> Running in bc7a932c17a9
Removing intermediate container bc7a932c17a9
 ---> ad536be5e0d5
Step 3/3 : RUN head -c 8589934592 </dev/urandom >file
 ---> Running in d03a798cefb6
Removing intermediate container d03a798cefb6
 ---> e77f844636a8
Successfully built e77f844636a8
Successfully tagged adamrehn/ue4-docker/diagnostics:8gig
SECURITY WARNING: You are building a Docker image from Windows against a non-Windows Docker host. All files and directories added to build context will have '-rwxr-xr-x' permissions. It is recommended to double check and reset permissions for sensitive files and directories.
[ue4-docker diagnostics] [8gig] Cleaning up...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'rmi', 'adamrehn/ue4-docker/diagnostics:8gig']
Untagged: adamrehn/ue4-docker/diagnostics:8gig
Deleted: sha256:e77f844636a8e3777bdaa849721f450f0b150483aaff4b462c0faaf0c396728b
Deleted: sha256:600032b17d0dd909363f4c3f8245a6f4e714cd3f3b142bf2bcc8cad2c407efe0
Deleted: sha256:ad536be5e0d52b04959d8252851d138cc2f1419f02bdda5530ff645d01c365a4
[ue4-docker diagnostics] [8gig] Run: ['docker', 'system', 'prune', '-f']
Total reclaimed space: 0B

[ue4-docker diagnostics] [8gig] Diagnostic succeeded! The Docker daemon can build images with 8GiB filesystem layers.

[ue4-docker diagnostics] Diagnostic result: passed

Annoyingly, without --linux it doesn't work as it tries to use a Windows base image. It'd be nice if it could detect and tell me which platform the Docker daemon was configured for.

Windows Containers mode:

(ue4ven) PS C:\Users\paulh> docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:22:37 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:36:50 2019
  OS/Arch:          windows/amd64
  Experimental:     false
(ue4ven) PS C:\Users\paulh> ue4-docker info
ue4-docker version:         0.0.47 (latest available version is 0.0.47)
Operating system:           Windows 10 Pro Version 1909 (OS Build 18362.592)
Docker daemon version:      19.03.5
NVIDIA Docker supported:    No
Maximum image size:         20GB
Available disk space:       17.74 GiB
Total system memory:        15.96 GiB physical, 22.52 GiB virtual
Number of processors:       4 physical, 8 logical
(ue4ven) PS C:\Users\paulh> docker info
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 19.03.5
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 18363 (18362.1.amd64fre.19h1_release.190318-1202)
 Operating System: Windows 10 Pro Version 1909 (OS Build 18363.592)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.96GiB
 Name: KEITARO
 ID: NDQJ:UAVJ:AHVY:OAXE:FYZU:4JGK:FV2M:PCET:ZEBX:AGSH:7L7H:GHCF
 Docker Root Dir: C:\ProgramData\Docker
 Debug Mode: true
  File Descriptors: -1
  Goroutines: 43
  System Time: 2020-02-01T20:08:49.7631458+11:00
  EventsListeners: 1
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
(ue4ven) PS C:\Users\paulh> ue4-docker diagnostics 8gig
[ue4-docker diagnostics] Running diagnostic: "Check for Docker 8GiB filesystem layer bug"
[ue4-docker diagnostics] [8gig] Attempting to build an image with an 8GiB filesystem layer...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'build', '-t', 'adamrehn/ue4-docker/diagnostics:8gig', 'c:\\users\\paulh\\ue4ven\\lib\\site-packages\\ue4docker\\dockerfiles\\diagnostics\\8gig\\windows', '-m', '4GiB', '--build-arg', 'BASETAG=ltsc2016']
Sending build context to Docker daemon   2.56kB
Step 1/5 : ARG BASETAG
Step 2/5 : FROM mcr.microsoft.com/windows/servercore:${BASETAG}
ltsc2016: Pulling from windows/servercore
3889bb8d808b: Pull complete
31f9df80631e: Pull complete
Digest: sha256:42be24b8810c861cc1b3fe75c5e99f75061cb45fdbae1de46d151c18cc8e6a9a
Status: Downloaded newer image for mcr.microsoft.com/windows/servercore:ltsc2016
 ---> 2854dfb4841f
Step 3/5 : SHELL ["cmd", "/S", "/C"]
 ---> Running in 5e9462c4c19e
Removing intermediate container 5e9462c4c19e
 ---> fbf424a9d9f3
Step 4/5 : LABEL com.adamrehn.ue4-docker.sentinel="1"
 ---> Running in e01dbd542e03
Removing intermediate container e01dbd542e03
 ---> f2822171a5a8
Step 5/5 : RUN powershell "     $ErrorActionPreference = 'Stop';        $writer = [System.IO.File]::OpenWrite('file');            $random = new-object Random;    $blockSize = 1073741824;        $bytes = new-object byte[] $blockSize;    for ($i=0; $i -lt 8; $i++)      {               $random.NextBytes($bytes);              $writer.Write($bytes, 0, $blockSize);     }               $writer.Close();        "
 ---> Running in d6e9451fd7dd
re-exec error: exit status 1: output: time="2020-02-01T20:30:10+11:00" level=error msg="hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)" error="hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)" importFolderPath="C:\\ProgramData\\Docker\\tmp\\hcs088029711" path="\\\\?\\C:\\ProgramData\\Docker\\windowsfilter\\bd081207bb8a5fac52aa5639508623d364e0843205d0512056f878571fd66689"
archive/tar: invalid tar header
[ue4-docker diagnostics] [8gig] Build failed!
[ue4-docker diagnostics] [8gig] Cleaning up...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'system', 'prune', '-f']
Deleted Containers:
d6e9451fd7dd7ae5398dfdbf4e933052fb0712838e3b81d83e8d50af966becc2

Deleted Images:
deleted: sha256:f2822171a5a80c5b140ed19d5ab8654540c7f66d3ffb1df9b2c5c5e4083daa28
deleted: sha256:349b88837028c664ae1c28e75a7f98696cc94418ec8d4de063693186241bff2f
deleted: sha256:fbf424a9d9f31ef8792bf22c94dbe06e002dd367da84024e611357c1e54be1ed
deleted: sha256:145639006e0ec96b8935f5fc73aafd49c2488e3f1a6876b12dd9e0141ede4b8c

Total reclaimed space: 81.92kB

[ue4-docker diagnostics] [8gig] Diagnostic failed! The Docker daemon cannot build images with 8GiB filesystem layers.

[ue4-docker diagnostics] Diagnostic result: failed

Same results so far as described in #63 (comment).

I turned on 'experimental mode' to get access to LCOW

(ue4ven) PS C:\Users\paulh> docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 3
 Server Version: 19.03.5
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 18363 (18362.1.amd64fre.19h1_release.190318-1202)
 Operating System: Windows 10 Pro Version 1909 (OS Build 18363.592)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.96GiB
 Name: KEITARO
 ID: NDQJ:UAVJ:AHVY:OAXE:FYZU:4JGK:FV2M:PCET:ZEBX:AGSH:7L7H:GHCF
 Docker Root Dir: C:\ProgramData\Docker
 Debug Mode: true
  File Descriptors: -1
  Goroutines: 28
  System Time: 2020-02-01T20:38:25.1618532+11:00
  EventsListeners: 1
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
(ue4ven) PS C:\Users\paulh> docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:22:37 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:36:50 2019
  OS/Arch:          windows/amd64
  Experimental:     true
(ue4ven) PS C:\Users\paulh> ue4-docker info
ue4-docker version:         0.0.47 (latest available version is 0.0.47)
Operating system:           Windows 10 Pro Version 1909 (OS Build 18362.592)
Docker daemon version:      19.03.5
NVIDIA Docker supported:    No
Maximum image size:         20GB
Available disk space:       19.04 GiB
Total system memory:        15.96 GiB physical, 22.52 GiB virtual
Number of processors:       4 physical, 8 logical
(ue4ven) PS C:\Users\paulh> docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 3
 Server Version: 19.03.5
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 18363 (18362.1.amd64fre.19h1_release.190318-1202)
 Operating System: Windows 10 Pro Version 1909 (OS Build 18363.592)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.96GiB
 Name: KEITARO
 ID: NDQJ:UAVJ:AHVY:OAXE:FYZU:4JGK:FV2M:PCET:ZEBX:AGSH:7L7H:GHCF
 Docker Root Dir: C:\ProgramData\Docker
 Debug Mode: true
  File Descriptors: -1
  Goroutines: 27
  System Time: 2020-02-01T20:39:19.9245676+11:00
  EventsListeners: 1
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
(ue4ven) PS C:\Users\paulh> ue4-docker diagnostics 8gig --linux
[ue4-docker diagnostics] Running diagnostic: "Check for Docker 8GiB filesystem layer bug"
[ue4-docker diagnostics] [8gig] Attempting to build an image with an 8GiB filesystem layer...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'build', '-t', 'adamrehn/ue4-docker/diagnostics:8gig', 'c:\\users\\paulh\\ue4ven\\lib\\site-packages\\ue4docker\\dockerfiles\\diagnostics\\8gig\\linux']
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM alpine:latest
latest: Pulling from library/alpine
c9b1b535fdd9: Pull complete
Digest: sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Status: Downloaded newer image for alpine:latest
 ---> e7d92cdc71fe
Step 2/3 : LABEL com.adamrehn.ue4-docker.sentinel="1"
 ---> Running in 1e973c898214
Removing intermediate container 1e973c898214
 ---> 84643834c3ab
Step 3/3 : RUN head -c 8589934592 </dev/urandom >file
 ---> Running in 81f0574804ec
Removing intermediate container 81f0574804ec
 ---> 81c64b7a2178
Successfully built 81c64b7a2178
Successfully tagged adamrehn/ue4-docker/diagnostics:8gig
[ue4-docker diagnostics] [8gig] Cleaning up...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'rmi', 'adamrehn/ue4-docker/diagnostics:8gig']
Untagged: adamrehn/ue4-docker/diagnostics:8gig
Deleted: sha256:81c64b7a2178f26f2251cc4e3ba356b044636a35e784e78884c6f6033c294f71
Deleted: sha256:ed05913815355fd4b0b297926f193f1eec122ea3f45060570bb14de3486f9190
Deleted: sha256:84643834c3ab651a34898ddb509c8950c91195f7f4f10b44bdf05882ed843d0e
Deleted: sha256:c687fa4c1eea98427827eab76d7b0c286199d4b509d5685a63a043b9af24f49a
[ue4-docker diagnostics] [8gig] Run: ['docker', 'system', 'prune', '-f']
Deleted Images:
deleted: sha256:560d06c1e808895d99f9470579267d4c80b82bab247f7a22240cb2f849ad03d2
deleted: sha256:018709705c2941be92de4baa3686cafa39063c83ed48577f5ca96993324ca7b0
deleted: sha256:4d983543f151f91c8d4445c662c59d244fc689f401178bb3652570213ad2cc61
deleted: sha256:ed5434e08befc8692cc794f155ed114cd6a00a7c233eb0faa88f41e79c0e176e

Total reclaimed space: 81.92kB

[ue4-docker diagnostics] [8gig] Diagnostic succeeded! The Docker daemon can build images with 8GiB filesystem layers.

[ue4-docker diagnostics] Diagnostic result: passed

So yeah, LCOW mode passed, and (if I understand correctly) that's the same daemon as WCOW, so it's not a missing daemon patch in the Windows build, as surmised.

Without experimental mode enabled in the daemon, --linux failed similarly to how lack of --linux failed in Linux Containers mode, with a complaint that the requested base image did not have a matching platform. I'm not sure if this is usefully detectable... Perhaps there's some reliable combination of server OS/arch and available storage drivers that can report the right failure earlier.

(ue4ven) PS C:\Users\paulh> docker build -t adamrehn/ue4-docker/diagnostics:8gig C:\Users\paulh\ue4ven\Lib\site-packages\ue4docker\dockerfiles\diagnostics\8gig\windows\ -m 4GiB --build-arg BASETAG=10.0.18363.592 --isolation process
Sending build context to Docker daemon   2.56kB
Step 1/5 : ARG BASETAG
Step 2/5 : FROM mcr.microsoft.com/windows/servercore:${BASETAG}
 ---> afef2165d6c5
Step 3/5 : SHELL ["cmd", "/S", "/C"]
 ---> Running in 75db7fd8ca9a
Removing intermediate container 75db7fd8ca9a
 ---> 70a994d12286
Step 4/5 : LABEL com.adamrehn.ue4-docker.sentinel="1"
 ---> Running in 34e593283617
Removing intermediate container 34e593283617
 ---> 4f91e1bffdef
Step 5/5 : RUN powershell "     $ErrorActionPreference = 'Stop';        $writer = [System.IO.File]::OpenWrite('file');            $random = new-object Random;    $blockSize = 1073741824;        $bytes = new-object byte[] $blockSize;    for ($i=0; $i -lt 8; $i++)      {               $random.NextBytes($bytes);              $writer.Write($bytes, 0, $blockSize);     }               $writer.Close();        "
 ---> Running in 77c262fb1c46
re-exec error: exit status 1: output: time="2020-02-01T23:30:10+11:00" level=error msg="hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)" error="hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)" importFolderPath="C:\\ProgramData\\Docker\\tmp\\hcs569560335" path="\\\\?\\C:\\ProgramData\\Docker\\windowsfilter\\48ac85382acc63faab5986e2026ca5e7ec8fd5b39485457f91bbced3189ebb3d"
archive/tar: invalid tar header

Thought I'd give process isolation mode a chance while I was in Windows Containers mode. I had to do it by hand, because ue4docker sees the kernel version (via platform.win_ver() => sys.getwindowsversion().platform_version rather than the OS version sys.getwindowsversion().build, and the kernel is shared between 1903 and 1909, but MS provides distinct container bases. I'm not sure if it matters.

Anyway, looks like docker/for-win#3884 affects Windows 1909 as well.

(ue4ven) PS C:\Users\paulh> ue4-docker info
ue4-docker version:         0.0.47 (latest available version is 0.0.47)                                           Operating system:           Windows 10 Pro Version 1909 (OS Build 18362.592)
Docker daemon version:      19.03.5
NVIDIA Docker supported:    No
Maximum image size:         120GB
Available disk space:       13.94 GiB
Total system memory:        15.96 GiB physical, 22.52 GiB virtual
Number of processors:       4 physical, 8 logical
(ue4ven) PS C:\Users\paulh> docker build -t adamrehn/ue4-docker/diagnostics:8gig C:\Users\paulh\ue4ven\Lib\site-packages\ue4docker\dockerfiles\diagnostics\8gig\windows\ -m 4GiB --build-arg BASETAG=1909 --isolation process       Sending build context to Docker daemon   2.56kB
Step 1/5 : ARG BASETAG
Step 2/5 : FROM mcr.microsoft.com/windows/servercore:${BASETAG}
 ---> 53e65eb2181a
Step 3/5 : SHELL ["cmd", "/S", "/C"]
 ---> Using cache
 ---> 7df9e1fcf2ff
Step 4/5 : LABEL com.adamrehn.ue4-docker.sentinel="1"
 ---> Using cache
 ---> 6f2ef5a9de49
Step 5/5 : RUN powershell "     $ErrorActionPreference = 'Stop';        $writer = [System.IO.File]::OpenWrite('file');            $random = new-object Random;    $blockSize = 1073741824;        $bytes = new-object byte[] $blockSize;    for ($i=0; $i -lt 8; $i++)      {               $random.NextBytes($bytes);              $writer.Write($bytes, 0, $blockSize);     }               $writer.Close();        "
 ---> Running in 026f80aaacce
hcsshim::PrepareLayer - failed failed in Win32: The parameter is incorrect. (0x57)

Also, that issue got hijacked by discussions of a different error, caused by filesystem filters. Probably worth changing the link in the docs to docker/for-win#4100, where it's clearly visible that there'll be a workaround for this in Docker Engine 19.03.6. A ue4-docker diagnostic for this might help too.

Edit: Opened UnrealContainers/unrealcontainers.github.io#2 for the doc updates in that last paragraph.

@TBBle
Copy link
Collaborator Author

TBBle commented Feb 1, 2020

I captured the relevant log snippet in case I or someone goes looking into this later, with "log-level debug" and "debug true" configured, and the process-isolation command (it's faster) from the end of my last comment.

[01:14:15.104][WindowsDaemon     ][Info   ] debug: hcsshim::CreateLayer [parent=e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1 path=C:\ProgramData\Docker\windowsfilter\8476082b202adcb25cb0883c02c68727e0b8078e429e2d1276bebd89883d78fb]
[01:14:15.104][WindowsDaemon     ][Info   ] debug: hcsshim::CreateLayer - succeeded [parent=e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1 path=C:\ProgramData\Docker\windowsfilter\8476082b202adcb25cb0883c02c68727e0b8078e429e2d1276bebd89883d78fb]
[01:14:15.106][WindowsDaemon     ][Info   ] debug: hcsshim::GetLayerMountPath [path=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1]
[01:14:15.106][WindowsDaemon     ][Info   ] debug: Calling proc (1) [path=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1]
[01:14:15.107][WindowsDaemon     ][Info   ] debug: Calling proc (2) [path=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1]
[01:14:15.107][WindowsDaemon     ][Info   ] debug: hcsshim::GetLayerMountPath - succeeded [path=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1 mountPath=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1]
[01:14:33.772][WindowsDaemon     ][Info   ] debug: hcsshim::ExportLayer - succeeded [exportFolderPath=C:\ProgramData\Docker\tmp\hcs681066985 path=C:\ProgramData\Docker\windowsfilter\9d63ed708ed9804277eaa81ae69eb1ae6c5cb70889e78c661e2f3bf6d2849d02]
[01:15:31.154][WindowsDaemon     ][Info   ] debug: Cleaning up layer 8476082b202adcb25cb0883c02c68727e0b8078e429e2d1276bebd89883d78fb: re-exec error: exit status 1: output: time="2020-02-02T01:14:35+11:00" level=error msg="hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)" error="hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)" importFolderPath="C:\\ProgramData\\Docker\\tmp\\hcs709244971" path="\\\\?\\C:\\ProgramData\\Docker\\windowsfilter\\8476082b202adcb25cb0883c02c68727e0b8078e429e2d1276bebd89883d78fb"
archive/tar: invalid tar header

The parent layer's own logs:

[01:12:50.962][WindowsDaemon     ][Info   ] debug: hcsshim::CreateLayer [path=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1 parent=c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651]
[01:12:50.962][WindowsDaemon     ][Info   ] debug: hcsshim::CreateLayer - succeeded [parent=c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651 path=C:\ProgramData\Docker\windowsfilter\e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1]
[01:12:50.963][WindowsDaemon     ][Info   ] debug: hcsshim::GetLayerMountPath [path=C:\ProgramData\Docker\windowsfilter\c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651]
[01:12:50.963][WindowsDaemon     ][Info   ] debug: Calling proc (1) [path=C:\ProgramData\Docker\windowsfilter\c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651]
[01:12:50.963][WindowsDaemon     ][Info   ] debug: Calling proc (2) [path=C:\ProgramData\Docker\windowsfilter\c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651]
[01:12:50.964][WindowsDaemon     ][Info   ] debug: hcsshim::GetLayerMountPath - succeeded [path=C:\ProgramData\Docker\windowsfilter\c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651 mountPath=C:\ProgramData\Docker\windowsfilter\c1767da052ec2f95872ba406115c7d2607b6cf209bbe151d7054da5e7a076651]
[01:12:50.984][WindowsDaemon     ][Info   ] debug: hcsshim::ExportLayer - succeeded [path=C:\ProgramData\Docker\windowsfilter\3c1ed5cb7d909f97957d3444b8886a71cf7771fa324b1d5822712b76aa0a034c exportFolderPath=C:\ProgramData\Docker\tmp\hcs880961252]
[01:12:51.107][WindowsDaemon     ][Info   ] debug: Applied tar sha256:c1eb2f256f82178f101a910b9d00818045c59cd8dfd09a669e0f64887e095f72 to e8d6a05826844486fd3482aa96a93e28177303089a86ca16cdf93336e0aae2d1, size: 40960

I disabled re-exec (by setting the DOCKER_WINDOWSFILTER_NOREEXEC system env-var to "true", and restarting Docker), and now we can see the logs from Importlayer:

[02:08:58.608][WindowsDaemon     ][Info   ] debug: hcsshim::CreateLayer [parent=3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e path=C:\ProgramData\Docker\windowsfilter\3a21ae7c47a976fb2777a68c7f5fe69749620b8fdf4e31eddf7b68b182f11cf2]
[02:08:58.608][WindowsDaemon     ][Info   ] debug: hcsshim::CreateLayer - succeeded [parent=3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e path=C:\ProgramData\Docker\windowsfilter\3a21ae7c47a976fb2777a68c7f5fe69749620b8fdf4e31eddf7b68b182f11cf2]
[02:08:58.610][WindowsDaemon     ][Info   ] debug: hcsshim::GetLayerMountPath [path=C:\ProgramData\Docker\windowsfilter\3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e]
[02:08:58.610][WindowsDaemon     ][Info   ] debug: Calling proc (1) [path=C:\ProgramData\Docker\windowsfilter\3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e]
[02:08:58.611][WindowsDaemon     ][Info   ] debug: Calling proc (2) [path=C:\ProgramData\Docker\windowsfilter\3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e]
[02:08:58.611][WindowsDaemon     ][Info   ] debug: hcsshim::GetLayerMountPath - succeeded [path=C:\ProgramData\Docker\windowsfilter\3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e mountPath=C:\ProgramData\Docker\windowsfilter\3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e]
[02:09:15.447][WindowsDaemon     ][Info   ] debug: hcsshim::ExportLayer - succeeded [exportFolderPath=C:\ProgramData\Docker\tmp\hcs502896340 path=C:\ProgramData\Docker\windowsfilter\9eb13afb7d10ab561f057c795fdfcbffcbefdcc38a646a0b6a07f33830fcb269]
[02:09:15.724][WindowsDaemon     ][Info   ] debug: hcsshim::ImportLayer [path=\\?\C:\ProgramData\Docker\windowsfilter\3a21ae7c47a976fb2777a68c7f5fe69749620b8fdf4e31eddf7b68b182f11cf2 importFolderPath=C:\ProgramData\Docker\tmp\hcs557293350]
[02:09:15.725][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid [name=3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e]
[02:09:15.725][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid - succeeded [name=3447d5628c0aa42a15ca00c3ee6e0694f2260d6b47248c319ce6e8cffd55631e guid=496c6153-5923-50f3-a679-811861c093f1]
[02:09:15.725][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid [name=e2ad3f188fca18f9f9c2808aa1d4d74b6191b771a627fc29a7e08ea5ec5ae71a]
[02:09:15.726][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid - succeeded [name=e2ad3f188fca18f9f9c2808aa1d4d74b6191b771a627fc29a7e08ea5ec5ae71a guid=14a81531-253c-5d41-b021-cf539c5c0a70]
[02:09:15.726][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid [name=63b4872ecc21398f29e8e7e3d5ee2f6d53e1f82678c8d6e792cca88156f540bc]
[02:09:15.726][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid - succeeded [name=63b4872ecc21398f29e8e7e3d5ee2f6d53e1f82678c8d6e792cca88156f540bc guid=c8bff610-6069-5be5-9e9d-bd72bdce2266]
[02:09:15.726][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid [name=a1f6b24f8ae0ee22dccca2c1f7c2a6937d60410f11c754060f0c090a78aedc79]
[02:09:15.726][WindowsDaemon     ][Info   ] debug: hcsshim::NameToGuid - succeeded [name=a1f6b24f8ae0ee22dccca2c1f7c2a6937d60410f11c754060f0c090a78aedc79 guid=758bc2a0-a01b-56b2-8ee0-d0767ce75ebe]
[02:09:16.847][WindowsDaemon     ][Error  ] hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3) [path=\\?\C:\ProgramData\Docker\windowsfilter\3a21ae7c47a976fb2777a68c7f5fe69749620b8fdf4e31eddf7b68b182f11cf2 importFolderPath=C:\ProgramData\Docker\tmp\hcs557293350 error=hcsshim::ImportLayer - failed failed in Win32: The system cannot find the path specified. (0x3)]
[02:10:13.429][WindowsDaemon     ][Info   ] debug: Cleaning up layer 3a21ae7c47a976fb2777a68c7f5fe69749620b8fdf4e31eddf7b68b182f11cf2: archive/tar: invalid tar header

I confirmed that the file in importFolderPath (C:\ProgramData\Docker\tmp\hcs557293350 here) exists during the run, and is 8GB, with the file named file present.

@TBBle
Copy link
Collaborator Author

TBBle commented Feb 1, 2020

Confirmed a slightly simpler test-case, which I'll post upstream.
Given an named 8gfile that is 8589934592 bytes long (i.e. I ran the PowerShell script from the diagnostics locally), and the following Dockerfile:

FROM mcr.microsoft.com/windows/nanoserver:1909
COPY 8gfile 8gfile

the problem reproduces.

@adamrehn
Copy link
Owner

adamrehn commented Feb 2, 2020

Thanks for the thorough testing and investigation! I've now added code to detect when Docker Desktop is in Linux container mode and provided more helpful error messages (in both ue4-docker build and the diagnostics), blacklisted Windows 1909, and added a diagnostic for testing for the storage-opt issue: ue4-docker diagnostics maxsize. The changes are live in ue4-docker version 0.0.48.

@adamrehn
Copy link
Owner

adamrehn commented Feb 2, 2020

It's a bit worrying that the diagnostic passes when running in LCOW mode, since as you've noted that suggests something a bit subtler than Windows builds of the daemon not including the relevant bugfix. Thanks for opening the issue on the Moby issue tracker, hopefully someone can identify what's actually going on there.

@adamrehn adamrehn added the awaiting-upstream Awaiting bugfixes or functionality from upstream dependencies label Feb 2, 2020
@TBBle
Copy link
Collaborator Author

TBBle commented Feb 2, 2020

No worries. I'm pretty sure (having spent some time with hcsshim and containerd recently while trying to get buildkit working for WCOW) that the problem is either an issue in system call hcsshim is making (ImportLayer), or something Docker is doing is wrong or out-of-order such that the prerequisites for ImportLayer are not correct.

For example, I haven't ruled out that Docker is somehow getting out of sync and cleaning up a directory that ImportLayer is still using.

I'm leaving it alone for now, hoping someone upstream can track this down, but if I get more time, I might try and work out if I can put together a repro using just hcsshim calls, and/or add a ton of debugging to Docker to try and track down all the things going on. I'm still having trouble working out exactly how Docker ends up calling ImportLayer anyway, the Go debugger won't let me breakpoint it for some reason, and I can't find an actual call-site by text but the logs clearly show it being called. >_<

I also need to finish some games so I can clear the disk space to keep experimenting... Or fix FROM scratch support for Windows in Docker.

@adamrehn
Copy link
Owner

adamrehn commented Feb 2, 2020

BuildKit support for Windows containers would be excellent, I'll be keen to see how that progresses. Combined with a fix for this problem, it'd finally provide enough parity between Linux and Windows containers to make some serious improvements to how ue4-docker builds images.

@TBBle
Copy link
Collaborator Author

TBBle commented Mar 11, 2020

So it's visible, the BuildKit tracking bug for Windows Container support is moby/buildkit#616

@tynril
Copy link

tynril commented Apr 29, 2020

Hi! While I understand that this is caused by an upstream issue, is there any known configuration of Windows/Docker versions that work around this issue?

I am currently hitting it with the following configuration:

C:\Users\Administrator>ue4-docker info
ue4-docker version:         0.0.62 (latest available version is 0.0.62)
Operating system:           Windows Server 2019 Datacenter Version 1809 (OS Build 17763.1158)
Docker daemon version:      18.09.7
NVIDIA Docker supported:    No
Maximum image size:         1000GB
Available disk space:       708.89 GiB
Total system memory:        31.21 GiB physical, 36.21 GiB virtual
Number of processors:       8 physical, 16 logical

The diagnostic run looks like that:

C:\Users\Administrator>ue4-docker diagnostics 8gig
[ue4-docker diagnostics] Running diagnostic: "Check for Docker 8GiB filesystem layer bug"
[ue4-docker diagnostics] [8gig] Attempting to build an image with an 8GiB filesystem layer...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'build', '-t', 'adamrehn/ue4-docker/diagnostics:8gig', 'c:\\users\\administrator\\scoop\\apps\\python\\current\\lib\\site-packages\\ue4docker\\dockerfiles\\diagnostics\\8gig\\windows', '-m', '4GiB', '--build-arg', 'BASETAG=ltsc2019']
Sending build context to Docker daemon   2.56kB
Step 1/5 : ARG BASETAG
Step 2/5 : FROM mcr.microsoft.com/windows/servercore:${BASETAG}
 ---> fdf6432edbdc
Step 3/5 : SHELL ["cmd", "/S", "/C"]
 ---> Running in b1ff1aaa9943
Removing intermediate container b1ff1aaa9943
 ---> 351d8827c963
Step 4/5 : LABEL com.adamrehn.ue4-docker.sentinel="1"
 ---> Running in 90bca86fee20
Removing intermediate container 90bca86fee20
 ---> 1a5fad4323c9
Step 5/5 : RUN powershell "     $ErrorActionPreference = 'Stop';        $writer = [System.IO.File]::OpenWrite('file');
$random = new-object Random;    $blockSize = 1073741824;        $bytes = new-object byte[] $blockSize;  for ($i=0; $i -lt 8; $i++)      {               $random.NextBytes($bytes);              $writer.Write($bytes, 0, $blockSize);   }
$writer.Close();        "
 ---> Running in de2660e23b0c
re-exec error: exit status 1: output: time="2020-04-29T06:28:39Z" level=error msg="hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3) path=\\\\?\\C:\\ProgramData\\docker\\windowsfilter\\396fd836265390f44b14160d0c0134c8c820cb09e3bf886f91feb271676590c2 folder=C:\\ProgramData\\docker\\tmp\\hcs260075835"
archive/tar: invalid tar header
[ue4-docker diagnostics] [8gig] Build failed!
[ue4-docker diagnostics] [8gig] Cleaning up...
[ue4-docker diagnostics] [8gig] Run: ['docker', 'system', 'prune', '-f']
Deleted Containers:
de2660e23b0cece69425cdb59fdba467831703ecece77c75008ffca891977a01

Deleted Networks:
vpcbr0237dab78110

Deleted Images:
deleted: sha256:1a5fad4323c907ed24d5e02a088c93dec5c2261d06dbb9411a1f334c18510141
deleted: sha256:d59717abddff1ac0b73f13c598056764e9cee31e3b594a663e1af63cf75f5265
deleted: sha256:351d8827c96328a9e3b6ac9ad4c95cb924046d550bd5682f1d502f0f9bf5c1d5
deleted: sha256:27c33450000e8377d22907c79f64e14925a466ae87f8a691f7651d69f51e6692

Total reclaimed space: 81.92kB

[ue4-docker diagnostics] [8gig] Diagnostic failed! The Docker daemon cannot build images with 8GiB filesystem layers.

[ue4-docker diagnostics] Diagnostic result: failed

If there is any configuration known to work at this point, I'd love to hear about it!

@adamrehn
Copy link
Owner

@tynril I'm not currently aware of any combination of Windows/Docker version that does not exhibit this behaviour, and judging by the comments on the upstream issue that @TBBle opened, it looks like it's still an unsolved problem. In the meantime, you can successfully build Windows container images by excluding debug symbols with the --exclude debug flag.

@TBBle
Copy link
Collaborator Author

TBBle commented Sep 25, 2020

The fix has gone into https://github.com/moby/moby/ (upstream for Docker Engine) and will be part of the 20.03.0 release, whenever that is.

@TBBle
Copy link
Collaborator Author

TBBle commented Oct 27, 2020

Thanks to @Agendum for the prompt, it appears that the Docker Engine 20.10.0 beta1 release included in Docker Desktop for Windows Community 2.4.2.0 (the Edge channel).

However, I haven't tested the 8gig diagnostic yet to confirm that.

@TBBle
Copy link
Collaborator Author

TBBle commented Oct 27, 2020

Per a discussion on #99, it's possible there's a different 8gig issue in Docker Engine still (relating to COPY a different target in a multi-target Dockerfile) which is going to hit in the same circumstances as this ticket, but might be a problem in Docker Engine itself, rather than in the underlying container support.

@TBBle
Copy link
Collaborator Author

TBBle commented Oct 28, 2020

I've confirmed that with Docker Desktop for Windows Community Edge (2.4.2.0), the ue4-docker diagnostics 8gig test passes. For some unrelated reason, process isolation is failing on both my test machines, so I used hyperv isolation.

This was using the updated version of the test from #97.

Test machines:

For ue4-docker diagnostics 8gig --isolation hyperv -basetag 2004:

PS> ue4-docker info
ue4-docker version:         0.0.71 (latest available version is 0.0.71)
Operating system:           Windows 10 Enterprise Version 2004 (OS Build 19041.508)
Docker daemon version:      20.10.0-beta1
NVIDIA Docker supported:    No
Maximum image size:         20GB
Available disk space:       458.54 GiB
Total system memory:        31.96 GiB physical, 63.96 GiB virtual
Number of processors:       4 physical, 8 logical

For ue4-docker diagnostics 8gig --isolation hyperv -basetag 1909:

PS> ue4-docker.exe info
ue4-docker version:         0.0.71 (latest available version is 0.0.71)
Operating system:           Windows 10 Enterprise Version 1909 (OS Build 18362.1082)
Docker daemon version:      20.10.0-beta1
NVIDIA Docker supported:    No
Maximum image size:         20GB
Available disk space:       14 GiB
Total system memory:        7.92 GiB physical, 19.81 GiB virtual
Number of processors:       2 physical, 4 logical

@adamrehn
Copy link
Owner

I can confirm that the 8gig diagnostic passes when using process isolation mode with Docker CE 20.10.0 under Windows 10 version 20H2. I'm going to close this issue now and we can continue to track discussions around the potential new bug in #99.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-upstream Awaiting bugfixes or functionality from upstream dependencies
Projects
None yet
Development

No branches or pull requests

3 participants