`diagnostics 8gig` speed and doc improvement #97

TBBle · 2020-09-10T09:19:54Z

This speeds up the test by using a sparse file on Windows (as suggested at moby/moby#41430 (comment)) and updates the in-script documentation to highlight that there were actually two different bugs, rather than being a bugfix which was partially ineffective.

I tested that the 8gig repro continues to take effect on a Windows 10 1909 machine with Docker Daemon 19.03.12, and subtracting 1 from the size causes the failure to not reproduce.

Both of the known 8gig issues are around the tarfile data stream, which isn't affected by the sparseness of the file being archived, as the OCI Layer specification forbids sparse files in the tar stream, so the file is treated naively.

@marosset

This is a lot faster (twice as fast in my testing) since file creation is instantaneous, so it's only the tar-stream process that processes all 8 gigabytes of data. Suggested by GitHub user @marosset while adding an integration test for this issue to Moby in moby/moby#41430 Signed-off-by: Paul "Hampy" Hampson <[email protected]>

This clarifies that there were _two_ 8gig bugs in Docker history, and the one affecting only Windows containers has not yet had a resolution released. Signed-off-by: Paul "Hampy" Hampson <[email protected]>

TBBle · 2020-09-10T09:25:39Z

TODO: The documentation (specifically the troubleshooting guide) and the cli docs) need to be updated, as they also still express the confused underlying belief that there was a single bug, which was fixed, and then unfixed on Windows.

I assume the source for those docs don't live in this repo, as I didn't find any other references to "37581".

adamrehn · 2020-10-22T08:27:16Z

Apologies for the delay in responding to this pull request, I've been drowning in work over the past few months and I'm slowly chipping away at my GitHub backlog whenever I get the chance.

Back when I first implemented the 8gig diagnostic I deliberately avoided creating a sparse file because I was unsure whether that might end up triggering different behaviour compared to a non-sparse file with random contents. (It actually would've been much easier to create the sparse file, I recall needing to do far more Googling to find PowerShell examples for generating random files.) In lieu of exhaustive testing across various versions of Windows and Docker to ensure behaviour parity, would you consider updating the Dockerfile to accept an ARG named SPARSE that triggers the new behaviour when set to a default value of 1, and a --no-sparse flag in the diagnostic to set the arg to 0 and trigger the existing behaviour? That way the diagnostic will be faster by default, but we retain the option to generate non-sparse files in the event that it's ever necessary.

Yes, the docs live in a separate repo on GitLab these days (since GitLab Pages allows me to use both apex domains and subdomains at the same time, unlike GitHub Pages), so I'll need to update them there to reflect the clarification.

TBBle · 2020-10-23T10:44:05Z

Some time after writing this PR (when iterating on the unit test for this issue in Docker, which this is a back-port of), I discovered it's not technically a sparse file as it's fully allocated and consuming space. NTFS however tracks the last-valid-data location and doesn't actually write the data to disk, as it's all-zero.

So the comment is incorrect wrong, it's not a sparse file.

Since both the Linux and Windows bugs were about the tar stream format itself, the on-disk representation doesn't matter because it's always read-back byte-for-byte, even if there are long strings of zeros. And compression (if used) is applied to the resulting stream, after the format issue.

I don't mind the idea of supporting both all-zeroes file and random-data file with an ARG, but I don't actually have time to come back to this right now and rejig it. I might have time next week, but I have a lot on right now so I can't commit to that.

In the meantime, the fix has merged into moby/moby and should be part of the Docker Engine 20.xx (20.10, perhaps) release, so with a bit of luck when I come back to this, I can also update the doc update to mention the Docker Engine release with the fix.

adamrehn · 2020-10-23T14:18:32Z

Ah okay, that's good to know. In that case I might suggest an ARG named RANDOM set to 1 by a --random flag to trigger the old behaviour in place of the new default behaviour.

Thanks for your work on fixing the upstream issue! It'll be nice to finally put this limitation behind us under Windows.

adamrehn · 2020-11-03T10:15:41Z

I've just updated the docs to clarify that the 8GiB issue can be caused by one or more of a series of separate but related bugs, rather than just a single bug.

adamrehn · 2020-12-11T07:23:15Z

I've incorporated your changes, along with the proposed modifications and a note reflecting the presence of a fix in Docker CE 20.10.0, in commit fddb973 and updated the docs to match.

TBBle · 2020-12-11T12:50:16Z

Awesome. I apologise for never getting back to implementing your requested changes myself. -_-

adamrehn · 2020-12-12T01:20:24Z

All good, I know firsthand how hard it is juggling open source contributions with other job responsibilities. Thanks for your work on this, and even more importantly all your efforts addressing the upstream issues with the Docker/ContainerD/hcsshim stack.

Paul "Hampy" Hampson added 2 commits September 10, 2020 19:05

Add Windows-specific bug to 8gig diagnostic help

087a584

This clarifies that there were _two_ 8gig bugs in Docker history, and the one affecting only Windows containers has not yet had a resolution released. Signed-off-by: Paul "Hampy" Hampson <[email protected]>

TBBle marked this pull request as ready for review September 11, 2020 07:21

TBBle changed the title ~~diagnostics 8gig improvements~~ diagnostics 8gig speed improvement Sep 27, 2020

TBBle changed the title ~~diagnostics 8gig speed improvement~~ diagnostics 8gig speed and doc improvement Sep 27, 2020

TBBle mentioned this pull request Oct 28, 2020

ue4-docker build 4.22.2 fails with 'out of space' with new-enough Docker EE #44

Closed

adamrehn added a commit that referenced this pull request Dec 11, 2020

Incorporate changes from @TBBle adapted from #97

fddb973

adamrehn closed this Dec 11, 2020

TBBle deleted the 8gig-diagnostics-improvements branch December 11, 2020 12:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`diagnostics 8gig` speed and doc improvement #97

`diagnostics 8gig` speed and doc improvement #97

TBBle commented Sep 10, 2020 •

edited

Loading

TBBle commented Sep 10, 2020 •

edited

Loading

adamrehn commented Oct 22, 2020

TBBle commented Oct 23, 2020 •

edited

Loading

adamrehn commented Oct 23, 2020

adamrehn commented Nov 3, 2020

adamrehn commented Dec 11, 2020

TBBle commented Dec 11, 2020

adamrehn commented Dec 12, 2020

diagnostics 8gig speed and doc improvement #97

diagnostics 8gig speed and doc improvement #97

Conversation

TBBle commented Sep 10, 2020 • edited Loading

TBBle commented Sep 10, 2020 • edited Loading

adamrehn commented Oct 22, 2020

TBBle commented Oct 23, 2020 • edited Loading

adamrehn commented Oct 23, 2020

adamrehn commented Nov 3, 2020

adamrehn commented Dec 11, 2020

TBBle commented Dec 11, 2020

adamrehn commented Dec 12, 2020

`diagnostics 8gig` speed and doc improvement #97

`diagnostics 8gig` speed and doc improvement #97

TBBle commented Sep 10, 2020 •

edited

Loading

TBBle commented Sep 10, 2020 •

edited

Loading

TBBle commented Oct 23, 2020 •

edited

Loading