Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diagnostics 8gig speed and doc improvement #97

Closed
wants to merge 2 commits into from
Closed

diagnostics 8gig speed and doc improvement #97

wants to merge 2 commits into from

Conversation

TBBle
Copy link
Collaborator

@TBBle TBBle commented Sep 10, 2020

This speeds up the test by using a sparse file on Windows (as suggested at moby/moby#41430 (comment)) and updates the in-script documentation to highlight that there were actually two different bugs, rather than being a bugfix which was partially ineffective.

I tested that the 8gig repro continues to take effect on a Windows 10 1909 machine with Docker Daemon 19.03.12, and subtracting 1 from the size causes the failure to not reproduce.

Both of the known 8gig issues are around the tarfile data stream, which isn't affected by the sparseness of the file being archived, as the OCI Layer specification forbids sparse files in the tar stream, so the file is treated naively.

Paul "Hampy" Hampson added 2 commits September 10, 2020 19:05
This is a lot faster (twice as fast in my testing) since file creation
is instantaneous, so it's only the tar-stream process that processes all
8 gigabytes of data.

Suggested by GitHub user @marosset while adding an integration test for
this issue to Moby in moby/moby#41430

Signed-off-by: Paul "Hampy" Hampson <[email protected]>
This clarifies that there were _two_ 8gig bugs in Docker history, and
the one affecting only Windows containers has not yet had a resolution
released.

Signed-off-by: Paul "Hampy" Hampson <[email protected]>
@TBBle
Copy link
Collaborator Author

TBBle commented Sep 10, 2020

TODO: The documentation (specifically the troubleshooting guide) and the cli docs) need to be updated, as they also still express the confused underlying belief that there was a single bug, which was fixed, and then unfixed on Windows.

I assume the source for those docs don't live in this repo, as I didn't find any other references to "37581".

@TBBle TBBle marked this pull request as ready for review September 11, 2020 07:21
@TBBle TBBle changed the title diagnostics 8gig improvements diagnostics 8gig speed improvement Sep 27, 2020
@TBBle TBBle changed the title diagnostics 8gig speed improvement diagnostics 8gig speed and doc improvement Sep 27, 2020
@adamrehn
Copy link
Owner

Apologies for the delay in responding to this pull request, I've been drowning in work over the past few months and I'm slowly chipping away at my GitHub backlog whenever I get the chance.

Back when I first implemented the 8gig diagnostic I deliberately avoided creating a sparse file because I was unsure whether that might end up triggering different behaviour compared to a non-sparse file with random contents. (It actually would've been much easier to create the sparse file, I recall needing to do far more Googling to find PowerShell examples for generating random files.) In lieu of exhaustive testing across various versions of Windows and Docker to ensure behaviour parity, would you consider updating the Dockerfile to accept an ARG named SPARSE that triggers the new behaviour when set to a default value of 1, and a --no-sparse flag in the diagnostic to set the arg to 0 and trigger the existing behaviour? That way the diagnostic will be faster by default, but we retain the option to generate non-sparse files in the event that it's ever necessary.

Yes, the docs live in a separate repo on GitLab these days (since GitLab Pages allows me to use both apex domains and subdomains at the same time, unlike GitHub Pages), so I'll need to update them there to reflect the clarification.

@TBBle
Copy link
Collaborator Author

TBBle commented Oct 23, 2020

Some time after writing this PR (when iterating on the unit test for this issue in Docker, which this is a back-port of), I discovered it's not technically a sparse file as it's fully allocated and consuming space. NTFS however tracks the last-valid-data location and doesn't actually write the data to disk, as it's all-zero.

So the comment is incorrect wrong, it's not a sparse file.

Since both the Linux and Windows bugs were about the tar stream format itself, the on-disk representation doesn't matter because it's always read-back byte-for-byte, even if there are long strings of zeros. And compression (if used) is applied to the resulting stream, after the format issue.

I don't mind the idea of supporting both all-zeroes file and random-data file with an ARG, but I don't actually have time to come back to this right now and rejig it. I might have time next week, but I have a lot on right now so I can't commit to that.

In the meantime, the fix has merged into moby/moby and should be part of the Docker Engine 20.xx (20.10, perhaps) release, so with a bit of luck when I come back to this, I can also update the doc update to mention the Docker Engine release with the fix.

@adamrehn
Copy link
Owner

Ah okay, that's good to know. In that case I might suggest an ARG named RANDOM set to 1 by a --random flag to trigger the old behaviour in place of the new default behaviour.

Thanks for your work on fixing the upstream issue! It'll be nice to finally put this limitation behind us under Windows.

@adamrehn
Copy link
Owner

adamrehn commented Nov 3, 2020

I've just updated the docs to clarify that the 8GiB issue can be caused by one or more of a series of separate but related bugs, rather than just a single bug.

adamrehn added a commit that referenced this pull request Dec 11, 2020
@adamrehn
Copy link
Owner

I've incorporated your changes, along with the proposed modifications and a note reflecting the presence of a fix in Docker CE 20.10.0, in commit fddb973 and updated the docs to match.

@adamrehn adamrehn closed this Dec 11, 2020
@TBBle TBBle deleted the 8gig-diagnostics-improvements branch December 11, 2020 12:49
@TBBle
Copy link
Collaborator Author

TBBle commented Dec 11, 2020

Awesome. I apologise for never getting back to implementing your requested changes myself. -_-

@adamrehn
Copy link
Owner

All good, I know firsthand how hard it is juggling open source contributions with other job responsibilities. Thanks for your work on this, and even more importantly all your efforts addressing the upstream issues with the Docker/ContainerD/hcsshim stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants