-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WCOW Base layer creation and export #901
Conversation
7b8341c
to
ecb0129
Compare
backuptar.WriteTarFileFromBackupStream
writes tarballs with 0 permission bits
microsoft/go-winio#184
cff971b
to
7a04cff
Compare
2ed80aa
to
269aad0
Compare
269aad0
to
9f2e17d
Compare
Per discussion in #916, the new API |
Another API thought. Since this is currently being added to layer.go where almost all the APIs take I didn't do this initially because the two current exceptions to this API structure are the functions the new API wraps, |
9f2e17d
to
014672b
Compare
014672b
to
57a0abb
Compare
Hey folks, jumping in here to provide an update and clear up some confusion from the Microsoft side. What held this up for so long were some reservations from our legal team on enabling an open-source tool to easily package and redistribute Windows binaries. With the secure supply chain executive order, and our general company policies on security, we are moving very cautiously on topics like these. With that said, I'm here to bring the good news that these issues have resolved, and this PR is good to move forward. I know it's been incredibly frustrating for the community here. I want to emphasize that we are here to support the community and do the best we can to make things easy for you. This definitely took longer than it should have and we're changing how things are done on our end to ensure it doesn't happen again. If you have concerns or questions, always feel welcome to reach out to me. (Brandon Smith (MS) on K8s Slack) @ambarve would you be able to move this forward from here? |
Glad to hear that. I've clearly got some rebasing and updating to do, I'll try and make some time for that soon but I'm pretty snowed under at the moment. Maybe on the coming weekend I can flush the rebasing through, at least. Due to #1375, I can't easily test this with the containerd Snapshotter test-suite as I did in the past. So I'll also have to pull-in and complete the tests I mentioned in April, and ensure everything's working on both LTSC 2019 and LTSC 2022 (although I'm confident it will be, since I squashed one bug and know of only one other likely stumbling point, for which I have a proposed fix). At least those platforms are available on GHA now. |
7028dda
to
23cd100
Compare
Sorry, I haven't had a chance to look closely at #1463 yet. It shouldn't block this (unless an API I need changes...) as I think it's better to do tests in this repo anyway. I've rebased my existing work (it wasn't as bad as I feared, most of the big scary conflict warnings turned out to be delete/edit over test/vendor, and they're trivial to resolve. I've pushed my WIP tests over to https://github.com/TBBle/hcsshim/tree/layer-manipulation-tests and will perhaps go over them tonight or tomorrow, and try and get them clean-enough (and passing-enough, and high-coverage-enough) to pull into this PR, in which case I'd be confident this code is ready-to-go. I see we now have a containerd integration test in the CI suite, but my Go practice is a little rusty, and I'm not sure if it will pick up the checked-out hcsshim in its containerd build as I suspect it will prefer the vendored version. So once #1375 has been resolved in some way, if we |
This is to gain access to: * microsoft/hcsshim#901 TODO: Does not currently pull in the first commit of https://github.com/TBBle/hcsshim/commits/hack-base-layer-manipulation-with-128layers-test as it needs rebasing onto current hcsshim master. That is a proposed fix for a post-RS5 change in behaviour, which needs to be shown necessary on Windows Server 2022 as it was only observed on my Windows 10 20H2 (or earlier) box, well before Windows Server 2022 shipped, and may no longer be needed. Signed-off-by: Paul "TBBle" Hampson <[email protected]>
No worries at all. I found out I'm calling that
The containerd integration suite here should run against/build a shim from whatever is in the current PR. So containerd main + your branch for hcsshim. |
The problem for the integration test is that the code I'm working on isn't in the shim, it's an in-tree snapshotter in containerd linking against hcsshim directly. If it was an external snapshotter instead, that'd be a different question (and also solves the manifesting blocker in-passing). Although there isn't currently support as I recall an external mounter in containerd, so we'd be a significant refactoring away from being able to decouple fully anyway, if that was a useful direction. On option would be to programmatically add a |
@TBBle Ahh gotcha gotcha, you meant the containerd suite. Yea it will still use whatever is in its go.mod for code it uses hcsshim as a library for. External snapshotter solving the manifesting issue is an interesting thought also.. That's actually one of the suggestions we heard internally, but I'm not sure it makes sense for the windows snapshotter to not be built in |
I also had "take the snapshotter external" as an option in the first "hcsshim ABI broke hard" ticket; in a green-field development, knowing what I know now, I'd have built the Windows snapshotter as an external snapshotter in this repo, since it parallels the external runtime shim and better-isolates containerd from vagaries of HCS/computerstorage evolution. (And this is the direction the OS X containerd work was strongly encouraged by the containerd team, which is what put it on my radar at all; in that case, they also farmed off mounting by defining a CLI utility as the API boundary). But given the half-decade-or-so weight of code for the Windows Snapshotter already in containerd, as well as having access to the snapshotter test suite once we get that working, it'd be a time-costly and burden-shifting change that'd need more justification and stakeholder buy-in than I'm currently aware of. |
This is to gain access to: * microsoft/hcsshim#901 TODO: Does not currently pull in the first commit of https://github.com/TBBle/hcsshim/commits/hack-base-layer-manipulation-with-128layers-test as it needs rebasing onto current hcsshim master. That is a proposed fix for a post-RS5 change in behaviour, which needs to be shown necessary on Windows Server 2022 as it was only observed on my Windows 10 20H2 (or earlier) box, well before Windows Server 2022 shipped, and may no longer be needed. Signed-off-by: Paul "TBBle" Hampson <[email protected]>
This is the inverse of the baseLayerWriter: It walks Files/ and UtilityVM/Files/ (if present) and ignores the rest of the layer data, as it will be recreated when the layer is imported. Signed-off-by: Paul "TBBle" Hampson <[email protected]>
This API allows turning any collection of files into a WCOW base layer. It will create the necessary files in Files/ for hcsshim.ProcessBaseLayer to function, validate the necessary files for hcsshim.ProcessUtilityVMImage if UtilityVM/ exists, and then call those two APIs to complete the process. Calling this on a directory containing an untarred base layer OCI tarball, gives a very similar outcome to passing the tar stream through ociwclayer.ImportLayer. The new API is used in `TestSCSIAddRemoveWCOW` to create nearly-empty base layers for the scratch layers attached and removed from the utility VM. A wclayer command is also introduced: `makebaselayer` for testing and validation purposes. Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Signed-off-by: Paul "TBBle" Hampson <[email protected]>
23cd100
to
d376404
Compare
This is to gain access to: * microsoft/hcsshim#901 TODO: Does not currently pull in the first commit of https://github.com/TBBle/hcsshim/commits/hack-base-layer-manipulation-with-128layers-test as it needs rebasing onto current hcsshim master. That is a proposed fix for a post-RS5 change in behaviour, which needs to be shown necessary on Windows Server 2022 as it was only observed on my Windows 10 20H2 (or earlier) box, well before Windows Server 2022 shipped, and may no longer be needed. Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Carrying back results from the now-unblocked testing with my containerd PR that depends on this work, and it fails in what looks like #901 (comment) even on Windows Server 2019. So either something else has gone wrong, or there's been a platform update that pulled in whatever change introduced that in later Windows builds. So for this PR, I'll need to finish my existing test suite work, and get it to the point where I can replicate that issue in this repo directly, and then rebase/reinstate/recreate TBBle@1803a21 into this PR as well, since it looks like it's all-versions necessary now. (A quick test suggests the cherry-pick will be easy, it didn't conflict anywhere except go.mod/go.sum and vendoring) Note to self: The repro case seen in containerd is the import from tar of an exported tarball (i.e. wclayer export -> wclayer import; the latter hits Later edit: I had a few minutes to do the rebase of the "hives" commit (on a new branch, https://github.com/TBBle/hcsshim/tree/base-layer-manipulation-with-hives, to keep things separate) and although the code itself rebased fine, I noticed a couple of issues:
This suggests I should rework the |
This is to gain access to: * microsoft/hcsshim#901 * TBBle/hcsshim@035362c The latter is a WIP fix for a post-RS5 change in behaviour, which needs to be shown necessary on Windows Server 2022 as it was only observed on my Windows 10 20H2 (or earlier) box, well before Windows Server 2022 shipped, and may no longer be needed. On the other hand, it recently appeared in a Windows Server 2019 (RS5) testsuite run, so perhaps it's needed everywhere. That's what we're testing here, as hcsshim does not have a thorough test-suite for these codepaths yet. Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Converted to draft as I'm currently moving countries, and hence don't have time or computing power for at least the next month (and maybe longer... we'll see) to rebase and resolve merge conflicts, let-alone complete the work described in #901 (comment). |
To generate the minimal hive, how about something like this: package main
import (
"fmt"
"log"
"os"
"syscall"
winio "github.com/Microsoft/go-winio"
"golang.org/x/sys/windows/registry"
)
//go:generate go run golang.org/x/sys/windows/mkwinsyscall -output zsyscall_windows.go main.go
//sys RegSaveKeyW(key syscall.Handle, file *uint16, access uint32) (regerrno error) = advapi32.RegSaveKeyW
func main() {
privileges := []string{"SeBackupPrivilege"}
if err := winio.EnableProcessPrivileges(privileges); err != nil {
log.Fatal(err)
}
defer winio.DisableProcessPrivileges(privileges)
k, existing, err := registry.CreateKey(syscall.HKEY_CURRENT_USER, "tempKey", registry.ALL_ACCESS)
if err != nil {
log.Fatal(err)
}
defer registry.DeleteKey(syscall.HKEY_CURRENT_USER, "tempKey")
fmt.Println(k, existing)
saveTo := "C:\\Users\\Administrator\\reg\\save.dat"
if _, err := os.Stat(saveTo); err == nil {
os.Remove(saveTo)
}
if err := RegSaveKeyW(syscall.Handle(k), syscall.StringToUTF16Ptr(saveTo), 0); err != nil {
log.Fatal(err)
}
} This exports an empty key. That exported file should be usable as a valid hive. The |
Use the offline registry library functions instead https://learn.microsoft.com/windows/win32/devnotes/offline-registry-library-functions |
That's a lot better! Thanks for the link! |
Created a PR against your branch to generate empty hives using the offline registry library. I know you don't have time now, so have a look at your earliest convenience. |
#1637, the successor for this PR, has merged. |
Fixes: #853
Implementation of base-layer export and base-layer creation, intended to support
FROM scratch
non-bootable flows like #750, or supporting the containerd snapshotter in producing mountable, exportable layers which are not based on an existing MS-provided OCI image, so that the containerd snapshotter test suite can pass without significant changes.It should work with bootable layers, or at least I've confirmed that passing a nanoserver tarball through these two new features produces something functionally-identical to the original nanoserver tarball.
So in theory, this could be used to "squash" several layers into a new single base layer, but I suspect that's not feasible for anyone except MS due to licensing requirements, and based on my learnings here, MS are clearly using a different tool internally to produce current container images.
Things I don't know, and will ignore unless told-otherwise:
Minimal base-layer export is working.
My test case for this stage is importing
mcr.microsoft.com/windows/nanoserver:10.0.19042.630
's base (and only) layer usingwclayer import
, then exporting it to a tarball and importing that tarball to a new directory.Current differences/issues
No tests
It doesn't appear we have any
wclayer
tests, beyond some usage forwclayer.CreateScratchLayer
on top ofnanoserver
orbusyboxw
, relying on Docker to provide the rest of the stack of images.Certainly nothing testing import/export.
Files that are not round-tripping correctly: Probably working as intended, one open question.
Importing from my generated tarball ends up with different bcd.bak and bcd.log.bak compared to importing from the MS-built tarball. This is because the BCD in the tree is different after untarring, and the bcd.bak is a copy of the bcd from the original tarball.
❔ It's not clear if the exported tarball should invert the backup step, and redirect the BCD and BCD.LOG to use the bcd.bak and bcd.log.bak (etc) files that were saved during import. (Edit: See
mutatedFiles
and its usage. However, I don't think this is relevant to this PR anyway, as we're working at a lower layer than this.)A bunch of other files differ, but they differ every time the same tarball is imported anyway.
And really, since my target here is not bootable images but
FROM scratch
for e.g., #750 or supporting containerd WCOW snapshots sanely, none of the being-changed files matter to me.Details of differing files
Comparing a containerd import to the
wclayer import
from the same (MS-generated tarball), the following files differ:We know that the BCD files are modified by the import process, and the rest are presumably generated on-the-fly and end up with timestamps inside them or something.
Comparing the
wclayer import
of the original tarball to thewclayer import
of the tarball I generated withwclayer export
, the following files differ:And nailing down what all the BCD files are
We can see that the bcd.bak that comes from importing my generated tarball is the BCD from the from-MS imported Utility VM; bcd.bak in the from-MS import is presumably the BCD that's actually inside the tarball.
This is an oddity of
ociwriter.ImportLayer
. It backs up BCD and BCD.LOG*, but doesn't do anything with those backups, they are not referenced anywhere else in the hcsshim code base. I'm not sure if the intent was to copy them back after importing, or if they are supposed to be included in the export (i.e. invert the backup process).File order is different: Only annoying when doing comparisons of the
tar tv
output.I don't know what tool is being used to create the files published by MS, but clearly it's not
filepath.Walk
in Go, as they disagree about the ordering of case. I suspect the most-important thing here is that Files comes before UtilityVM. Order of processing files will affect hard-links in the tarball, but only cosmetically.Hard links: Works for base layers, but not non-base layers (but the latter is unrelated to this PR).
Resolved, for this use-case. The layer export code now recognises hard-links on the filesystem, and adds them to the tar stream. The algorithm roughly matches the one given at the OCI Image Spec "Image Layer Filesystem Changeset" notes on hard-links.
The hard-links present in the nanoserver image are also present if that layer is reexported, but:
In testing, I noticed that hard-links that exist in a non-base layer do not appear as hard-links when exporting, because
modvmcompute.NewProc("ExportLayer")
does not preserve hard links when copying the layer data into a temporary directory. It does not createBACKUP_LINK
streams, and even if it did, go-winio'sbackuptar
doesn't handle them anyway. It's not obvious if theBACKUP_LINK
data format is well-defined but not documented, or is supposed to be application-defined, in which case the content would have to be part of ExportLayer's API and somehow make sense both intra-layer and inter-layer.I also found that
New-Item -ItemType HardLink
didn't work to create cross-layer links (which makes sense, since I assume it wants to change the 'link count' on a file in a read-only layer), so I suspect that at this time, hcsshim will never export hard-links except when exporting a base layer (i.e. this new code). Maybe this used to work, since the code exists to handle both inter-layer and intra-layer hard links in tarballs during import.Hard-link support also pulls in a fairly noisy(This was broken out into separate PRs)Microsoft/go-winio
bump which will resolvably conflict with #941, and since I was runninggo mod tidy
as a matter of course, I think I inlined #942 in passing. If the latter lands first, this'll be less noisy, but the go-winio bump will still generate noise in both modules because it has bumpedgolang.org/x/sys
quite a long way, which flows through into thevendor
directory.Notes from before this was implemented
The tarball from MS for nanoserver's base layer includes lots of hardlinks, some between entries in Files, but most are from the UtilityVM/Files back to Files/. Not capturing these hard links when exporting the layer takes an uncompressed nanoserver base layer from about 280MB to about 430MB.
This one's a little intricate.
winio.NewBackupFileReader
doesn't generatewinio.BackupLink
, that's up to the code walking the tree, and I haven't yet worked out how, from Go, to actually identify that a file is a HardLink, which PowerShell can tell me, and track down the other links, whichfsutil hardlinks
can tell me. So the data's there, but the API has escaped my minimal searching. I am guessing I wantBY_HANDLE_FILE_INFORMATION
to tell me when something has hard-links, and match them by file index.Dagnabbit: Go's
os.File.Stat
actually fetches exactly this structure, but doesn't grab the number of links, and the index values are not exported. So we'll be making syscalls of our own to achieve this.os.SameFile
uses the indexes, but I can't see a good way to apply that here.Also,
github.com/Microsoft/go-winio/backuptar.WriteTarFileFromBackupStream
doesn't handlewinio.BackupLink
anyway. So that would have to be implemented too, or intercept the call tobackuptar.WriteTarFileFromBackupStream
inociwclayer.writeTarFromLayer
like we do for deleted files. That's probably necessary anyway, as per the MS docs, the actual data in thewinio.BackupLink
object is up to the application. It would also mirrorociwclayer.writeLayerFromTar
which similarly intercepts the call towriteBackupStreamFromTarAndSaveMutatedFiles
fortar.TypeLink
headers.I did a quick test of just tarballing-up the tree using Windows-bundled
tar
, and it did find hardlinks correctly, although it disagreed about which one in the tarball was the "original", not that it matters. It also failed to includeUtilityVM
but did includeUtilityVM\Files
, but that could probably be fixed with a better command-line and more than the zero effort I put into this test:No permission bits on the exported tarballs: This was already the case for the non-base layers.
backuptar.WriteTarFileFromBackupStream
just doesn't have any way to set the permission bits on the tar headers it creates. I considered making it automatically apply 0644 for files and 0755 for directories to match the MS-generated tarballs, but (a) that's way over in go-winio, and is separate to this; and (b) I'm only 80% sure that won't invalidate every Windows Containers build-cache on the planet, and I don't think 20% is low-enough to risk having to change my name and go into hiding before an angry mob of CI/CD engineers comes looking for me.Weirdly, the MS-built tarball doesn't have permission bits on the Files, UtilityVM, or UtilityVM/Files directories, which suggests that whatever is done to generate these tarballs, it's doing something fairly specialised and a bit unusual. And also suggests this isn't important.
Punted to microsoft/go-winio#184 for discussion/feedback.
Creating a base layer from some files on-disk is working.
This introduces a new API,
hcsshim.ConvertToBaseLayer
:hcsshim.ProcessBaseLayer
will complete:hcschim.ProcessUtilityVMImage
will complete (assuming the BCD is valid).ProcessBaseLayer
and if necessary,ProcessUtilityVMImage
(w *baseLayerWriter) Close()
afterreapplyDirectoryTimes
.I'm not sure when a UtilityVM is necessary. At the moment, I'm assuming it's only needed if the layer is used as the base of a container's scratch layer, and that's not the target use-case for this work.
As it happens, the test
TestSCSIAddRemoveWCOW
is a passable candidate for exercising this API and particularly the use-case I'm targeting, as it creates scratch layers for attaching to a container as SCSI disks. Before this change, those scratch layers are based on the image used to create the container in the first place, i.e. nanoserver. With this API, it is much more similar toTestSCSIAddRemoveLCOW
which uses trivial empty EXT4 scratch layers.All of the above could be handled by the hcsshim user, e.g., this was done (but never merged) by refactoring the containerd snapshot testsuite in containerd/containerd#2366 to use
hcsshim.ProcessBaseLayer
andfstest.Base
, including definingfstest.Base
in containerd/continuity in order to record an undocumented internal hcsshim requirement.I propose instead to make containerd's snapshot
Commit
API callhcsshim.ConvertToBaseLayer
at the appropriate point so that to callers of the snapshot system, no behaviour difference is seen between Windows and non-Windows behaviours.My fallback is to take all the code I wrote here, and add it to containerd instead, which of course means no one else benefits, and the knowledge needed to call
hcsshim.Process*
remains buried over in some non-hcsshim project.Current differences/issues
Test-case TODO question
testutilities.CreateWCOWBlankRWLayer
has a TODO:It appears that it intends to search the provided layers to find one with a UtilityVM folder, but it's not clear why, as its current use-case is for scratch layers attached as SCSI devices via the Utility VM, which does not, as far as I can see, require a Utility VM.
It's a TODO here because before this change, the top layer has a Utility VM, but with this change, it does not.
The test passes so I think I'm okay, but it's possible there's a downlevel issue, as I'm testing on Windows 10 20H2.
There is no step 3
Well, there is, but not here. Assuming that this work is completed and merged, I need to revendor into containerd, and rebase my work there (see containerd/containerd#4415 (comment)) to take advantage of this new API, and hopefully get the snapshot integration test suite to start passing, which creates arbitrary base layers in every test.