Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low-hanging Windows fruit #1314

Merged
merged 6 commits into from
Jan 29, 2020
Merged

Conversation

TBBle
Copy link
Collaborator

@TBBle TBBle commented Jan 5, 2020

A bunch of small changes that fix blocking issue for supporting Buildkit on Windows against containerd.

This doesn't result in working Windows support, but the remaining areas of support are harder-to-solve, relating (so far) to either container spec generation, various aspects of network namespace handling, or various LLB commands that refuse to work if they cannot do user id mapping.

This is to advance #616, but does not complete it.

@TBBle
Copy link
Collaborator Author

TBBle commented Jan 5, 2020

Test results indicate that I can't simply revert 9290c15, as the containerd instance used in testing does not include the necessary shim binary containerd-shim-runc-v2, even though it's part of the containerd source. I'll look into that further.

@TBBle
Copy link
Collaborator Author

TBBle commented Jan 5, 2020

I removed the reversion of 9290c15, until I have time to look into it further. I still think it's the correct choice, but it's a bit annoying that containerd doesn't check for the existence of the shim before using it as the default runtime, when it has non-shim runtimes available.

@AkihiroSuda
Copy link
Member

containerd-shim-runc-v2 should be added here:

&& make bin/containerd-shim \

@TBBle TBBle changed the title Low hanging windows fruit Low hanging Windows fruit Jan 6, 2020
@TBBle TBBle changed the title Low hanging Windows fruit Low-hanging Windows fruit Jan 6, 2020
@TBBle
Copy link
Collaborator Author

TBBle commented Jan 7, 2020

Regarding 9dca2db, I've just retriggered a conversation in containerd/containerd#2366 for this use-case, so it's possible in the future we could revert this change, if they decide they do want to support the target parameter of mount.All.

@@ -131,7 +131,7 @@ func TestIntegration(t *testing.T) {
}

func newContainerd(cdAddress string) (*containerd.Client, error) {
return containerd.New(cdAddress, containerd.WithTimeout(60*time.Second), containerd.WithDefaultRuntime("io.containerd.runtime.v1.linux"))
return containerd.New(cdAddress, containerd.WithTimeout(60*time.Second))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed in linux to support older containerd daemons that are not EOL yet.

Copy link
Collaborator Author

@TBBle TBBle Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's odd, because it was only added when containerd 1.3 was vendor'd. Previous containerd has this as the default value already, from my reading, and containerd changed the default to the runtime v2 shim in 1.3.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TBBle: When you get back to this, see if the problem is that the default is being controlled by the vendored part, not the daemon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Containerd client(vendored) unfortunately can't do detection of the supported shims (although the client should always be backward compatible based on docs). So it will default to shim2 and fail on older daemons because they don't have any shim2 binary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this isn't a problem for windows though as none of this works for old containerd in windows anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That of course means it's not low-hanging fruit, so unless I find time to really hammer this out (including working out how to get the unit tests to catch this in future), I'll pull these two changes out of the PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi : According to #1176, buildkit dropped support for containerd older than 1.3 already. Or am I misreading that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. It is true #1176 depends on 1.3 but it still works with older daemons while being less efficient on releasing data in some cases. This is different than getting fatal "binary not found" error on every build.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pulled this commit and the previous one (pulling the new containerd Linux V2 runtime into the travis build), I'll come back to this in a new pull request later.

I plan to introduce a command-line option to select containerd runtime, with legacy runc runtime as default on non-Windows (matching containerd 1.2), and hcsshim runtime as default on Windows (matching containerd 1.3).

I'll also try and reinstate the Containerd 1.2 unit tests in the Travis build, so I or someone else doesn't fall into this gap again.

Copy link
Collaborator Author

@TBBle TBBle Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi Are you sure that the current master works with containerd 1.2? I tried enabling the containerd 1.2 unit tests (#1348), and a fair few of them died trying to call AddResource against containerd, which was added in 1.3.0.

It looks like anything hitting func (s *sourceOp) Exec for the containerimage source will end up in func (cm *cacheManager) GetByBlob, which calls LeaseManager.AddResource and hence fails against containerd 1.2.

It's very possible I'm overlooking something obvious, but I do not currently see any path that can bypass those calls when the API is not present.

Is there just a class of tests/features that are not valid with containerd 1.2?

TBBle added a commit to TBBle/buildkit that referenced this pull request Jan 25, 2020
This reverts commit c4f0305.

containerd 1.2 is still supported, per discussion at
moby#1314 (comment)

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
The same function used to support Unix sockets automatically supports
Named Pipes on Windows.

This makes the default configuration option for the daemon address work
correctly on Windows.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Wrapping a `nil` error produces `nil`, which causes the calling code to
see success, and continue on with a default-created WorkerOpt, which
causes segfaults later.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
This makes this code successfully discover the Windows Runtime V2
(hcsshim-based) plugin now that the Windows Runtime V1 (runhcs-based)
plugin has been removed upstream.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
The check for running as a non-admin euid() doesn't work on Windows,
always returning -1.

For now, treat -1 as "Probably root", and let the failures happen later.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Non-packaged execution will need this to be overridden anyway, and it
avoids a surprise "Drop state data into the current working directory"
event.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle
Copy link
Collaborator Author

TBBle commented Jan 29, 2020

Compared to the original PR, I've pulled the two changes which turned out to not be low-hanging fruit for later reworking in different PRs.

Need to better-manage the choice of Runtime plugin for the containerd worker

#1314 (comment)

  • containerd 1.2 only supports Runtime V1, containerd 1.3 client libs default to a Runtime V2 runtime.
  • Probably need to move the introspection code earlier, so we choose a Runtime that we know exists, and expose a command-line option to select it.
  • I started drafting this in containerd runtime selection #1348, but haven't yet gotten past turning on containerd 1.2 in the CI script, which failed due to what appears to be containerd 1.3-specific APIs being called by cacheManager: Low-hanging Windows fruit #1314 (comment).

Handle deficiencies in the containerd mount.All and mount.Mount APIs on Windows

#1314 (comment)

  • They don't consume the provided target_dir parameter, pending [carry] Mount Windows layers as volumes on the host containerd/containerd#2366
  • A Windows mount implies its parent layers, while a Linux mount (probably) requires that all the layers be passed to mount.All with type overlay.
  • Initial plan is to split func (lm *localMounter) Mount() into Windows/Unix versions, the way func (lm *localMounter) Unmount() already is, and make the Windows behaviour reject attempts to mount multiple layers onto a single target_dir.
  • There's an API smell here, I think it's in containerd, but we'll need to look at how buildkit uses this to see if mounting multiple layers on Windows is ever going to be attempted, and what the expected semantics would be.

TBBle added a commit to TBBle/buildkit that referenced this pull request Jan 29, 2020
This reverts commit c4f0305.

containerd 1.2 is still supported, per discussion at
moby#1314 (comment)

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@tonistiigi tonistiigi merged commit 6f4420b into moby:master Jan 29, 2020
@TBBle TBBle deleted the low_hanging_windows_fruit branch January 30, 2020 09:09
TBBle added a commit to TBBle/buildkit that referenced this pull request Mar 1, 2020
This reverts commit c4f0305.

containerd 1.2 is still supported, per discussion at
moby#1314 (comment)

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
TBBle added a commit to TBBle/buildkit that referenced this pull request Mar 1, 2020
This reverts commit c4f0305.

containerd 1.2 is still supported, per discussion at
moby#1314 (comment)

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
TBBle added a commit to TBBle/buildkit that referenced this pull request Mar 1, 2020
This reverts commit c4f0305.

containerd 1.2 is still supported, per discussion at
moby#1314 (comment)

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
TBBle added a commit to TBBle/buildkit that referenced this pull request Mar 1, 2020
Per moby#1314 (comment)
BuildKit should still function with containerd 1.2 daemons that do not
have this API, but less efficiently.

However, a couple of hundred tests fail on CI when they hit this, so
just skip them for now.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
TBBle added a commit to TBBle/buildkit that referenced this pull request Mar 1, 2020
Per moby#1314 (comment)
BuildKit should still function with containerd 1.2 daemons that do not
have this API, but less efficiently.

However, a couple of hundred tests fail on CI when they hit this, so
just skip them for now.

WIP because this doesn't work, see the TODO in Run in run.go.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants