KEP 2763: Ambient capabilities in Kubernetes #2757
/cc @tallclair
Thanks for picking this up! I'd really like to see this feature.
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

This KEP proposes that kubernetes provide a way to set ambient capabilities for containers through the Pod manifest. It also proposes changes that must be made to containerd to enable ambient capabilities end-to-end.
I assume there's a separate process for proposing features for containerd. It probably makes sense to come to an understanding of what we want in k8s first, but I'd like to see the containerd (and ideally cri-o) changes merge before this KEP is marked implementable.
Are there runtimes other than containerd we should care about? How do we usually expand CRI?
Currently this KEP focusses on containerd. I think we should expand this to other runtimes in the future. Coordinating this across multiple runtimes at once would be extremely challenging.
> but I'd like to see the containerd (and ideally cri-o) changes merge before this KEP is marked implementable

But wouldn't we have to first update the CRI API for them to start reading the AddAmbientCapabilities field?
I think that the order of operations should be:
- Update CRI API
- Update containerd and CRI-O
- Update k8s core APIs

WDYT?
Yeah, that makes sense. Can we just make sure a containerd & cri-o maintainer approves this KEP? I recommend @mrunalp for cri-O, I'm not sure who's active on containerd these days.
Yup, we have volunteers for reviewing/approving the KEP from:
- CRI API - @SergeyKanzhelev
- containerd - @mikebrow
- cri-O - @mrunalp
Capabilities *Capabilities `json:"capabilities,omitempty" protobuf:"bytes,1,opt,name=capabilities"`
// The ambient capabilities to add when running containers as non-root.
// +optional
AmbientCapabilities []Capability `json:"ambientCapabilities,omitempty" protobuf:"bytes,1,rep,name=ambientCapabilities,casttype=[]Capability"`
If we go with a separate field, I suggest putting it in the `Capabilities` struct. E.g. `Ambient` works like `add`, but also makes it ambient.
Good call! I agree we should do this as well.
### Changes to kubernetes API (https://pkg.go.dev/k8s.io/api/core/v1)

Reusing the existing capabilities field in the securityContext might cause confusion and we propose adding a new field in the SecurityContext called ambientCapabilities.
I would be open to reusing the add field, as long as the default set is still not included in the ambient set.
I think I agree. We inherited the API shape from docker here, and it's not ideal. If we are working in the confines of today's shape, it seems plausible that add: CAP_* on a non-root pod actually means the spec author wanted the process to have CAP_*.
IIUC what you are suggesting is that add will add the explicitly added capability to the ambient set and the default ones would still be added like before? What I envisioned was that if you are adding a capability using addAmbient then under the hood we drop all the default capabilities and only add the explicitly added capability to the ambient set.
Yeah. IIRC, the way `Add` works today is it adds it to the bounding & effective sets, the same for the default capabilities. What I'm suggesting is that the default capabilities are still only added to bounding & effective (as is proposed), and `Add` adds to those sets, but also to `Ambient`. Does that make sense?
Default and explicitly added capabilities are added to the inheritable, permitted, bounding and effective sets. See this for the code location where this is done.
What I think you are suggesting:
- when someone adds a capability explicitly, that capability should be added to all sets, i.e. inheritable, permitted, bounding, effective and ambient.
- the default capabilities should only be added to the inheritable, permitted, bounding and effective sets.
- The advantage of your approach is that:
  - we would not require a change to the kubernetes API; we would still require changes to the CRI API so that we can distinguish which ones are added to the ambient set vs which ones are not.
- The disadvantage of your approach is:
  - we are changing the behavior of an existing field. (not saying this is bad, I think this shouldn't break anyone)
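To make the two bullet points about set membership concrete, here is a rough Go sketch of the "reuse add" behavior as summarized in this comment. `CapSets`, `buildCapSets` and `union` are hypothetical names for illustration only; they are not containerd or CRI code, and the capability names in `main` are arbitrary examples.

```go
package main

import (
	"fmt"
	"sort"
)

// CapSets is a hypothetical holder for the five Linux capability sets.
type CapSets struct {
	Bounding, Permitted, Inheritable, Effective, Ambient []string
}

// union returns the sorted, de-duplicated union of two capability lists.
func union(a, b []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, list := range [][]string{a, b} {
		for _, c := range list {
			if !seen[c] {
				seen[c] = true
				out = append(out, c)
			}
		}
	}
	sort.Strings(out)
	return out
}

// buildCapSets sketches the behavior described above: defaults and explicit
// adds land in bounding, permitted, inheritable and effective, while only
// the explicit adds are also placed in the ambient set.
func buildCapSets(defaults, added []string) CapSets {
	return CapSets{
		Bounding:    union(defaults, added),
		Permitted:   union(defaults, added),
		Inheritable: union(defaults, added),
		Effective:   union(defaults, added),
		Ambient:     union(nil, added),
	}
}

func main() {
	sets := buildCapSets(
		[]string{"CAP_CHOWN", "CAP_KILL"}, // stand-in for runtime defaults
		[]string{"CAP_NET_BIND_SERVICE"},  // explicitly added in the pod spec
	)
	fmt.Println("effective:", sets.Effective)
	fmt.Println("ambient:  ", sets.Ambient)
}
```

The point of the sketch is only that the default list never reaches `Ambient`, which is the property both sides of this thread seem to want.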
I think kubernetes/kubernetes#56374 (comment) talks about why we would want to add a new field.
Adding a new field is a simple add and would make it easier to implement the feature. (We wouldn't have to worry about breaking something as the old behavior would remain untouched)
``` | ||
Notes about how ambientCapabilities would work:
I wonder if we should forbid adding certain capabilities if the container sets `runAsNonRoot`? For example, `runAsNonRoot` is probably not adding much protection if you add `CAP_SYS_ADMIN` (or even `CAP_DAC_OVERRIDE`, in the default set) to the ambient set.
Yeah I was thinking the same! Great call out!
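One way such a restriction could look, as a minimal sketch: a validation helper that rejects risky ambient capabilities when `runAsNonRoot` is set. The deny list below contains only the two capabilities named in this thread; `riskyAmbient` and `validateAmbient` are hypothetical names, not existing Kubernetes validation code.

```go
package main

import "fmt"

// riskyAmbient is a hypothetical deny list. The thread names only
// CAP_SYS_ADMIN and CAP_DAC_OVERRIDE as examples; a real list is undecided.
var riskyAmbient = map[string]bool{
	"CAP_SYS_ADMIN":    true,
	"CAP_DAC_OVERRIDE": true,
}

// validateAmbient returns one error per risky capability requested as
// ambient. It only applies when runAsNonRoot is set.
func validateAmbient(runAsNonRoot bool, ambient []string) []error {
	var errs []error
	if !runAsNonRoot {
		return errs
	}
	for _, c := range ambient {
		if riskyAmbient[c] {
			errs = append(errs, fmt.Errorf("ambient capability %s is not allowed with runAsNonRoot", c))
		}
	}
	return errs
}

func main() {
	requested := []string{"CAP_NET_BIND_SERVICE", "CAP_SYS_ADMIN"}
	for _, err := range validateAmbient(true, requested) {
		fmt.Println(err)
	}
}
```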
DropCapabilities []string `protobuf:"bytes,2,rep,name=drop_capabilities,json=dropCapabilities,proto3" json:"drop_capabilities,omitempty"`
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_sizecache int32 `json:"-"`
// List of ambient capabilities to add.
Should `AddAmbientCapabilities` also add them to the inherited & bound sets? I think so?
I think we would need them to be added to the inheritable and permitted sets. See https://lwn.net/Articles/636533/.
TL;DR: ambient capabilities obey the invariant that no bit can ever be set in the ambient set if it is not set in both permitted and inheritable.
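The invariant can be written down as a small check. This is an illustrative Go sketch with hypothetical helper names (`ambientViolations`, `toSet`); it models capability sets as string slices rather than kernel bitmasks.

```go
package main

import "fmt"

// toSet converts a capability list into a lookup map.
func toSet(list []string) map[string]bool {
	set := make(map[string]bool, len(list))
	for _, c := range list {
		set[c] = true
	}
	return set
}

// ambientViolations returns every capability in ambient that is missing
// from permitted or from inheritable, i.e. every entry that would break
// the invariant quoted above.
func ambientViolations(permitted, inheritable, ambient []string) []string {
	inPermitted := toSet(permitted)
	inInheritable := toSet(inheritable)
	var bad []string
	for _, c := range ambient {
		if !inPermitted[c] || !inInheritable[c] {
			bad = append(bad, c)
		}
	}
	return bad
}

func main() {
	permitted := []string{"CAP_NET_BIND_SERVICE", "CAP_CHOWN"}
	inheritable := []string{"CAP_NET_BIND_SERVICE"}
	ambient := []string{"CAP_NET_BIND_SERVICE", "CAP_CHOWN"}
	// CAP_CHOWN is permitted but not inheritable, so it cannot be ambient.
	fmt.Println("violations:", ambientViolations(permitted, inheritable, ambient))
}
```

This is why a runtime that receives an ambient list would also have to place those capabilities in the permitted and inheritable sets.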
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_sizecache int32 `json:"-"`
// List of ambient capabilities to add.
AddAmbientCapabilities []string
Note: Even if we end up reusing the existing `Add` capabilities field in the k8s API, I agree that it should be kept separate at the CRI level.
/cc @mrunalp
Drop []Capability `json:"drop,omitempty" protobuf:"bytes,2,rep,name=drop,casttype=Capability"`
// Ambient capabilities to add
// +optional
Ambient []Capability `json:"ambient,omitempty" protobuf:"bytes,2,rep,name=ambient,casttype=Capability"`
An alternative could be a boolean for ambient that uses the default capabilities list.
bool does not work because containerd/docker add some capabilities by default and they will all become ambient for a non-root container, essentially making it root.
### Changes to kubernetes API (https://pkg.go.dev/k8s.io/api/core/v1)

<<[UNRESOLVED should we reuse add instead of adding new field]>>
Based on this note, kubernetes/kubernetes#56374 (comment), I don't think we should reuse add. I think having an additional slice in the Capabilities struct that is add-only is a clean addition.
Here are my two cents on the k8s API design. (I have already talked with Vinayak offline.)
At one end of the spectrum, the k8s API can expose the exact interfaces of the layer below it, in this case the Linux capabilities API. That means the k8s API would let users interact directly with the permitted, inheritable, effective, bounding and ambient sets. This gives users 100% of the control and flexibility that Linux allows. It also means the k8s API needs to track the Linux capabilities API: for example, a new addition to Linux capabilities results in a new addition to the k8s API.
At the other end, the k8s API defines its own abstraction for interacting with capabilities. The k8s API can then be simpler, at the cost of flexibility, and it should hide details of the underlying OS. The original design of `securityContext.capabilities.add` and `drop` is pretty concise, but it is unclear whether it supports both root and non-root. It seems to, but according to this k8s doc, it doesn't support non-root:
> Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
At this moment, we need to be backward-compatible, which is the only reason that I agree we need a new field. With the new field, I propose to clarify the spec, based on [kubernetes/kubernetes#56374 (comment)]:
- `add`: if root user, add capabilities; if non-root, behavior is unspecified
- `drop`: if root user, drop capabilities; if non-root, behavior is unspecified
- `setForNonRoot` (open to suggestions; I named it after runAsNonRoot): if non-root user, set capabilities; if root, no-op

My goals are:
- clarify the root vs non-root cases
- clarify how the new field works with existing ones
- hide Linux details
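A rough sketch of how that three-field spec could be read, in Go. `CapSpec` and `resolveCaps` are hypothetical; `SetForNonRoot` is the tentative name from this comment and does not exist in the Kubernetes API. Two assumptions are mine, not the proposal's: drops are applied after adds for root, and `add`/`drop` are simply ignored for non-root (the proposal leaves that case unspecified).

```go
package main

import "fmt"

// CapSpec mirrors the three fields in the proposal above.
// SetForNonRoot is a tentative, hypothetical field name.
type CapSpec struct {
	Add           []string
	Drop          []string
	SetForNonRoot []string
}

// resolveCaps returns the capabilities a container process would end up with.
// Root (uid 0): runtime defaults plus Add, minus Drop (assumed ordering).
// Non-root: exactly SetForNonRoot; Add and Drop are ignored in this sketch.
func resolveCaps(uid int64, defaults []string, spec CapSpec) []string {
	if uid != 0 {
		return append([]string{}, spec.SetForNonRoot...)
	}
	dropped := make(map[string]bool, len(spec.Drop))
	for _, c := range spec.Drop {
		dropped[c] = true
	}
	seen := map[string]bool{}
	out := []string{}
	for _, c := range append(append([]string{}, defaults...), spec.Add...) {
		if !dropped[c] && !seen[c] {
			seen[c] = true
			out = append(out, c)
		}
	}
	return out
}

func main() {
	spec := CapSpec{
		Add:           []string{"CAP_NET_ADMIN"},
		Drop:          []string{"CAP_KILL"},
		SetForNonRoot: []string{"CAP_NET_BIND_SERVICE"},
	}
	defaults := []string{"CAP_CHOWN", "CAP_KILL"} // stand-in for runtime defaults
	fmt.Println("root:    ", resolveCaps(0, defaults, spec))
	fmt.Println("non-root:", resolveCaps(1000, defaults, spec))
}
```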
Some new thoughts after talking with Vinayak today. I prefer (1) > (3) > (2).

Option 1: Reuse `add` and `drop`
- The capabilities will be added to the ambient set, in addition to the current capability sets. Thus it will work for the non-root user case without file capabilities.
- It will not break existing scenarios.

Option 2: Add new field - `ambient`
- Users need to understand all the Linux capability concepts: https://lwn.net/Articles/636533/
- We need to enhance the `add` and `drop` spec by mentioning the permitted, inheritable, bounding and effective sets.

Option 3: Add new field - `addForNonRoot`
- `add` is for any process (parent or child) with root user
- `drop` is for any process (parent or child) with root user
- `addForNonRoot` is for any process (parent or child) with non-root user
/approve
/label tide/merge-method-squash
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tabbysable, vinayakankugoyal The full list of commands accepted by this bot can be found here. The pull request process is described here
/lgtm
@tabbysable could you please LGTM again (sorry for the inconvenience). There was a typo in the kep metadata that was causing the verify-kep-metadata.sh test to fail. Those are the only changes since last LGTM. Thank you!
we should probably bump the milestones to 1.24, since this is certainly not going into 1.23, just to avoid confusion
alright everyone, buckle up:
/retest
* KEP 2763: Ambient capabilities in Kubernetes
* KEP 2763: Add use cases and polish
  Signed-off-by: Alexey Perevalov <[email protected]>
* update prod-readiness yaml. Change beta to alpha.
* Minor formatting fixes.
* Delete 2763.yaml because merging provisionally.
* Update kep.yaml
* Update kep.yaml

Co-authored-by: Alexey Perevalov <[email protected]>
There are 2 options here:-
- **Option 1:** Reuse Add field in [Capabilities](https://pkg.go.dev/k8s.io/api/core/v1#Capabilities)

When a capability gets added explicitly to a non-root container it also gets added to the ambient set in addition to getting added to inheritable, permitted, bounding and effective sets. The default capabilities are not added to the ambient set.
Having capabilities added to the inheritable set turned out to be a CVE affecting multiple runtimes (Moby, containerd, CRI-O, Podman...); more details are in the Moby advisory: GHSA-2mm7-x5h6-5pvq
Adding the capabilities directly to the ambient set requires them to be present in the inheritable set as well, thus bringing back the old behavior.
I think this option should not be considered.
#### Restricted ambient capabilities.

<<[UNRESOLVED what is the set of capabilities that we should allow to be made ambient]>>
Previously, when this was discussed, PSPs were deprecated and there wasn't an alternative. But we now have a PodSecurity admission controller in beta and enabled by default as of 1.23.
So I think if there are any restrictions, they should be enforced in the PodSecurity admission controller. Most should probably be under the 'Privileged' category.
@Jc2k are you thinking of a change to the pod security standards? These are what you can enforce with Pod Security admission.
It depends.
For context, let me stress again that when this section of the KEP was first discussed, the admission controller wasn't in beta (if it existed at all), so it wasn't considered a valid solution to this UNRESOLVED part of the KEP.
In the case where we reuse `securityContext.capabilities.add`, the existing standards look like they are valid for root and non-root containers. Neither would be allowed to use `CAP_SYS_ADMIN` or `CAP_DAC_OVERRIDE` etc. unless they were using the `privileged` policy level. This is good.
If we are not reusing `securityContext.capabilities.add`, then I think the standards should be updated anyway. They currently cover all the ways you can add capabilities to containers. If there is a new way to add a capability to a container, it should be covered by the standards.
I'm aware of at least one popular CNI that now has a non-root mode. It does this through suid binaries. I'd much prefer a DaemonSet with an explicit and limited capability allow list to a container image using suid binaries, even if both ultimately have powerful permission sets.
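To illustrate what "covered by the standards" might mean if a separate ambient field is added, here is a minimal Go sketch of an allow-list check that treats `add` and a hypothetical `ambient` list the same way. `disallowedCaps` is an invented helper, not Pod Security admission code, and the single-entry allow list in `main` is only an example passed in by the caller.

```go
package main

import "fmt"

// disallowedCaps returns every capability requested through either add or
// the (hypothetical) ambient field that is not on the caller's allow list.
func disallowedCaps(allowed map[string]bool, add, ambient []string) []string {
	var bad []string
	for _, list := range [][]string{add, ambient} {
		for _, c := range list {
			if !allowed[c] {
				bad = append(bad, c)
			}
		}
	}
	return bad
}

func main() {
	// Illustrative allow list only; the real policy levels are defined by
	// the Pod Security Standards, not by this sketch.
	allowed := map[string]bool{"NET_BIND_SERVICE": true}

	add := []string{"NET_BIND_SERVICE"}
	ambient := []string{"NET_BIND_SERVICE", "SYS_ADMIN"}
	fmt.Println("disallowed:", disallowedCaps(allowed, add, ambient))
}
```

Checking both lists in one place means a new field cannot be used to sidestep a restriction that already applies to `add`.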
Hello! thank you for taking the time to review this KEP.
related issue kubernetes/kubernetes#56374
tracking issue in containerd: containerd/containerd#5644
KEP issue: #2763