- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks / Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- (R) Graduation criteria is in place
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
This proposal aims at extending the current pod specification with support for namespaced kernel parameters (sysctls) set for each pod.
See the abstract and motivation from the original proposal in v1.4.
See the original design proposal's motivation section.
As mentioned in contributors/devel/api_changes.md#alpha-field-in-existing-api-version:
Previously, annotations were used for experimental alpha features, but are no longer recommended for several reasons:
- They expose the cluster to "time-bomb" data added as unstructured annotations against an earlier API server (https://issue.k8s.io/30819)
- They cannot be migrated to first-class fields in the same API version (see the issues with representing a single value in multiple places in backward compatibility gotchas)
The preferred approach adds an alpha field to the existing object, and ensures it is disabled by default:
...
Annotations are no longer necessary as a means to set sysctl.
The original intent of annotations was to provide additional description of Kubernetes objects through metadata.
It's time to separate the ability to annotate from the ability to change sysctl settings so a cluster operator can elevate the distinction between experimental and supported usage of the feature.
See: original constraints and assumptions
See the original design proposal for alpha.
Setting sysctl parameters through annotations provided a successful story for defining better constraints on running applications.
The sysctl feature has been tested by a number of people without any serious complaints.
Promoting the annotations to fields (i.e. to beta) is another step in moving the sysctl feature towards the stable API.
Currently, the sysctl feature provides the security.alpha.kubernetes.io/sysctls and security.alpha.kubernetes.io/unsafe-sysctls annotations, which can be used in the following way:
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
  annotations:
    security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
    security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3
spec:
  ...
The goal is to transition into native fields on pods:
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "1"
    - name: net.ipv4.route.min_pmtu
      value: "1000"
      unsafe: true
    - name: kernel.msgmax
      value: "1 2 3"
      unsafe: true
  ...
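The transition from the annotation format to the field format can be illustrated with a minimal Go sketch. It assumes the annotation value is a comma-separated list of name=value pairs, as in the example above. The Sysctl type and the parseSysctlAnnotation helper are hypothetical names used for illustration only, not the actual Kubernetes implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Sysctl is a hypothetical stand-in for one entry of the proposed
// spec.securityContext.sysctls list.
type Sysctl struct {
	Name  string
	Value string
}

// parseSysctlAnnotation splits an annotation value of the assumed form
// "name=value,name=value" into individual Sysctl entries.
// Values may contain spaces (e.g. "1 2 3") and are kept verbatim.
func parseSysctlAnnotation(annotation string) ([]Sysctl, error) {
	if strings.TrimSpace(annotation) == "" {
		return nil, nil
	}
	var sysctls []Sysctl
	for _, kv := range strings.Split(annotation, ",") {
		name, value, found := strings.Cut(kv, "=")
		name = strings.TrimSpace(name)
		if !found || name == "" {
			return nil, fmt.Errorf("sysctl %q is not in name=value form", kv)
		}
		sysctls = append(sysctls, Sysctl{Name: name, Value: value})
	}
	return sysctls, nil
}

func main() {
	sysctls, err := parseSysctlAnnotation("net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3")
	if err != nil {
		panic(err)
	}
	for _, s := range sysctls {
		fmt.Printf("name=%s value=%q\n", s.Name, s.Value)
	}
}
```

Each parsed entry corresponds to one item of the sysctls list in the pod spec; entries taken from the unsafe-sysctls annotation would additionally be marked unsafe.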
The sysctl design document with more details and rationale is available at design-proposals/node/sysctl.md
- Introduce native sysctl fields in pods through the spec.securityContext.sysctls field as:

  sysctls:
  - name: SYSCTL_PATH_NAME
    value: SYSCTL_PATH_VALUE
    unsafe: true # optional field

- Introduce native sysctl fields in PSP as:

  apiVersion: v1
  kind: PodSecurityPolicy
  metadata:
    name: psp-example
  spec:
    sysctls:
    - kernel.shmmax
    - kernel.shmall
    - net.*

More examples at design-proposals/node/sysctl.md#allowing-only-certain-sysctls
As there is no longer a need to consider the sysctl feature experimental, the list of unsafe sysctls can be configured accordingly through:
// KubeletConfiguration contains the configuration for the Kubelet
type KubeletConfiguration struct {
	...
	// Whitelist of unsafe sysctls or unsafe sysctl patterns (ending in *).
	// Default: nil
	// +optional
	AllowedUnsafeSysctls []string `json:"allowedUnsafeSysctls,omitempty"`
}
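How a whitelist of names and patterns ending in * might be evaluated can be sketched as follows. This is an illustration under the assumption that a trailing * acts as a simple prefix wildcard; matchesPattern and isAllowedUnsafeSysctl are hypothetical helpers, not the kubelet's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPattern reports whether name matches pattern. A pattern ending
// in "*" is treated as a prefix match (an assumption of this sketch);
// any other pattern must equal the name exactly.
func matchesPattern(pattern, name string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(name, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == name
}

// isAllowedUnsafeSysctl reports whether name is covered by any entry of
// the allowed list, e.g. the value of AllowedUnsafeSysctls.
func isAllowedUnsafeSysctl(allowed []string, name string) bool {
	for _, pattern := range allowed {
		if matchesPattern(pattern, name) {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"kernel.msg*", "net.ipv4.route.min_pmtu"}
	for _, name := range []string{"kernel.msgmax", "kernel.shmmax", "net.ipv4.route.min_pmtu"} {
		fmt.Printf("%s allowed: %v\n", name, isAllowedUnsafeSysctl(allowed, name))
	}
}
```

With the default of nil, the loop never runs and every unsafe sysctl is rejected, which matches the documented default of an empty whitelist.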
Upstream issue: kubernetes/kubernetes#61669
As the sysctl feature stabilizes, it's time to gate the feature [1] and enable it by default.
- Expected feature gate key: Sysctls
- Expected default value: true
With the Sysctls feature enabled, the sysctl fields in Pod and PodSecurityPolicy and the whitelist of unsafe sysctls are all honored.
If disabled, the fields and the whitelist are ignored.
[1] https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
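The "ignored when disabled" behaviour can be pictured with a small Go sketch. The types and the dropDisabledSysctls helper are hypothetical and only show the intended semantics: with the gate off, any sysctls set on the pod are cleared before further processing.

```go
package main

import "fmt"

// Sysctl and PodSecurityContext are trimmed-down, hypothetical copies of
// the API types, reduced to the fields relevant for this sketch.
type Sysctl struct {
	Name  string
	Value string
}

type PodSecurityContext struct {
	Sysctls []Sysctl
}

// dropDisabledSysctls clears the Sysctls field when the feature gate is
// off, so that the rest of the system behaves as if it was never set.
func dropDisabledSysctls(sc *PodSecurityContext, sysctlsEnabled bool) {
	if sc == nil || sysctlsEnabled {
		return
	}
	sc.Sysctls = nil
}

func main() {
	sc := &PodSecurityContext{Sysctls: []Sysctl{{Name: "kernel.shm_rmid_forced", Value: "1"}}}

	dropDisabledSysctls(sc, true)
	fmt.Println("gate on, sysctls kept:", len(sc.Sysctls))

	dropDisabledSysctls(sc, false)
	fmt.Println("gate off, sysctls kept:", len(sc.Sysctls))
}
```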
See also: original sysctl proposal
- As a cluster admin, I want the sysctl feature to be versioned so I can ensure backward compatibility and proper conversion between the versioned and internal representations.
- As a cluster admin, I want to be confident the sysctl feature is stable enough and well supported so applications are properly isolated.
- As a cluster admin, I want to be able to apply the sysctl constraints at the cluster level so I can define the default constraints for all pods.
Extending the PodSecurityContext struct with a Sysctls field:
// PodSecurityContext holds pod-level security attributes and common container settings.
// Some fields are also present in container.securityContext. Field values of
// container.securityContext take precedence over field values of PodSecurityContext.
type PodSecurityContext struct {
	...
	// Sysctls is a white list of allowed sysctls in a pod spec.
	Sysctls []Sysctl `json:"sysctls,omitempty"`
}
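The element type Sysctl is not shown here. A minimal sketch, assuming it is a plain name/value pair of strings (the optional unsafe flag from the proposal is left out), shows how such a field would decode from the JSON form of a pod spec. The decodeSecurityContext helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sysctl is the assumed element type: a kernel parameter name and the
// value to set, both as strings.
type Sysctl struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// PodSecurityContext is reduced to the one field discussed here.
type PodSecurityContext struct {
	Sysctls []Sysctl `json:"sysctls,omitempty"`
}

// decodeSecurityContext parses the JSON form of a securityContext.
func decodeSecurityContext(data []byte) (PodSecurityContext, error) {
	var sc PodSecurityContext
	err := json.Unmarshal(data, &sc)
	return sc, err
}

func main() {
	sc, err := decodeSecurityContext([]byte(`{"sysctls":[{"name":"kernel.msgmax","value":"1 2 3"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded %d sysctl(s), first: %s=%q\n", len(sc.Sysctls), sc.Sysctls[0].Name, sc.Sysctls[0].Value)

	// Because Value is a string, a bare number is rejected by the typed decoder.
	_, err = decodeSecurityContext([]byte(`{"sysctls":[{"name":"kernel.shm_rmid_forced","value":1}]}`))
	fmt.Println("bare number rejected:", err != nil)
}
```

Under this assumption, numeric values have to be quoted in manifests so that they are read as strings.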
Extending the PodSecurityPolicySpec struct with a Sysctls field:
// PodSecurityPolicySpec defines the policy enforced on sysctls.
type PodSecurityPolicySpec struct {
	...
	// Sysctls is a white list of allowed sysctls in a pod spec.
	Sysctls []Sysctl `json:"sysctls,omitempty"`
}
The implementation follows the steps in devel/api_changes.md#alpha-field-in-existing-api-version.
Validation checks implemented as part of #27180.
We need to ensure backward compatibility, i.e. object specifications with sysctl annotations must still work after the graduation.
All of the above details were copied out of earlier proposals. For graduation, the PRR template below is completed.
- Unit tests and e2es for all applicable changes.
- Any required conformance tests for graduation.
- add sysctl support to pods
- e2e tests
Alpha since 1.4.
- API changes allowing configuration of the pod-scoped sysctl via the spec.securityContext field.
- API changes allowing configuration of the cluster-scoped sysctl via the PodSecurityPolicy object.
- Feature gate enabled by default.
Beta since 1.11.
- Promote the --experimental-allowed-unsafe-sysctls kubelet flag to a kubelet config API option.
- Lock the feature gate on.
There are e2es for sysctl behaviour on upgrades.
N/A
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: Sysctls
  - Components depending on the feature gate: kubelet, apiserver
- Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node?
No. Enabling the feature allows the use of sysctls.
Yes, disable the feature flag.
Feature will become available again on the component.
Not currently. Feature has defaulted to on since 1.11; graduation criteria would lock feature to on.
N/A
N/A
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
N/A
No metric currently exists. The feature is in use when the feature gate is on and Pod or PSP specifications set sysctl fields.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A, not a service.
N/A, not a service.
Are there any missing metrics that would be useful to have to improve observability of this feature?
N/A
Underlying kernel support for sysctls.
No.
No.
No.
No.
Yes: pods and PSPs have new fields for sysctl values.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No.
Feature is an API field on pod specification; kubelets behave as usual when API server/etcd are unavailable.
There may be some follow-ups required to improve usability, but I do not believe this should block graduation.
Any scheduling enhancement we make around a node that is configured to allow unsafe sysctls would be a distinct feature.
SLOs do not apply, N/A.
- 2017-06-12: Original design proposal
- 2018-05-14: Update KEP with beta criteria
- 2018-06-06: Promote sysctl annotations to fields
- 2018-06-14: Update sysctls to beta on website
- 2019-07-02: Add allowed sysctl to KubeletConfiguration
- 2021-02-08: Update KEP with final graduation criteria/complete PRR questionnaire
- 2021-02-24: Sysctls graduated to GA
- 2021-03-26: Sysctls added to conformance tests
See also: original design alternatives and considerations
N/A