Revamp how pipeline supports LimitRanges

Up until now, pipeline support for LimitRange has been rather limited and confusing, and can lead to inconsistencies:
- It is not applied to `InitContainers`.
- It zeroes out user requests, keeping only the max and assigning it to one step, which can lead to an invalid `Pod` definition.
- It uses only `Min` and never reads `Default` and `DefaultRequest`.
- It doesn't support `MaxLimitRequestRatio`.

This commit aims to fix that by extending LimitRange support. Note that, to understand some of the choices, some assumptions about how LimitRange works are required.

On `Pod` Containers:
- *Requests* are not enforced. If the node has more resources available than the request, the container can use them.
- *Limits*, on the other hand, are a hard stop. A container going over its memory limit is killed; a container going over its CPU limit is throttled.

It is thus important to get `Requests` right, to allow scheduling.

Requests and limits can come from both Containers and Init Containers.
- For init containers, the max of each type is taken.
- For containers, the requests/limits of all containers are summed.
- The Pod's effective request/limit is the larger of those two values.

This means, if you have the following:
- initContainer1 : 1 CPU, 100M memory
- initContainer2 : 2 CPU, 200M memory
- container1 : 1 CPU, 50M memory
- container2 : 2 CPU, 250M memory
- container3 : 3 CPU, 500M memory

The computation will be:
- CPU: max(2 (max init containers), 6 (sum of containers)) = 6 CPU
- Memory: max(200M (max init containers), 800M (sum of containers)) = 800M

LimitRanges enforce (by mutating the `Pod`) some Limits and Requests (using `Default` and `DefaultRequest`) and validate them (`Min`, `Max` and `MaxLimitRequestRatio`). They are applied per namespace, and it is possible to have multiple `LimitRange`s in a namespace.

Limits and Requests work this way in Kubernetes because all containers are assumed to run in parallel (which they do, except in Tekton thanks to a hack), while init containers run before them, one after the other.

That assumption (running in parallel) is not really true in Tekton. The containers do all start together (because there is no way around this) *but* the /entrypoint hack/ makes sure they actually run in sequence, so there is always only one container actually consuming resources at any given time.

This means we need to handle limits, requests and LimitRanges in a /non-standard/ way. Let's try to define that.

Tekton needs to take into account all aspects of the LimitRange: the min/max as well as the defaults. If there is no default but there is a min/max, Tekton then needs to *set* a default value that is between the min and max. If the value is too low or too high, the Pod cannot be created. *But* those values are set on *containers*, so we *have to* do our own computation to know what request to put on each container.

To add to the complexity, we also need to support `MaxLimitRequestRatio`, which adds complexity on top of something already complex. That said, ideally, if we get the defaults right, we should get support for `MaxLimitRequestRatio` for free.

This commit tries to add support for this by computing the minimum request that satisfies the `LimitRange`(s), and applying it to `Containers` as well as `InitContainers`.

Note: If there are multiple `LimitRange`s in the namespace, Tekton tries to make the best of it, *but* if they conflict with each other (a `Max` on one that is smaller than the `Min` on another), it's the user's responsibility.

Signed-off-by: Vincent Demeester <[email protected]>
tektoncd · Sep 3, 2021 · dafd77c
1 parent f041ef5
commit dafd77c

Showing 14 changed files with 1,419 additions and 542 deletions.
diff --git a/docs/limitrange.md b/docs/limitrange.md
@@ -0,0 +1,104 @@
+<!--
+---
+linkTitle: "LimitRange"
+weight: 300
+---
+-->
+
+# `LimitRange` support in Pipeline
+
+## `LimitRange`s, `Requests` and `Limits`
+
+Taken from the [LimitRange in Kubernetes docs](https://kubernetes.io/docs/concepts/policy/limit-range/).
+
+By default, containers run with unbounded [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) on a Kubernetes cluster.
+With resource quotas, cluster administrators can restrict resource consumption and creation on a `namespace` basis.
+Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all available resources. A LimitRange is a policy to constrain resource allocations (to Pods or Containers) in a namespace.
+
+A _LimitRange_ provides constraints that can:
+
+- Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
+- Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
+- Enforce a ratio between request and limit for a resource in a namespace.
+- Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
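As a rough illustration of the validating side, the sketch below checks one container's request and limit for a single resource against `Min`, `Max` and `MaxLimitRequestRatio`. The `Limits` type and the `validate` function are hypothetical, values are plain integers in a base unit (millicores, bytes), and the checks are simplified: Kubernetes applies `Min` and `Max` to both requests and limits, while this sketch only checks the request against `Min` and the limit against `Max`.

```go
package main

import "fmt"

// Limits is a hypothetical, single-resource view of a LimitRange item.
// Values use a base unit (e.g. millicores or bytes); zero means "not set".
type Limits struct {
	Min, Max             int64
	MaxLimitRequestRatio float64
}

// validate checks a container's request and limit for one resource.
// Simplified: request >= Min, limit <= Max and limit/request <= ratio.
func validate(request, limit int64, l Limits) error {
	if l.Min > 0 && request < l.Min {
		return fmt.Errorf("request %d is below min %d", request, l.Min)
	}
	if l.Max > 0 && limit > l.Max {
		return fmt.Errorf("limit %d is above max %d", limit, l.Max)
	}
	if l.MaxLimitRequestRatio > 0 && request > 0 {
		ratio := float64(limit) / float64(request)
		if ratio > l.MaxLimitRequestRatio {
			return fmt.Errorf("limit/request ratio %.2f exceeds %.2f", ratio, l.MaxLimitRequestRatio)
		}
	}
	return nil
}

func main() {
	cpu := Limits{Min: 100, Max: 2000, MaxLimitRequestRatio: 4} // millicores
	fmt.Println(validate(250, 1000, cpu)) // <nil>
	fmt.Println(validate(50, 1000, cpu))  // request 50 is below min 100
	fmt.Println(validate(250, 4000, cpu)) // limit 4000 is above max 2000
	fmt.Println(validate(200, 1000, cpu)) // limit/request ratio 5.00 exceeds 4.00
}
```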
+
+`LimitRange`s validate and mutate `Requests` and `Limits`. Let's look, *in a nutshell*, at how those work in Kubernetes.
+
+- **Requests** are not enforced. If the node has more resources available than the request, the container can use them.
+- **Limits**, on the other hand, are a hard stop. A container going over its memory limit is killed; a container going over its CPU limit is throttled.
+
+Resource types for both are:
+-   CPU
+-   Memory
+-   Ephemeral storage
+
+The next question is: how are pods with requests and limits scheduled?
+The scheduler *computes* the amount of CPU and memory requested (using **Requests**) and tries to find a node that can fit the Pod.
+
+Requests and limits can be applied to both Containers and Init Containers.
+- For init containers, the max of each resource type is taken.
+- For containers, the requests/limits of all containers are summed.
+- The Pod's effective request/limit for a resource is the larger of those two values.
+
+This means, if you have the following:
+-   initContainer1 : 1 CPU, 100M memory
+-   initContainer2 : 2 CPU, 200M memory
+-   container1 : 1 CPU, 50M memory
+-   container2 : 2 CPU, 250M memory
+-   container3 : 3 CPU, 500M memory
+
+The computation will be:
+-   CPU: max(2 (max of init containers), 6 (sum of containers)) = 6 CPU
+-   Memory: max(200M (max of init containers), 800M (sum of containers)) = 800M
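The computation above can be sketched as follows, following the Kubernetes rule that the effective value is the larger of the init-container maximum and the container sum. This is an illustration only: the `Resources` type and `effectiveRequests` function are hypothetical, and quantities are plain integers (millicores and megabytes) rather than Kubernetes `resource.Quantity` values.

```go
package main

import "fmt"

// Resources is a hypothetical pair of CPU (millicores) and memory (megabytes).
type Resources struct {
	CPUMilli int64
	MemoryMB int64
}

func max64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// effectiveRequests returns the largest init-container values, the sum of the
// regular containers, and the Pod's effective values (the larger of the two).
func effectiveRequests(initContainers, containers []Resources) (initMax, sum, pod Resources) {
	for _, c := range initContainers {
		initMax.CPUMilli = max64(initMax.CPUMilli, c.CPUMilli)
		initMax.MemoryMB = max64(initMax.MemoryMB, c.MemoryMB)
	}
	for _, c := range containers {
		sum.CPUMilli += c.CPUMilli
		sum.MemoryMB += c.MemoryMB
	}
	pod = Resources{max64(initMax.CPUMilli, sum.CPUMilli), max64(initMax.MemoryMB, sum.MemoryMB)}
	return initMax, sum, pod
}

func main() {
	inits := []Resources{{1000, 100}, {2000, 200}}
	ctrs := []Resources{{1000, 50}, {2000, 250}, {3000, 500}}
	initMax, sum, pod := effectiveRequests(inits, ctrs)
	fmt.Println(initMax, sum, pod) // {2000 200} {6000 800} {6000 800}
}
```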
+
+## Tekton support
+
+Limits and Requests work this way in Kubernetes because all containers are assumed to run in parallel, while init containers run before them, one after the other.
+
+That assumption (containers running in parallel) does not hold in Tekton. The containers do all start together (there is no way around this) **but** the *entrypoint hack* makes sure they actually run in sequence, so there is always only one container actually consuming resources at any given time.
+
+This means we need to handle limits, requests and LimitRanges in a *non-standard* way. Let's try to define that. Tekton needs to take into account all aspects of the LimitRange: the min/max as well as the defaults. If there is no default but there is a min/max, Tekton then needs to **set** a default value that is between the min and max. If the value is too low or too high, the Pod cannot be created. **But** those values are set on **containers**, so we **have to** do our own computation to know what request to put on each container.
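To make the difference concrete, this hypothetical sketch compares what gets accounted for when step containers are treated as running in parallel (the sum of their requests) with what sequentially executed steps need at any single moment (the largest request). The function names and step values are made up for illustration, in megabytes.

```go
package main

import "fmt"

// sumOf is what is accounted for when containers are assumed to run in parallel.
func sumOf(requests []int64) int64 {
	var total int64
	for _, r := range requests {
		total += r
	}
	return total
}

// peakOf is what steps running one after the other need at any single moment.
func peakOf(requests []int64) int64 {
	var peak int64
	for _, r := range requests {
		if r > peak {
			peak = r
		}
	}
	return peak
}

func main() {
	steps := []int64{256, 1024, 128} // hypothetical Task with three steps
	fmt.Println("accounted (sum):", sumOf(steps)) // accounted (sum): 1408
	fmt.Println("needed (peak):", peakOf(steps))  // needed (peak): 1024
}
```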
+
+
+## A LimitRange is in the namespace
+
+We need to get the default (limits), default requests, min and max values (if they are set).
+One thing to note is that, when a LimitRange is present, we must **not rely** on the admission-time Pod mutation that applies the defaults; instead, we need to specify all requests and limits ourselves so that this mutation has nothing left to do.
+
+- **No default value:** if there is no default value, we treat the `min` as the
+  default. I think that is also what Kubernetes does, at least in our computation.
+- **Default value:** we need to "try" to respect it as much as possible.
+  - `defaultLimit` but no `defaultRequest`: we set `defaultRequest` to the same value as `defaultLimit`.
+  - `defaultRequest` but no `defaultLimit`: we use the `min` limit as the `defaultLimit`.
+  - no `defaultLimit` and no `defaultRequest`: we use the `min` limit (or request) as
+    `defaultLimit` (and `defaultRequest`).
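The defaulting rules above can be sketched for a single resource as follows. `Item` and `resolveDefaults` are hypothetical names, a zero value stands for "not set", and the case where both defaults are set simply keeps them, which is one reading of "respect it as much as possible".

```go
package main

import "fmt"

// Item is a hypothetical, single-resource view of a LimitRange item.
// Zero means "not set"; values use a base unit (millicores, bytes, ...).
type Item struct {
	Min, Max, DefaultLimit, DefaultRequest int64
}

// resolveDefaults returns the request and limit to use for a container that
// does not set its own values, following the rules listed above.
func resolveDefaults(it Item) (request, limit int64) {
	switch {
	case it.DefaultLimit != 0 && it.DefaultRequest != 0:
		// Both defaults are set: keep them as they are.
		return it.DefaultRequest, it.DefaultLimit
	case it.DefaultLimit != 0:
		// defaultLimit but no defaultRequest: request = defaultLimit.
		return it.DefaultLimit, it.DefaultLimit
	case it.DefaultRequest != 0:
		// defaultRequest but no defaultLimit: limit = min.
		return it.DefaultRequest, it.Min
	default:
		// Neither default is set: use min for both.
		return it.Min, it.Min
	}
}

func main() {
	fmt.Println(resolveDefaults(Item{Min: 100, DefaultLimit: 500}))   // 500 500
	fmt.Println(resolveDefaults(Item{Min: 100, DefaultRequest: 200})) // 200 100
	fmt.Println(resolveDefaults(Item{Min: 100, Max: 1000}))           // 100 100
}
```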
+
+Now, for the containers' computations, here are the rules:
+- **init containers:** they won't be summed, so the rules are simple
+  - a container needs requests and limits at least equal to the `min`, set to the `default` if any.
+  - *use the default requests and the default limits (coming from the defaultLimit, or the min, …)*
+- **containers:** those will be summed at the end, so it gets a bit more complex
+  - a container needs requests and limits at least equal to the `min`
+  - the sum of the containers' requests/limits **should be** as small as possible. This should be
+    ensured by using the "smallest" possible request on each of them.
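A minimal sketch of these per-container rules for one resource, assuming the defaults have already been resolved. `Range`, `initContainerRequest` and `stepContainerRequest` are hypothetical names used for illustration, not Tekton functions.

```go
package main

import "fmt"

// Range is a hypothetical single-resource LimitRange view; zero means "not set".
type Range struct {
	Min, DefaultRequest int64
}

// initContainerRequest: init containers are not summed, so they can use the
// default request, but never less than min.
func initContainerRequest(r Range) int64 {
	if r.DefaultRequest < r.Min {
		return r.Min
	}
	return r.DefaultRequest
}

// stepContainerRequest: regular containers are summed, so each one gets the
// smallest value the LimitRange allows, which is min.
func stepContainerRequest(r Range) int64 {
	return r.Min
}

func main() {
	r := Range{Min: 100, DefaultRequest: 500} // millicores
	steps := 3
	fmt.Println("init container:", initContainerRequest(r))           // init container: 500
	fmt.Println("sum of steps:", int64(steps)*stepContainerRequest(r)) // sum of steps: 300
}
```

Giving each step only the `min` keeps the summed request low, which matters because the steps actually run one at a time.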
+
+## Multiple LimitRanges are in the namespace
+
+Similar to a single LimitRange, except we need to act as if there were one (virtual) LimitRange with
+the correct values taken from each of them.
+
+- Take the maximum of the min values
+- Take the minimum of the max values
+- Take the default request that fits into the previous 2 min/max
+- Take the smallest `MaxLimitRequestRatio` (the most restrictive one)
+
+Once we have this "virtual" LimitRange, we can act as if there were only one `LimitRange`. Note that it is possible to define multiple `LimitRange`s that conflict with each other and block any `Pod` scheduling. Tekton Pipeline will not try to work around this, as it is a behaviour of Kubernetes itself.
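The merge can be sketched for a single resource as below. `Item` and `merge` are hypothetical; a zero value means "not set", the smallest default request is kept and then clamped into the merged min/max (mirroring the intent described for `getVirtualLimitRange` in this commit), and `ok` reports a conflict where the merged min exceeds the merged max.

```go
package main

import "fmt"

// Item is a hypothetical single-resource view of one LimitRange item;
// zero means "not set".
type Item struct {
	Min, Max, DefaultRequest int64
}

// merge builds a "virtual" item: the largest min, the smallest set max, and
// the smallest set default request, clamped into [min, max].
// ok is false when the LimitRanges conflict (merged min > merged max).
func merge(items []Item) (v Item, ok bool) {
	for _, it := range items {
		if it.Min > v.Min {
			v.Min = it.Min
		}
		if it.Max != 0 && (v.Max == 0 || it.Max < v.Max) {
			v.Max = it.Max
		}
		if it.DefaultRequest != 0 && (v.DefaultRequest == 0 || it.DefaultRequest < v.DefaultRequest) {
			v.DefaultRequest = it.DefaultRequest
		}
	}
	if v.DefaultRequest != 0 {
		if v.DefaultRequest < v.Min {
			v.DefaultRequest = v.Min
		}
		if v.Max != 0 && v.DefaultRequest > v.Max {
			v.DefaultRequest = v.Max
		}
	}
	return v, v.Max == 0 || v.Min <= v.Max
}

func main() {
	v, ok := merge([]Item{{Min: 100, Max: 2000, DefaultRequest: 50}, {Min: 200, Max: 1000}})
	fmt.Println(v, ok) // {200 1000 200} true
	_, ok = merge([]Item{{Min: 1500}, {Max: 1000}})
	fmt.Println(ok) // false
}
```

Treating zero as "not set" matters here: a LimitRange that does not define a `Max` must not erase the `Max` of another one.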
+
+## References
+
+- [LimitRange in k8s docs](https://kubernetes.io/docs/concepts/policy/limit-range/)
+- [Configure default memory requests and limits for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/)
+- [Configure default CPU requests and limits for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/)
+- [Configure Minimum and Maximum CPU constraints for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/)
+- [Configure Minimum and Maximum Memory constraints for a Namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/)
+- [Managing Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
+- [Kubernetes best practices: Resource requests and limits](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits)
+- [Restrict resource consumption with limit ranges](https://docs.openshift.com/container-platform/4.8/nodes/clusters/nodes-cluster-limit-ranges.html)
diff --git a/docs/pipelineruns.md b/docs/pipelineruns.md
@@ -396,15 +396,11 @@ Consult the documentation of the custom task that you are using to determine whe
 ### Specifying `LimitRange` values
 
 In order to only consume the bare minimum amount of resources needed to execute one `Step` at a
-time from the invoked `Task`, Tekton only requests the *maximum* values for CPU, memory, and ephemeral
-storage from within each `Step`. This is sufficient as `Steps` only execute one at a time in the `Pod`.
-Requests other than the maximum values are set to zero.
+time from the invoked `Task`, Tekton will request the compute values for CPU, memory, and ephemeral
+storage for each `Step` based on the [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/)
+object(s), if present. Any `Request` or `Limit` specified by the user (on `Task` for example) will be left unchanged.
 
-When a [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/) parameter is present in
-the namespace in which `PipelineRuns` are executing and *minimum* values are specified for container resource requests,
-Tekton searches through all `LimitRange` values present in the namespace and uses the *minimums* instead of 0.
-
-For more information, see the [`LimitRange` code example](../examples/v1beta1/pipelineruns/no-ci/limitrange.yaml).
+For more information, see the [`LimitRange` support in Pipeline](./limitrange.md).
 
 ### Configuring a failure timeout
 
@@ -577,7 +573,7 @@ spec:
   status: "PipelineRunCancelled"
 ```
 
-Warning: "PipelineRunCancelled" status is deprecated and would be removed in V1, please use "Cancelled" instead.  
+Warning: "PipelineRunCancelled" status is deprecated and would be removed in V1, please use "Cancelled" instead.
 
 ## Gracefully cancelling a `PipelineRun`
 

diff --git a/docs/taskruns.md b/docs/taskruns.md
@@ -301,15 +301,11 @@ and reasons.
 ### Specifying `LimitRange` values
 
 In order to only consume the bare minimum amount of resources needed to execute one `Step` at a
-time from the invoked `Task`, Tekton only requests the *maximum* values for CPU, memory, and ephemeral
-storage from within each `Step`. This is sufficient as `Steps` only execute one at a time in the `Pod`.
-Requests other than the maximum values are set to zero.
+time from the invoked `Task`, Tekton will request the compute values for CPU, memory, and ephemeral
+storage for each `Step` based on the [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/)
+object(s), if present. Any `Request` or `Limit` specified by the user (on `Task` for example) will be left unchanged.
 
-When a [`LimitRange`](https://kubernetes.io/docs/concepts/policy/limit-range/) parameter is present in
-the namespace in which `TaskRuns` are executing and *minimum* values are specified for container resource requests,
-Tekton searches through all `LimitRange` values present in the namespace and uses the *minimums* instead of 0.
-
-For more information, see the [`LimitRange` code example](../examples/v1beta1/taskruns/no-ci/limitrange.yaml).
+For more information, see the [`LimitRange` support in Pipeline](./limitrange.md).
 
 ## Configuring the failure timeout
 
@@ -463,18 +459,18 @@ spec:
 ```
 
 Upon failure of a step, the TaskRun Pod execution is halted. If this TaskRun Pod continues to run without any lifecycle
-change done by the user (running the debug-continue or debug-fail-continue script) the TaskRun would be subject to 
-[TaskRunTimeout](#configuring-the-failure-timeout). 
+change done by the user (running the debug-continue or debug-fail-continue script) the TaskRun would be subject to
+[TaskRunTimeout](#configuring-the-failure-timeout).
 During this time, the user/client can get remote shell access to the step container with a command such as the following.
 
 ```bash
-kubectl exec -it print-date-d7tj5-pod-w5qrn -c step-print-date-human-readable 
+kubectl exec -it print-date-d7tj5-pod-w5qrn -c step-print-date-human-readable
 ```
 
 #### Debug Environment
 
-After the user/client has access to the container environment, they can scour for any missing parts because of which 
-their step might have failed. 
+After the user/client has access to the container environment, they can scour for any missing parts because of which
+their step might have failed.
 
 To control the lifecycle of the step to mark it as a success or a failure or close the breakpoint, there are scripts
 provided in the `/tekton/debug/scripts` directory in the container. The following are the scripts and the tasks they

diff --git a/internal/builder/v1beta1/pod.go b/internal/builder/v1beta1/pod.go
@@ -123,14 +123,8 @@ func PodContainer(name, image string, ops ...ContainerOp) PodSpecOp {
 		c := &corev1.Container{
 			Name:  name,
 			Image: image,
-			// By default, containers request zero resources. Ops
-			// can override this.
 			Resources: corev1.ResourceRequirements{
-				Requests: corev1.ResourceList{
-					corev1.ResourceCPU:              resource.MustParse("0"),
-					corev1.ResourceMemory:           resource.MustParse("0"),
-					corev1.ResourceEphemeralStorage: resource.MustParse("0"),
-				},
+				Requests: map[corev1.ResourceName]resource.Quantity{},
 			},
 		}
 		for _, op := range ops {
@@ -148,6 +142,9 @@ func PodInitContainer(name, image string, ops ...ContainerOp) PodSpecOp {
 			Name:  name,
 			Image: image,
 			Args:  []string{},
+			Resources: corev1.ResourceRequirements{
+				Requests: map[corev1.ResourceName]resource.Quantity{},
+			},
 		}
 		for _, op := range ops {
 			op(c)

diff --git a/internal/builder/v1beta1/pod_test.go b/internal/builder/v1beta1/pod_test.go
@@ -88,11 +88,7 @@ func TestPod(t *testing.T) {
 				Name:  "nop",
 				Image: "nop:latest",
 				Resources: corev1.ResourceRequirements{
-					Requests: corev1.ResourceList{
-						corev1.ResourceCPU:              resource.MustParse("0"),
-						corev1.ResourceMemory:           resource.MustParse("0"),
-						corev1.ResourceEphemeralStorage: resource.MustParse("0"),
-					},
+					Requests: map[corev1.ResourceName]resource.Quantity{},
 				},
 			}},
 			InitContainers: []corev1.Container{{

diff --git a/pkg/pod/limitrange/doc.go b/pkg/pod/limitrange/doc.go
@@ -0,0 +1,17 @@
+/*
+Copyright 2020 The Tekton Authors
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+// Package limitrange defines logic for supporting Kubernetes LimitRange for the specific Tekton use cases
+package limitrange
diff --git a/pkg/pod/limitrange/limitrange.go b/pkg/pod/limitrange/limitrange.go
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 The Tekton Authors
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+package limitrange
+
+import (
+	"context"
+
+	corev1 "k8s.io/api/core/v1"
+	"k8s.io/apimachinery/pkg/api/resource"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/kubernetes"
+)
+
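+// getVirtualLimitRange returns a single LimitRange summarizing all the
+// LimitRanges defined in namespace: an empty one if there are none, the only
+// one if there is exactly one, or a "virtual" merge of all of them otherwise.
+func getVirtualLimitRange(ctx context.Context, namespace string, c kubernetes.Interface) (*corev1.LimitRange, error) {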
+	limitRanges, err := c.CoreV1().LimitRanges(namespace).List(ctx, metav1.ListOptions{})
+	if err != nil {
+		return nil, err
+	}
+	var limitRange corev1.LimitRange
+	switch {
+	case len(limitRanges.Items) == 0:
+		// No LimitRange defined
+		break
+	case len(limitRanges.Items) == 1:
+		// One LimitRange defined
+		limitRange = limitRanges.Items[0]
+	default:
+		// Several LimitRange defined
+		// Create a virtual LimitRange with
+		// - Maximum of min values
+		// - Minimum of max values
+		// - Default that "fits" into min/max taken above
+		// - Default request that "fits" into min/max taken above
+		// - Smallest ratio (aka the most restrictive one)
+		m := map[corev1.LimitType]corev1.LimitRangeItem{}
+		for _, lr := range limitRanges.Items {
+			for _, item := range lr.Spec.Limits {
+				_, exists := m[item.Type]
+				if !exists {
+					m[item.Type] = corev1.LimitRangeItem{
+						Type:                 item.Type,
+						Min:                  corev1.ResourceList{},
+						Max:                  corev1.ResourceList{},
+						Default:              corev1.ResourceList{},
+						DefaultRequest:       corev1.ResourceList{},
+						MaxLimitRequestRatio: corev1.ResourceList{},
+					}
+				}
+				// Min
+				m[item.Type].Min[corev1.ResourceCPU] = maxOf(m[item.Type].Min[corev1.ResourceCPU], item.Min[corev1.ResourceCPU])
+				m[item.Type].Min[corev1.ResourceMemory] = maxOf(m[item.Type].Min[corev1.ResourceMemory], item.Min[corev1.ResourceMemory])
+				m[item.Type].Min[corev1.ResourceEphemeralStorage] = maxOf(m[item.Type].Min[corev1.ResourceEphemeralStorage], item.Min[corev1.ResourceEphemeralStorage])
+				// Max
+				m[item.Type].Max[corev1.ResourceCPU] = minOf(m[item.Type].Max[corev1.ResourceCPU], item.Max[corev1.ResourceCPU])
+				m[item.Type].Max[corev1.ResourceMemory] = minOf(m[item.Type].Max[corev1.ResourceMemory], item.Max[corev1.ResourceMemory])
+				m[item.Type].Max[corev1.ResourceEphemeralStorage] = minOf(m[item.Type].Max[corev1.ResourceEphemeralStorage], item.Max[corev1.ResourceEphemeralStorage])
+				// MaxLimitRequestRatio
+				m[item.Type].MaxLimitRequestRatio[corev1.ResourceCPU] = minOf(m[item.Type].MaxLimitRequestRatio[corev1.ResourceCPU], item.MaxLimitRequestRatio[corev1.ResourceCPU])
+				m[item.Type].MaxLimitRequestRatio[corev1.ResourceMemory] = minOf(m[item.Type].MaxLimitRequestRatio[corev1.ResourceMemory], item.MaxLimitRequestRatio[corev1.ResourceMemory])
+				m[item.Type].MaxLimitRequestRatio[corev1.ResourceEphemeralStorage] = minOf(m[item.Type].MaxLimitRequestRatio[corev1.ResourceEphemeralStorage], item.MaxLimitRequestRatio[corev1.ResourceEphemeralStorage])
+			}
+		}
+		// Handle Default and DefaultRequest
+		for _, lr := range limitRanges.Items {
+			for _, item := range lr.Spec.Limits {
+				// Default
+				m[item.Type].Default[corev1.ResourceCPU] = minOfBetween(m[item.Type].Default[corev1.ResourceCPU], item.Default[corev1.ResourceCPU], m[item.Type].Min[corev1.ResourceCPU], m[item.Type].Max[corev1.ResourceCPU])
+				m[item.Type].Default[corev1.ResourceMemory] = minOfBetween(m[item.Type].Default[corev1.ResourceMemory], item.Default[corev1.ResourceMemory], m[item.Type].Min[corev1.ResourceMemory], m[item.Type].Max[corev1.ResourceMemory])
+				m[item.Type].Default[corev1.ResourceEphemeralStorage] = minOfBetween(m[item.Type].Default[corev1.ResourceEphemeralStorage], item.Default[corev1.ResourceEphemeralStorage], m[item.Type].Min[corev1.ResourceEphemeralStorage], m[item.Type].Max[corev1.ResourceEphemeralStorage])
+				// DefaultRequest
+				m[item.Type].DefaultRequest[corev1.ResourceCPU] = minOfBetween(m[item.Type].DefaultRequest[corev1.ResourceCPU], item.DefaultRequest[corev1.ResourceCPU], m[item.Type].Min[corev1.ResourceCPU], m[item.Type].Max[corev1.ResourceCPU])
+				m[item.Type].DefaultRequest[corev1.ResourceMemory] = minOfBetween(m[item.Type].DefaultRequest[corev1.ResourceMemory], item.DefaultRequest[corev1.ResourceMemory], m[item.Type].Min[corev1.ResourceMemory], m[item.Type].Max[corev1.ResourceMemory])
+				m[item.Type].DefaultRequest[corev1.ResourceEphemeralStorage] = minOfBetween(m[item.Type].DefaultRequest[corev1.ResourceEphemeralStorage], item.DefaultRequest[corev1.ResourceEphemeralStorage], m[item.Type].Min[corev1.ResourceEphemeralStorage], m[item.Type].Max[corev1.ResourceEphemeralStorage])
+			}
+		}
+		for _, v := range m {
+			limitRange.Spec.Limits = append(limitRange.Spec.Limits, v)
+		}
+	}
+	return &limitRange, nil
+}
+
+func maxOf(a, b resource.Quantity) resource.Quantity {
+	if (&a).Cmp(b) > 0 {
+		return a
+	}
+	return b
+}
+
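+// minOf returns the smaller of a and b. A zero (unset) quantity means
+// "no constraint", so it is only returned when both values are unset.
+func minOf(a, b resource.Quantity) resource.Quantity {
+	if isZero(a) {
+		return b
+	}
+	if isZero(b) || (&a).Cmp(b) <= 0 {
+		return a
+	}
+	return b
+}
+
+// minOfBetween returns the smaller of a and b (ignoring unset values),
+// clamped to the [min, max] range. A zero min or max is treated as unbounded.
+func minOfBetween(a, b, min, max resource.Quantity) resource.Quantity {
+	v := minOf(a, b)
+	if isZero(v) {
+		return v
+	}
+	if !isZero(min) && (&v).Cmp(min) < 0 {
+		return min
+	}
+	if !isZero(max) && (&v).Cmp(max) > 0 {
+		return max
+	}
+	return v
+}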