Commit

Add option to set the GC period to tune how quickly failed Pods are removed

Also update some of the other GC settings
christian-stephen committed Dec 12, 2023
1 parent b2794b9 commit f43b190
Showing 6 changed files with 80 additions and 20 deletions.
2 changes: 1 addition & 1 deletion Chart.yaml
@@ -4,5 +4,5 @@ description: For deploying a CircleCI Container Agent
icon: https://raw.githubusercontent.com/circleci/media/master/logo/build/horizontal_dark.1.png
type: application

-version: "101.0.17"
+version: "101.0.18"
appVersion: "3"
9 changes: 5 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

For deploying a CircleCI Container Agent

-![Version: 101.0.17](https://img.shields.io/badge/Version-101.0.17-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 3](https://img.shields.io/badge/AppVersion-3-informational?style=flat-square)
+![Version: 101.0.18](https://img.shields.io/badge/Version-101.0.18-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 3](https://img.shields.io/badge/AppVersion-3-informational?style=flat-square)

## Contributing

@@ -56,11 +56,12 @@ The command removes all the Kubernetes objects associated with the chart and del
| agent.constraintChecker.threshold | int | `3` | Number of failed checks before disabling task claim |
| agent.containerSecurityContext | object | `{}` | Security Context policies for agent containers |
| agent.customSecret | string | `""` | Name of the user provided secret containing resource class tokens. You can mix tokens from this secret and in the secret created from tokens specified in the resourceClasses section below Ref: https://circleci.com/docs/container-runner/#custom-secret The tokens should be specified as secret key-value pairs of the form ResourceClass: Token The resource class name needs to match the names configured below exactly to match tokens to the correct configuration As Kubernetes does not allow / in secret keys, a period (.) should be substituted instead |
-| agent.environment | object | `{}` | A dictionary of key-value pairs to set as environment variables in the container-agent app container. Note that this does not set environment variables in a task, which can be done via `agent.resourceClasses` or in CircleCI: https://circleci.com/docs/set-environment-variable. |
+| agent.environment | object | `{}` | A dictionary of key-value pairs to set as environment variables in the container-agent app container. Note that this does not set environment variables in a task, which can be done via `agent.resourceClasses` or [in CircleCI](https://circleci.com/docs/set-environment-variable). |
| agent.forceUpdate | bool | `false` | Force a rolling update of the agent deployment |
+| agent.gc.enabled | bool | `true` | Enable garbage collection (GC) of Kubernetes objects such as Pods or Secrets left over from CircleCI tasks. Dangling objects may occur if container runner is forcefully deleted, causing the task state-tracking to be lost. GC will only remove objects labelled with `app.kubernetes.io/managed-by=circleci-container-agent`. |
+| agent.gc.interval | string | `"3m"` | Frequency of GC runs. Adjust this to balance minimal lingering K8s resources vs. system load. Infrequent runs may reduce the load but could result in excess K8s resources, while frequent runs help minimize resources but could increase system load. |
+| agent.gc.threshold | string | `"5h5m"` | The age of a Kubernetes object managed by container agent before GC deletes it. This value should be slightly longer than the `agent.maxRunTime` to prevent premature removal. GC may remove some objects sooner than this threshold, such as task Pod containers that fail their liveness probe. |
| agent.image | object | `{"digest":"","pullPolicy":"Always","registry":"","repository":"circleci/runner-agent","tag":"kubernetes-3"}` | Agent image settings. NOTE: Setting an image digest will take precedence over the image tag |
-| agent.kubeGCEnabled | bool | `true` | Enable garbage collection of dangling Kubernetes objects managed by container agent |
-| agent.kubeGCThreshold | string | `"5h5m"` | The age of a Kubernetes object managed by container agent before the garbage collection deletes it |
| agent.livenessProbe | object | `{"failureThreshold":5,"httpGet":{"path":"/live","port":7623,"scheme":"HTTP"},"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1}` | Liveness and readiness probe values Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes |
| agent.matchLabels.app | string | `"container-agent"` | |
| agent.maxConcurrentTasks | int | `20` | Maximum number of tasks that can be run concurrently. IMPORTANT: This concurrency is independent of, and may be limited by, the Runner concurrency of your plan. Configure this value at your own risk based on the resources allocated to your cluster. |
4 changes: 4 additions & 0 deletions changelog.md
@@ -2,6 +2,10 @@

This is the Container Agent Helm Chart changelog

+# 101.0.18
+
+- [#38](https://github.com/CircleCI-Public/container-runner-helm-chart/pull/38) Add option to set the garbage collection (GC) period to tune how quickly failed Pods are removed.
+
# 101.0.17

- [#37](https://github.com/CircleCI-Public/container-runner-helm-chart/pull/37) Update the values file and README for the SSH reruns [open preview](https://circleci.com/docs/container-runner-installation/#enable-rerun-job-with-ssh).
27 changes: 17 additions & 10 deletions templates/deployment.yaml
@@ -73,10 +73,6 @@ spec:
value: {{ .Values.agent.constraintChecker.interval | quote }}
- name: KUBE_NAMESPACE
value: {{ .Release.Namespace | quote }}
-- name: KUBE_GC_ENABLED
-value: {{ .Values.agent.kubeGCEnabled | quote }}
-- name: KUBE_GC_THRESHOLD
-value: {{ .Values.agent.kubeGCThreshold }}
- name: KUBE_TASK_POD_CONFIG
value: /etc/container-agent/taskpods
- name: KUBE_TOKEN_SECRETS
@@ -95,6 +91,16 @@ spec:
- name: KUBE_AUTODETECT_PLATFORM
value: {{ .Values.agent.autodetectPlatform | quote }}

+{{- with .Values.agent }}
+# GC configuration settings
+- name: KUBE_GC_ENABLED
+value: {{- if ne .kubeGCEnabled nil }} {{ .kubeGCEnabled | quote }} {{- else }} {{ .gc.enabled | quote }} {{- end }}
+- name: KUBE_GC_THRESHOLD
+value: {{- if .kubeGCThreshold }} {{ .kubeGCThreshold | quote }} {{- else }} {{ .gc.threshold | quote }} {{- end }}
+- name: KUBE_GC_INTERVAL
+value: {{ .gc.interval | quote }}
+{{- end }}
+
{{- if .Values.agent.ssh.enabled }}
{{- $sshName := printf "%s-ssh" (include "container-agent.fullname" .) }}
- name: KUBE_SSH_IS_ENABLED
@@ -105,11 +111,6 @@ spec:
value: {{ $sshName }}
{{- end }} # if .Values.agent.ssh.enabled

-{{- range $key, $value := .Values.agent.environment }}
-- name: "{{ $key }}"
-value: "{{ $value }}"
-{{- end }}
-
{{- if .Values.proxy.enabled }}
- name: PROXY__SECRETS__HTTP__USERNAME
valueFrom:
@@ -136,7 +137,13 @@ spec:
key: https-password
optional: true
{{ include "proxy.env" (list .Values.proxy "$(PROXY__SECRETS__HTTP__USERNAME)" "$(PROXY__SECRETS__HTTP__PASSWORD)" "$(PROXY__SECRETS__HTTPS__USERNAME)" "$(PROXY__SECRETS__HTTPS__PASSWORD)" "svc.cluster.local") | indent 12 }}
-{{- end }}
+{{- end }}
+
+{{- range $key, $value := .Values.agent.environment }}
+- name: "{{ $key }}"
+value: "{{ $value }}"
+{{- end }}
+
livenessProbe: {{ toYaml .Values.agent.livenessProbe | nindent 12 }}
readinessProbe: {{ toYaml .Values.agent.readinessProbe | nindent 12 }}
{{- if .Values.agent.resources }}
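The fallback in the `KUBE_GC_ENABLED` and `KUBE_GC_THRESHOLD` entries above means the older top-level keys still work and, when set, take precedence over `agent.gc.enabled` and `agent.gc.threshold`. The enabled check compares against `nil` rather than testing truthiness, presumably because a plain `if .kubeGCEnabled` would treat an explicit `false` as unset. A minimal sketch of a values override that still uses the older keys follows; the settings mirror the "should support older garbage collection settings" test in this commit, and the file name `legacy-gc-values.yaml` is hypothetical:

```yaml
# legacy-gc-values.yaml (hypothetical file name)
agent:
  # Older keys: when set, the template uses them instead of
  # agent.gc.enabled and agent.gc.threshold.
  kubeGCEnabled: false
  kubeGCThreshold: "1h"
  gc:
    # The interval has no older equivalent, so it is always read
    # from agent.gc.interval.
    interval: "3m"
```

According to that test, these values should render `KUBE_GC_ENABLED` as `"false"` and `KUBE_GC_THRESHOLD` as `"1h"`.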
37 changes: 37 additions & 0 deletions tests/deployment_test.yaml
@@ -45,6 +45,43 @@ tests:
name: KUBE_LOGGING_SECRET
value: "my-custom-secret"

+- it: should override the default garbage collection settings
+set:
+agent.gc.enabled: false
+agent.gc.threshold: "1h"
+agent.gc.interval: "10m"
+asserts:
+- contains:
+path: spec.template.spec.containers[0].env
+content:
+name: KUBE_GC_ENABLED
+value: "false"
+- contains:
+path: spec.template.spec.containers[0].env
+content:
+name: KUBE_GC_THRESHOLD
+value: "1h"
+- contains:
+path: spec.template.spec.containers[0].env
+content:
+name: KUBE_GC_INTERVAL
+value: "10m"
+- it: should support older garbage collection settings
+set:
+agent.kubeGCEnabled: false
+agent.kubeGCThreshold: "1h"
+asserts:
+- contains:
+path: spec.template.spec.containers[0].env
+content:
+name: KUBE_GC_ENABLED
+value: "false"
+- contains:
+path: spec.template.spec.containers[0].env
+content:
+name: KUBE_GC_THRESHOLD
+value: "1h"
+
- it: should set environment variables provided in agent.environment
template: templates/deployment.yaml
set:
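The tests added above only cover overridden values. A possible additional case, which is not part of this commit, could check that the chart default for the new interval setting reaches the container environment. This sketch reuses the assertion style of the tests above and assumes the default `agent.gc.interval` of `"3m"` from `values.yaml`; the test name is made up for illustration:

```yaml
# Hypothetical extra test case, written as one more item under `tests:`.
- it: should use the default garbage collection interval
  asserts:
    - contains:
        path: spec.template.spec.containers[0].env
        content:
          name: KUBE_GC_INTERVAL
          # Default taken from agent.gc.interval in values.yaml.
          value: "3m"
```

No `set:` block is needed here because the case exercises the defaults only.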
21 changes: 16 additions & 5 deletions values.yaml
@@ -35,7 +35,7 @@ agent:

# -- A dictionary of key-value pairs to set as environment variables in the container-agent app container.
# Note that this does not set environment variables in a task, which can be done via `agent.resourceClasses` or
-# in CircleCI: https://circleci.com/docs/set-environment-variable.
+# [in CircleCI](https://circleci.com/docs/set-environment-variable).
environment: {}

# -- Liveness and readiness probe values
@@ -148,10 +148,21 @@ agent:
# Configure this value at your own risk based on the resources allocated to your cluster.
maxConcurrentTasks: 20

-# -- Enable garbage collection of dangling Kubernetes objects managed by container agent
-kubeGCEnabled: true
-# -- The age of a Kubernetes object managed by container agent before the garbage collection deletes it
-kubeGCThreshold: "5h5m"
+gc:
+# -- Enable garbage collection (GC) of Kubernetes objects such as Pods or Secrets left over from CircleCI tasks.
+# Dangling objects may occur if container runner is forcefully deleted, causing the task state-tracking to be lost.
+# GC will only remove objects labelled with `app.kubernetes.io/managed-by=circleci-container-agent`.
+enabled: true
+
+# -- The age of a Kubernetes object managed by container agent before GC deletes it.
+# This value should be slightly longer than the `agent.maxRunTime` to prevent premature removal.
+# GC may remove some objects sooner than this threshold, such as task Pod containers that fail their liveness probe.
+threshold: "5h5m"
+
+# -- Frequency of GC runs. Adjust this to balance minimal lingering K8s resources vs. system load.
+# Infrequent runs may reduce the load but could result in excess K8s resources, while frequent runs help minimize
+# resources but could increase system load.
+interval: "3m"

# -- Toggle autodetection of OS and CPU architecture to request the appropriate task-agent binary in a heterogeneous cluster.
# If toggled on, this requires container-agent to have certain cluster-wide permissions for nodes.
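To make use of the new setting, a values override might shorten the GC interval so that failed Pods are cleaned up sooner, at the cost of more frequent GC runs. A minimal sketch, assuming a hypothetical override file `gc-values.yaml` and an illustrative `"1m"` interval (assumed to use the same duration format as the `"3m"` default):

```yaml
# gc-values.yaml (hypothetical file name)
agent:
  gc:
    enabled: true
    # Keep this slightly longer than agent.maxRunTime, as the chart comments advise.
    threshold: "5h5m"
    # Illustrative value: run GC more often than the 3m default.
    interval: "1m"
```

A file like this would typically be passed to `helm install` or `helm upgrade` with the `-f` flag.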
