Adjust node filesystem space filling up warning threshold to 20% #1357

arajkumar
Contributor

Description

Related to #294

Reduce the threshold of the NodeFilesystemSpaceFillingUp warning alert to 20% space available, instead of the default 40%.

This aligns the threshold with the default kubelet GC values below [1]:

"imageMinimumGCAge": "2m0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80,

[1] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
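
To make the relationship between the two kinds of numbers explicit, here is a minimal sketch. It assumes, per the kubelet configuration reference, that the GC thresholds are percentages of disk usage, while the alert threshold is a percentage of space available. The helper name `usage_to_available` is hypothetical and is not part of kube-prometheus.

```python
# Default kubelet image GC thresholds quoted in this description (percent of disk usage).
KUBELET_GC_DEFAULTS = {
    "imageGCHighThresholdPercent": 85,
    "imageGCLowThresholdPercent": 80,
}


def usage_to_available(usage_percent: int) -> int:
    """Convert a disk-usage percentage into the matching free-space percentage."""
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage_percent must be between 0 and 100")
    return 100 - usage_percent


# Hypothetical illustration: express each GC threshold as "percent space available".
available_at = {
    name: usage_to_available(value) for name, value in KUBELET_GC_DEFAULTS.items()
}
print(available_at)
# {'imageGCHighThresholdPercent': 15, 'imageGCLowThresholdPercent': 20}
```

Under this reading, the proposed 20% warning threshold corresponds to `imageGCLowThresholdPercent`.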

Signed-off-by: Arunprasad Rajkumar [email protected]

Type of change

What type of changes does your code introduce to kube-prometheus? Put an x in the boxes that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply, for example tooling, build system, CI, docs, etc.)

Changelog entry

Adjust node filesystem space filling up warning threshold to 20%

@dgrisonnet dgrisonnet left a comment

LGTM, good job @arajkumar for finding the actual values used by the garbage collector 👍

@dgrisonnet dgrisonnet merged commit 6f744e2 into prometheus-operator:main Sep 6, 2021
@arajkumar
Contributor Author

> LGTM, good job @arajkumar for finding the actual values used by the garbage collector 👍

Credit goes to @paulfantom and Simon Reber. I'm just an executor of their idea :)

arajkumar added a commit to arajkumar/kube-prometheus that referenced this pull request Apr 13, 2022
…let GC behavior

A previous attempt [1] tried to do the same, but it was based on a
misunderstanding of the GC behavior, which caused the alert to fire
before GC came into play.

According to [2][3], kubelet GC kicks in only when `imageGCHighThresholdPercent` is reached, which is set to 85% by default. However, `NodeFilesystemSpaceFillingUp` is set to fire as soon as usage reaches 80%.

This commit changes `fsSpaceFillingUpWarningThreshold` to 15% so
that GC has ample time to reclaim unwanted images. It also changes
`fsSpaceFillingUpCriticalThreshold` to 10%, which gives admins more time to react to the warning before a critical alert is sent.

[1] prometheus-operator#1357
[2] https://docs.openshift.com/container-platform/4.10/nodes/nodes/nodes-nodes-garbage-collection.html#nodes-nodes-garbage-collection-images_nodes-nodes-configuring
[3] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

Signed-off-by: Arunprasad Rajkumar <[email protected]>
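
The ordering problem described in this commit message can be sketched as follows. This is a simplified illustration only: it compares static thresholds and ignores any other conditions the real alert rule has. The function name `fires_before_gc` is hypothetical; the 85% default and the 20%, 15%, and 10% values come from the commit message above.

```python
# Default kubelet imageGCHighThresholdPercent (percent of disk usage).
GC_HIGH_USAGE_PERCENT = 85


def fires_before_gc(
    alert_available_percent: int,
    gc_high_usage_percent: int = GC_HIGH_USAGE_PERCENT,
) -> bool:
    """Return True if the alert's usage level is reached before image GC starts."""
    # The alert threshold is "percent available", so convert it to "percent used".
    alert_usage_percent = 100 - alert_available_percent
    return alert_usage_percent < gc_high_usage_percent


thresholds = [
    ("previous warning (20% available)", 20),
    ("new warning (15% available)", 15),
    ("new critical (10% available)", 10),
]
for name, available in thresholds:
    print(f"{name}: fires before GC = {fires_before_gc(available)}")
# previous warning (20% available): fires before GC = True
# new warning (15% available): fires before GC = False
# new critical (10% available): fires before GC = False
```

With the 20% threshold the alert level (80% usage) is reached while GC is still idle, which matches the misfire described above; at 15% and 10% the alert level is no longer below the GC trigger point.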
arajkumar added a commit to arajkumar/kube-prometheus that referenced this pull request Apr 27, 2022
…let GC behavior

(cherry picked from commit 6ff8bfb)
arajkumar added a commit to arajkumar/cluster-monitoring-operator that referenced this pull request May 10, 2022
…let GC behavior

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2074807

Signed-off-by: Arunprasad Rajkumar <[email protected]>