docs(readme): Adding docs for high prometheus memory usage, queue size and k8s support matrix #2967
base: main
Conversation
Force-pushed from 677edd1 to 2b65886
> Please consult this when diagnosing issues before diving into Kubernetes directly.
>
> [Zaidan Collection](https://stagdata.long.sumologic.net/ui/#/library/folder/20836981)
This is not accessible
right! I was thinking about support when I added this, but we should probably only add links here that the customer can access.
Yes, this is not internal documentation.
> When memory usage is higher than ~`95%` of the Prometheus container's memory limit (in the above case it's not: none of the containers exceeded `95% * 20 = 19Gi` of used memory), remove WAL from within the container:
I don't think this is the proper solution. I would rather say that the customer should increase memory requests/limits for Prometheus, or reduce the number of metrics being scraped by Prometheus. Removing the WAL is more of a temporary solution.
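For context, a rough sketch of what raising the Prometheus memory requests/limits could look like in user-supplied Helm values. The `kube-prometheus-stack` key path and the sizes are assumptions and depend on the chart version:

```yaml
# user-values.yaml -- sketch only; the values path and sizes are assumptions
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 20Gi
        limits:
          memory: 30Gi   # give Prometheus more headroom instead of clearing the WAL
```

Applied with the usual `helm upgrade ... -f user-values.yaml`.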
should we list removing WAL as the last resort, if increasing memory or reducing metrics isn't feasible?
Yes, especially because it is a temporary solution, or a cleanup step during/after a metrics spike.
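For reference, a minimal last-resort sketch of clearing the WAL from within the container. The pod name, namespace, and data path are assumptions; Prometheus managed by the operator typically stores its data under `/prometheus`:

```sh
# Last resort only: this discards samples that have not yet been flushed to blocks.
# Pod and namespace names are examples; adjust to your release.
kubectl exec -n <namespace> prometheus-<release>-prometheus-0 -c prometheus -- \
  rm -rf /prometheus/wal

# Restart the pod so Prometheus starts with a fresh WAL.
kubectl delete pod -n <namespace> prometheus-<release>-prometheus-0
```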
> ### Otelcol enqueue failures
>
> Enqueue failures happen when otelcol can't write to its persistent queue. They signify data loss, as otelcol is unable to buffer the data locally and is forced to drop it.
Could you add a snippet with an example error message here?
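For illustration, this failure usually surfaces in the otelcol logs as a message along the lines of "Dropping data because sending_queue is full. Try increasing queue_size." (exact wording depends on the collector version). A sketch of raising the queue size on the exporter follows; the exporter name and the numbers are assumptions:

```yaml
# Sketch only: exporter name and sizes are assumptions; verify the
# sending_queue keys against your otelcol version.
exporters:
  sumologic:
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000   # a larger queue buffers more data before enqueue failures occur
```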
```diff
@@ -86,7 +86,7 @@ The following table displays the tested Kubernetes and Helm versions.

 | Name          | Version                                  |
 | ------------- | ---------------------------------------- |
-| K8s with EKS  | 1.21<br/>1.22<br/>1.23<br/>1.24          |
+| K8s with EKS  | 1.21<br/>1.22<br/>1.23<br/>1.24<br/>1.25 |
```
@rnishtala-sumo can we move that one out to a separate commit and/or PR? 🙏
@rnishtala-sumo Could you recreate this PR for the SumoLogic docs repo?
Adding docs for high Prometheus memory usage and otelcol enqueue failures
Checklist