Grafana Agent Operator - Logs: Too many open files #1844
Comments
After 11 hours of CrashLoopBackOff, the pod is now running... Log of the config-reloader container of the grafana-agent-logs pod:
Hi Julien! 👋 This looks like an error coming from the config-reloader. Where are you running your cluster (is it a managed service from a Cloud provider, an on-prem installation, or just a local cluster on your laptop)? Could you check the relevant system-imposed limits on open files and see if they could be the cause?
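For anyone following along, one way to inspect those limits on the node is sketched below; these are generic Linux commands, not something prescribed in this thread:

```sh
# Per-process open-file limit for the current shell
ulimit -n

# System-wide open-file ceiling
cat /proc/sys/fs/file-max

# inotify limits, which "too many open files" from a file watcher often hits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
```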
Hello! All my clusters are single-node K3s-based (v1.23.5), running on cloud VMs (AWS, GCP, Scaleway).
Hey, apologies for taking so long to get back to you, the notification got lost in all the noise. I'm not sure what the cause is here, but I'd look into a possible K3s issue and the differences in default Linux parameters between the different cloud providers and distros. For example, could it be similar to the issue reported here?
This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
A possible workaround for this problem would be this: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files

I'll leave the issue open for now, as it might be a bug in the operator and needs more investigation.
I'm going to close this as won't fix, since this doesn't appear to be a code problem but rather an environment problem (e.g., increase the fs.inotify limits). If you go through that workaround and you're still running into issues, please open a new issue so we can track it; updates in closed issues may get missed.
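For reference, a minimal sketch of that workaround, assuming the limits being hit are the inotify ones described on the kind page linked above; the values are the ones suggested there, not something confirmed in this thread. Run on the K3s node itself:

```sh
# Raise the inotify limits for the current boot (values from the kind known-issues page)
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

# Persist the change across reboots
echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.d/99-inotify.conf
echo "fs.inotify.max_user_instances=512" | sudo tee -a /etc/sysctl.d/99-inotify.conf
sudo sysctl --system
```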
I have the same issue.
Fixed with:
Not sure how to reproduce the bug.
It happened the first time I tried the Grafana Agent Operator, setting up a PodLogs resource configured to collect logs from all pods in all namespaces of the K8s cluster.
With PodLogs scoped more narrowly, the bug still happens on 1 of my 6 clusters.
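For context, a minimal sketch of what such an all-namespaces PodLogs can look like, assuming the operator's monitoring.grafana.com/v1alpha1 API; the names and labels are illustrative placeholders, and this is not the exact manifest used in this report:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  name: all-pods          # placeholder name
  namespace: default
  labels:
    instance: primary     # placeholder; must match the LogsInstance's podLogsSelector
spec:
  namespaceSelector:
    any: true              # collect from every namespace
  selector: {}             # empty label selector matches all pods
  pipelineStages:
    - cri: {}              # parse CRI-formatted container logs
```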
Below is the log of the config-reloader container of the grafana-agent-logs pod:
As a result, the grafana-agent container of the grafana-agent-logs pod keeps crashing (CrashLoopBackOff) because of:
Below are my PodLogs: