Several pods do not start, encounter "too many open files" error #2087
@jimthompson5802 I've also seen this happening in a KinD cluster I had, for the same Deployments. In my case I mitigated these errors by increasing my laptop's `fs.inotify.max_user_watches` and `fs.inotify.max_user_instances` settings. Not sure if this will also work for k3s though.
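As a rough sketch, raising these limits on a Linux host can look like the following; the specific numbers here are illustrative assumptions, not the commenter's values:

```sh
# Raise the inotify limits on the running system (illustrative values).
# Settings made with sysctl -w do not persist across reboots.
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=512
```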
@kimwnasptd thank you for the suggestion.
I just want to confirm that the parameter names cited are from Linux. If this is correct, then I believe the equivalent parameters on macOS are the `launchctl` maxfiles limits.
My belief about the parameter names comes from this posting. If this is the case, then the change did not seem to work. What values did you use to get this working? Again, thank you for taking the time to respond to my question.
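A sketch of inspecting and raising the macOS open-files limit via `launchctl`; the 524288 value mirrors the one mentioned in the issue description below, and whether it resolves this particular error is not confirmed:

```sh
# Show the current soft and hard open-file limits on macOS
launchctl limit maxfiles

# Raise both limits for the current boot session (requires root)
sudo launchctl limit maxfiles 524288 524288
```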
I am also facing a similar issue on KIND; some of the pods are going into CrashLoopBackOff state. The error is as below: @kimwnasptd can you please share the equivalent settings for Mac (the `fs.inotify.max_user_{watches,instances}` settings)?
@jimthompson5802 @skothawa-tibco
Set the following values, which are simply 10x more than the defaults, in the docker daemon configuration, then restart Docker. After killing all the crashing pods, they got created successfully. Not sure if this is what fixed it, because I also made another change before restarting Docker. Also check whether your mysql pod is failing due to the same error.
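A minimal sketch of persisting such values on a Linux Docker host, assuming the commonly cited 10x-style numbers (the exact values are assumptions):

```sh
# Persist raised inotify limits (values are assumed 10x defaults)
cat <<'EOF' | sudo tee /etc/sysctl.d/99-inotify.conf
fs.inotify.max_user_instances = 1280
fs.inotify.max_user_watches = 655360
EOF

# Reload sysctl settings and restart the Docker daemon
sudo sysctl --system
sudo systemctl restart docker
```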
I am using Mac, Docker Desktop, and KIND. @jimthompson5802 I tried the above settings but no luck. The settings are updated both on the host machine and in the docker daemon configuration, but the issue remains the same.
@skothawa-tibco
On the host machine terminal we can see the values below:
After exec-ing into the worker node we get the error below:
Below are the running containers of KIND:
ulimit values inside the worker node:
OS details: macOS Monterey 12.1. @bartgras we already have values greater than the ones you suggested. Let me know if there are any other pointers that can be tried out.
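One way to see what the KIND node containers actually report (the container name `kind-worker` is the KIND default and may differ on your cluster):

```sh
# List the KIND node containers
docker ps --filter "name=kind"

# Check the inotify limits and the open-file limit inside a worker node
docker exec kind-worker cat /proc/sys/fs/inotify/max_user_watches /proc/sys/fs/inotify/max_user_instances
docker exec kind-worker sh -c 'ulimit -n'
```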
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions. |
This worked for me too:
Raising the limits to 10x the previous values solved this problem on a k0s instance.
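A quick way to confirm the new values took effect on the host (key names as used earlier in this thread):

```sh
# Print the current inotify limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
```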
Thanks for saving my time. I solved my issue with the above commands.
I've hit this issue today while testing 1.6 on microk8s. The pods affected were katib-controller, kubeflow-profiles, kfp-api, and kfp-persistence. @mstopa's workaround did fix it, but I'm wondering if we are doing something wrong in these components for this to occur. Could we be more efficient in the way we lease API watchers?
/close
There has been no activity for a long time. Please reopen if necessary.
@juliusvonkohout: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
In setting up a kubeflow cluster using the `master` branch at commit `3dad839f`, four pods encounter the "too many open files" error.
For the k8s cluster, I'm using a local `k3d` cluster on macOS (11.6.1): https://k3d.io
At the end of deploying kubeflow, these are the statuses of the 4 pods.
The cluster has been torn down and rebuilt several times. Each time, the same 4 pods encounter the "too many open files" error. All other pods successfully attain `Running` status.
According to `ulimit -n` on the nodes, the nodes have a very high setting for that limit: 1048576. Since this is run on macOS, I configured `launchctl` to increase maxfiles from 256 to 524288.
I'm new to kubeflow, so any guidance offered will be appreciated.
The following diagnostic data were collected:
Log extract from failed pods
kubeflow was deployed using `kustomize build ${component} | kubectl apply -f -` on each of the following components in the order shown:
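Purely as an illustration, such an apply loop might look like the following; the component paths are hypothetical placeholders, not the actual list:

```sh
# Apply each kustomize component in order, retrying until the
# API server accepts it (CRDs can take a moment to be established).
# The component paths below are placeholders.
for component in common/cert-manager common/istio apps/pipeline; do
  while ! kustomize build "${component}" | kubectl apply -f -; do
    echo "Retrying ${component}..."
    sleep 10
  done
done
```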
Platform
Software Versions:
k3d cluster nodes
ulimit for the two nodes
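These can be gathered with something along the following lines (the node container names are k3d defaults and may differ):

```sh
# Check the open-file limit inside each k3d node container
for node in k3d-k3s-default-server-0 k3d-k3s-default-agent-0; do
  echo "${node}:"
  docker exec "${node}" sh -c 'ulimit -n'
done
```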