MySQL doesn't start completely and readiness probe fails #416
Comments
Hi @cpanato, can you please describe the pod that is not ready (by running `kubectl describe` on it)? The operator sets the pod ready by updating the `sys_operator.status` table. Hope this helps.
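For readers hitting the same problem, a minimal sketch of those two checks; the pod, namespace, and container names are placeholders, and the table name comes from the readiness probe quoted further down in this thread:

```sh
# Describe the pod that is stuck in a not-ready state (names are placeholders).
kubectl describe pod my-cluster-mysql-0 -n default

# Inspect the sys_operator.status table the operator writes its readiness
# marker to (container name "mysql" is an assumption).
kubectl exec my-cluster-mysql-0 -n default -c mysql -- \
  mysql --defaults-file=/etc/mysql/client.conf -NB \
  -e 'SELECT * FROM sys_operator.status'
```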
I have a similar issue with the readiness probe; the workaround has been to delete the mysql-operator pod. Any advice? The mysql-operator is on 0.3.8. Thanks.
Similar issue here. If a mysql pod is killed and crash recovery takes too long, the readiness probe fails for the pod and it is killed again before it can finish crash recovery. The result is a pod that just keeps restarting and will never finish recovery or rejoin the cluster.
Resolved?
Still happening.
Because we are unable to configure the readinessProbe, we cannot provide cluster stability when a mysql pod requires crash recovery. As a result we are unable to use this solution in a production environment. We really like the operator model and have been eager to use it in production; if we could set readinessProbe configuration in the mysql podSpec it would get us there, but as-is we can only use this for dev/test environments.
This is a serious bug, which is not resolved yet. In my case, the mysql container was not ready. After ssh-ing into the mysql container and manually executing the update to the `sys_operator.status` table, the container became ready.
One side note: we had two instances with more than 10 MB of data and one with approx. 1.5 MB of data. Only the instance with 1.5 MB of data completed the readiness check successfully.
Same here, and it's easy to reproduce:
@HendrikRoehm Can this be applied on any instance, or does it have to be set on a particular one?
@haslersn I don't really know, as I am no expert on this. In my case this was a small test mysql instance with no replication and thus only one pod. I would guess that you should set it on the master instance of the mysql cluster if there are multiple instances.
We ran into this issue today when running a
Is there any workaround for this issue? I managed to make my cluster work by updating sys_operator.status manually from inside the mysql pod, and everything came up, but this is not the right way (if you have to move the pod to another node the issue comes back). Does the operator manage this value?
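For anyone else needing the stopgap, a sketch of that manual workaround; pod and container names are placeholders, and the table/row come from the readiness probe quoted later in the thread, so check the actual contents of `sys_operator.status` before writing to it:

```sh
# Look at the current readiness marker (names are placeholders).
kubectl exec my-cluster-mysql-0 -c mysql -- \
  mysql --defaults-file=/etc/mysql/client.conf \
  -e 'SELECT * FROM sys_operator.status'

# Manually set the flag the readiness probe tests for.
kubectl exec my-cluster-mysql-0 -c mysql -- \
  mysql --defaults-file=/etc/mysql/client.conf \
  -e 'UPDATE sys_operator.status SET value = "1" WHERE name = "configured"'
```

As noted above, this does not survive the pod being rescheduled, so it is a stopgap rather than a fix.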
Can @calind take a look at this? This bug wipes out any serious use of this operator.
Just encountered this with a cluster whose node was recycled. The mysql pod reports itself as unready despite showing as healthy, and MySQL is not accessible from outside. This killed my plans to use this operator in production; I had to move on.
Are you sure it has nothing to do with Calico? In my case, the problem with this
@tebaly
This could be a bug related to the table-deletion code, which assumes the file does not exist while the file actually exists: mysql-operator/pkg/sidecar/appconf.go, lines 201 to 205 at commit 18a6031.
I was able to make the mysql cluster become ready after deleting this file and then deleting the pod:
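Roughly, the recovery sequence described here looks like the following; the file path is a placeholder because the original comment does not show it, and the container name is an assumption:

```sh
# Placeholders: substitute the real pod name and the file the sidecar trips over.
POD=my-cluster-mysql-0
OFFENDING_FILE=/var/lib/mysql/some-marker-file

# Remove the file from the data volume (container name "mysql" is an assumption).
kubectl exec "$POD" -c mysql -- rm -f "$OFFENDING_FILE"

# Delete the pod so the StatefulSet recreates it and the init logic runs again.
kubectl delete pod "$POD"
```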
Looking forward to the fix for this bug so we can reconsider this operator for use in production.
It seems like performing a manual failover, i.e. promoting one of the read replicas to be the new master, fixes this issue as well; at least that did the trick in my case. This issue occurred in a freshly set up test cluster where I had shut down (gracefully, using ACPI powerdown) all three nodes at the same time on purpose. After starting all three up again, I ended up in that very state where basically everything worked fine, except Kubernetes didn't activate the MySQL service, as the master node was stuck with the mysql container in a not-ready state.
My only workaround is:
Not pretty though.
It is happening to me as well. I still don't understand why the reconcile was not happening, and the only way I can reproduce the issue is by shutting down my PC and starting it again. Even with this procedure, it only causes the issue randomly from time to time.
Configurable readinessProbe/livenessProbe settings would be most helpful... this is basic.
I still have the issue of the readiness probe failing. I described the pod and found: `exec [/bin/sh -c test $(mysql --defaults-file=/etc/mysql/client.conf -NB -e 'SELECT COUNT(*) FROM sys_operator.status WHERE name="configured" AND value="1"') -eq 1] delay=5s timeout=5s period`
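That is the probe command itself; to see what the probe sees, the same query can be run by hand inside the container (pod and container names are placeholders):

```sh
# Re-run the readiness check manually; the probe passes only when this prints 1.
kubectl exec my-cluster-mysql-0 -c mysql -- \
  mysql --defaults-file=/etc/mysql/client.conf -NB \
  -e 'SELECT COUNT(*) FROM sys_operator.status WHERE name="configured" AND value="1"'
```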
Notice that sometimes when the pod restarts for any reason, it never comes up again.
For example:
Events:
Logs from the pods:
Any advice on how to recover from that?
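Since the actual events and logs did not make it into the quoted text above, here is a generic sketch of the commands for collecting them (namespace, pod, and container names are assumptions):

```sh
NS=default                  # placeholder namespace
POD=my-cluster-mysql-0      # placeholder pod name

# Pod state and the events that explain restarts/probe failures.
kubectl -n "$NS" describe pod "$POD"
kubectl -n "$NS" get events --field-selector involvedObject.name="$POD"

# Logs from the current and the previously crashed mysql container.
kubectl -n "$NS" logs "$POD" -c mysql
kubectl -n "$NS" logs "$POD" -c mysql --previous
```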
/cc @AMecea @jwilander