Cluster refuses to start after crash #3
Oh, I am sorry, I fixed it by adding the line "subPath": "mysql" to the Init container, just like you said in #2 (comment).
Ok, good that you found a solution, thank you for sharing! Please note: after a total cluster crash when all nodes go down, there is a chance that some of the most recent updates are lost during a restart with SAFE_TO_BOOTSTRAP=1. This may happen because the node forced to bootstrap was not the last one to leave the cluster and may not contain all the updates.
Read more: https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster (see section "Scenario 6" - All nodes went down without proper shutdown procedure). The standard advice for reliable recovery is not to force the bootstrap blindly, but to identify the node with the most recent data and bootstrap the cluster from it.
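For example, one way to identify the most advanced node is to compare the seqno recorded in each node's grastate.dat while the pods are still reachable (a rough sketch only; it assumes 3 pods named mysql-0..mysql-2 and the data directory at /var/lib/mysql, adjust names and namespace to your setup):

# Print the saved Galera state of every node; bootstrap from the one with the highest seqno
for i in 0 1 2; do
  echo "--- mysql-$i ---"
  kubectl exec mysql-$i -- cat /var/lib/mysql/grastate.dat
done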
The problem still exists:
AFAIK, the safest way to migrate your cluster is not to delete everything at once but to scale down, with some delay, to 1 node. This way the first node will have all changes propagated from the other nodes. Then restart the only node and scale up again. Losing all nodes at once should be a very rare disaster in a cluster environment and normally needs some manual restoration of the cluster, as explained in the above link. Also, normally you do not want to delete and recreate the whole StatefulSet.
You can try this to scale down (note: if your cluster is in bad shape, you may lose recent changes on some nodes):
kubectl scale statefulset mysql --replicas=1
Check that the extra nodes shut down:
kubectl get po -l app=mysql
Then wait until all nodes except the 1st are properly shut down and the first node starts without errors. Then scale up:
kubectl scale statefulset mysql --replicas=3
What does the log say? What is the source of the error causing the "CrashLoopBackOff"?
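If you want to script the waiting step, a rough sketch (it assumes the pods are labelled app=mysql and that bash is available where kubectl runs):

# Wait until only one mysql pod remains, checking every 10 seconds
while [ "$(kubectl get po -l app=mysql --no-headers | wc -l)" -gt 1 ]; do
  sleep 10
done
# Inspect mysql-0 before scaling back up
kubectl logs mysql-0 | tail -n 50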
I ran:
kubectl patch statefulset mysql -p '{"spec":{"replicas":1}}' -n seecsea
Then I exec'd into mysql-0 and changed the config file with sed -i. How about fixing the config file (with no if checks) to safe_to_bootstrap: 1? The script begins with:
[ "$DEBUG" = "1" ] && set -x
GALERA_CONFIG=${GALERA_CONFIG:-"/etc/mysql/conf.d/galera.cnf"}
You can just set the env variable SAFE_TO_BOOTSTRAP=1 and restart the first node. The second way is to manually set safe_to_bootstrap: 1 in grastate.dat, as the error message suggests. Generally, it seems that full recovery of a Galera cluster deployed as a simple K8s StatefulSet (without some sort of arbitrator service) is not an easy exercise. Maybe you should think about switching to a master-slave deployment, especially if your environment may experience a full crash. This can also be a K8s StatefulSet where mysql-0 would always be the master.
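A rough sketch of the manual way, assuming the data volume is mounted at /var/lib/mysql and the pod stays up long enough to exec into (with a crash-looping pod you may have to edit the file from another pod that mounts the same PV):

# Inspect the saved state on the node you want to bootstrap from
kubectl exec -n seecsea mysql-0 -- cat /var/lib/mysql/grastate.dat
# Flip safe_to_bootstrap to 1 on that node only, then restart it
kubectl exec -n seecsea mysql-0 -- sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat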
Sorry for my late reply.
Hi, I removed some Exited containers on a k8s node to clean up docker ps, so the galera pod status changed to Init:0/1, not Running. Then I ran kubectl delete -f mysql.yaml (all pods were cleaned up) and kubectl create -f mysql.yaml, but the pod cannot start:
NAME READY STATUS RESTARTS AGE IP NODE
mysql-0 0/1 CrashLoopBackOff 5 4m 172.30.6.17
I did not delete the PVC and PV (via a Ceph RBD StorageClass), nor the ConfigMap, Secret, etc.
The logs:
2017-06-16 14:19:08 140438696024000 [Note] mysqld (mysqld 10.1.24-MariaDB-1~jessie) starting as process 1 ...
2017-06-16 14:19:08 140438696024000 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-06-16 14:19:08 140438696024000 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2017-06-16 14:19:08 140438696024000 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy [email protected] loaded successfully.
2017-06-16 14:19:08 140438696024000 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2017-06-16 14:19:08 140438696024000 [Note] WSREP: Found saved state: 4bb73083-4b4a-11e7-a4c7-fbb547b972fa:-1, safe_to_bootsrap: 0
2017-06-16 14:19:08 140438696024000 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = mysql-0.mysql.seecsea.svc.cluster.local; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S
2017-06-16 14:19:08 140438696024000 [Note] WSREP: GCache history reset: old(4bb73083-4b4a-11e7-a4c7-fbb547b972fa:0) -> new(4bb73083-4b4a-11e7-a4c7-fbb547b972fa:-1)
2017-06-16 14:19:08 140438696024000 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-06-16 14:19:08 140438696024000 [Note] WSREP: wsrep_sst_grab()
2017-06-16 14:19:08 140438696024000 [Note] WSREP: Start replication
2017-06-16 14:19:08 140438696024000 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-06-16 14:19:08 140438696024000 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2017-06-16 14:19:08 140438696024000 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
2017-06-16 14:19:08 140438696024000 [ERROR] Aborting
But the Init container has the ENV set, such as:
{
"name": "SAFE_TO_BOOTSTRAP",
"value": "1"
},
Does it not work correctly?
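One possible explanation (an assumption, not verified against this repo's init script): the env variable may only be applied to the Galera config, while safe_to_bootstrap lives in grastate.dat on the persistent volume, so an already-initialized data directory keeps its old value. A hypothetical init-script fragment that would also apply it there:

# Hypothetical: honor SAFE_TO_BOOTSTRAP for an already-initialized data directory
GRASTATE=/var/lib/mysql/grastate.dat
if [ "$SAFE_TO_BOOTSTRAP" = "1" ] && [ -f "$GRASTATE" ]; then
  sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' "$GRASTATE"
fi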