Fix Zookeeper persistence #227
Fixes #89. The "logs", which are actually data, would end up outside the mount. ZooKeeper's startup log is clearer than the property file entries:
INFO Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/log/version-2 snapdir /var/lib/zookeeper/data/version-2
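A minimal sketch of how the startup line above can be checked against a volume mount. The two mount paths passed in at the end are assumptions for illustration only; the thread does not state the actual mount path, and the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Startup line quoted in the PR description.
STARTUP_LINE = (
    "INFO Created server with tickTime 2000 minSessionTimeout 4000 "
    "maxSessionTimeout 40000 datadir /var/lib/zookeeper/log/version-2 "
    "snapdir /var/lib/zookeeper/data/version-2"
)


def parse_dirs(line):
    """Return the 'datadir' and 'snapdir' paths found in a startup log line."""
    found = dict(re.findall(r"\b(datadir|snapdir) (\S+)", line))
    return {name: PurePosixPath(path) for name, path in found.items()}


def outside_mount(dirs, mount):
    """Return the sorted names of directories that are not under the mount path."""
    mount = PurePosixPath(mount)
    return sorted(
        name
        for name, path in dirs.items()
        if path != mount and mount not in path.parents
    )


dirs = parse_dirs(STARTUP_LINE)

# ASSUMPTION: hypothetical mount paths, not taken from the PR.
print(outside_mount(dirs, "/var/lib/zookeeper/data"))  # ['datadir']
print(outside_mount(dirs, "/var/lib/zookeeper"))       # []
```

With a mount that covers only the snapshot directory, the transaction log directory (reported as "datadir" in the startup line) is flagged as outside it; a mount on the common parent covers both.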
It looks like a change in mount path can not be
It could be a good idea to stop all Kafka brokers before doing this. I found no method to stop zk in a way that didn't trigger a pod restart.
@pavel-agarkov Care to test the above?
Sure! But probably tomorrow since it is already midnight in my timezone.
The upgrade path would have been smoother if the log dir was put inside the snapshot dir, but they recommend against that in https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_dataFileManagement. It could be that some setups add a separate volume for Maybe the safest way is to back up
This is still an arbitrary number
Took me a while to make it work on my single-node setup.
# maxClientCnxns changed from 1 to 2
zookeeper.properties: |-
  ...
  maxClientCnxns=2
  ...
and some other fixes, probably also related to the single-node setup. But I will know for sure how it works only after a few days of natural node killing 😅 EDIT: looks like you have removed this line but it somehow reappeared in my fork after the merge...
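A small sketch for confirming which connection limit a rendered zookeeper.properties actually carries, for example after a merge. The sample text is an assumption: only the maxClientCnxns line comes from this thread, tickTime is taken from the startup log quoted in the description, and read_properties is a hypothetical helper.

```python
def read_properties(text):
    """Parse Java-style key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# ASSUMPTION: hypothetical excerpt, not the full file from the repository.
SAMPLE = """\
# excerpt for illustration
tickTime=2000
maxClientCnxns=2
"""

props = read_properties(SAMPLE)
limit = props.get("maxClientCnxns")  # None if the line is absent
print(limit)  # 2
```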
The line is still there. See #230. I should make the change you suggested and release again... but... Did you see any error message that was specific about hitting this limit? That's what I wanted to experience before I started raising the limit.
Yes, the whole zookeeper's log was filled with:
here is what was before that warning:
Do you know which kind of pod that came from, like 10.40.21.8 in your log output?
I didn't investigate it at the time, and I can't find it in the logs now.
It still works well after a week of pod killing. Topics are not being lost any more.
Topic information, though not the contents kept in Kafka, would be lost if all zookeeper pods had been down at the same time. Only the snapshots were actually saved to the persistent volume.
According to https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_dataFileManagement "ZooKeeper can recover using this snapshot".
The regression probably dates back to ccb9e5d which was released with v2.0.0.