WIP integrate linkedin/cruise-control, fixes #100 #218
What remaining work do you see? It's a clean opt-in that's confined to the cruise-control. Are there risks with installing cruise-control?
cruise-control is conservative about making changes to the cluster, so it looks safe. I can't think of any remaining work, but I can't say I'm an expert, so I marked it as WIP. If no one has objections, shall we merge?
How about we merge to master instead of 4.x, or are we lacking compatibility with Kafka 2.x?
master sounds good to me. Kafka 2.x is supported with this version of cruise-control.
I get a crashloop after merge to master:
Could it be because of #221?
Yes, it's because of #221. If cruise-control supports creating the topic, it isn't obvious. We may need to do that in the init container.
in Yolean/kubernetes-kafka#218 Because downloading from an external server at broker pod start is a risk.
Added topic creation. I think we should take a look at the What's a good test to see if Cruise Control is operational?
Thanks! I was looking into that but hadn't gotten there yet. I accidentally committed version 2.0.17 of Cruise Control. I hadn't tested it thoroughly yet, and it seems to be more than a bug-fix release. Can you change that to 2.0.6? To check that it's running, you can either use a proxy or change the service to a load balancer and hit http://localhost:8090/kafkacruisecontrol/state.
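For anyone who wants to script that check, here is a minimal sketch. It assumes the service is already reachable on localhost:8090 through a proxy or load balancer, as suggested above, and it only tests that the state endpoint answers with HTTP 200; it does not interpret the response body. The function name and the `opener` parameter are illustrative, not part of Cruise Control.

```python
import urllib.request

# URL taken from the comment above; it assumes a proxy or load balancer
# exposes the Cruise Control service on localhost:8090.
STATE_URL = "http://localhost:8090/kafkacruisecontrol/state"


def cruise_control_responds(url=STATE_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the state endpoint answers with HTTP 200.

    `opener` is a hypothetical injection point so the check can be
    exercised without a running cluster.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so connection
        # refused, timeouts and error statuses all end up here.
        return False
```

Calling `cruise_control_responds()` only shows that the web server is up. Whether the monitor and executor are healthy would still have to be read from the state output itself.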
Regarding the warnings, yes, we should look into those. I noticed those when working on 2.0.17. I also noticed the web server config isn't being recognized. IIRC, 2.0.6 has a much cleaner log.
I'd like to start with the latest version. Support for 2.x seems to be a WIP, so I think we'll need to follow new releases. The warnings don't block merge; we can look at them once up and running. I'll try 2.0.17. For the usage there should be a README in the cruise-control folder. We can start small. What I thought was that it would be great to prove that some essential automation use case works: under-replicated partitions, etc.
Are you ok with the last three commits in https://github.com/Yolean/kubernetes-kafka/compare/cruise-control-reconfig?expand=1 ? I found a really bad effect of keeping patch files this way, probably the same as with ./metrics/. I wanted to uninstall cruise-control so I did
Update: pushed f21d15b to address that. Now it's four new commits.
… into cruise-control
Yes, those commits are better. Especially preventing delete! Would you like me to work on the README?
but with the changes from the original PR preserved, except maybe self healing
because it might prevent broker restart if the remote server is down Using image from StreamingMicroservicesPlatform/docker-kafka#8
I've re-enabled self healing (da75c0a) and I'm running cruise control now in a test cluster. I think what remains is:
Actually it'll be a bit of a pain to keep the two images in sync: the one with the metrics jar and the actual cruise control image. Maybe https://github.com/Yolean/kubernetes-kafka/pull/218/files#diff-36facb0f724a84076b8a2525be1c081aR11 should be the actual cruise control image. Should I add the cruise control build to https://github.com/solsson/dockerfiles?
I think moving the cruise-control image to https://github.com/solsson/dockerfiles is a good idea for maintenance reasons. (FYI, cruise-control 2.0.18 has been released.) There isn't a good reason to keep separate cruise-control and cruise-control-reporter images; I say pull the reporter jar from the cruise-control image. As to the I've started a README, let me know which of these things you'd like me to work on so we're not duplicating effort.
I interpret https://github.com/apache/kafka/blob/2.1/bin/kafka-run-class.sh#L129 as wildcard being supported for CLASSPATH, so I'll go for the generalization of the extensions mechanism.
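As background for the wildcard point, the sketch below approximates how the JVM itself treats a class path entry ending in `/*`: it stands for every `.jar` or `.JAR` file directly inside that directory, without recursing into subdirectories. This illustrates the general JVM rule only, not what kafka-run-class.sh or the extensions mechanism in this PR actually does, and the function name is made up for the example.

```python
import os


def expand_classpath_entry(entry):
    """Approximate how the JVM expands one class path entry.

    An entry ending in "/*" stands for every .jar/.JAR file directly inside
    that directory (no recursion); anything else is passed through as-is.
    """
    if not entry.endswith("/*"):
        return [entry]
    directory = entry[:-2]
    jars = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith((".jar", ".JAR"))
        and os.path.isfile(os.path.join(directory, name))
    ]
    # The JVM leaves the order unspecified; sorting keeps the sketch deterministic.
    return sorted(jars)
```

If that rule applies here, a jar dropped into an extensions directory would be picked up at broker start without listing it by name, while jars in nested folders would not.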
Currently blocked by solsson/dockerfiles#20 (comment). Better luck tomorrow maybe.
and the metrics jar copied from the same image
I'm working on testing replacement of broker nodes.
I've tested starting with 3 nodes, then scaling up and down. Cruise control re-assigns partitions to balance as expected. When scaling down to 2 it doesn't do anything, but that may be due to a config item that sets the minimum number of nodes to 3.
I haven't done any real testing, but I'm fairly confident that the addition of the extensions mechanism to core Kafka is low risk. The rest is opt-in. Before merge I'd like us to consider adding a bit more to the readme:
I added to the README, what do you think?
linkedin/cruise-control automates much of the operations of Kafka.
I present this PR for discussion.