keep state info on kafka #35
@erayarslan What's your team's opinion regarding this issue?
In this project we are following the path of the Couchbase Elasticsearch connector, which stores checkpoints on Couchbase. But we are aware of the workload when checkpoint commits trigger. We will implement solutions to reduce that.
How about implementing an optional checkpointing mechanism that saves to Kafka?
We already have an abstraction on metadata here.
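For illustration only, here is a minimal sketch of what such a pluggable metadata (checkpoint) store could look like; the type and method names are hypothetical, not the connector's actual API:

```go
package metadata

import "context"

// CheckpointDocument is an illustrative per-vBucket checkpoint record.
type CheckpointDocument struct {
	VbID   uint16 `json:"vbId"`
	SeqNo  uint64 `json:"seqNo"`
	VbUUID uint64 `json:"vbUuid"`
}

// Metadata abstracts where checkpoints are stored, so a Couchbase-backed and a
// Kafka-backed implementation can be swapped behind the same interface.
type Metadata interface {
	Save(ctx context.Context, state map[uint16]CheckpointDocument) error
	Load(ctx context.Context, vbIDs []uint16) (map[uint16]CheckpointDocument, error)
	Clear(ctx context.Context, vbIDs []uint16) error
}
```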
The other Couchbase connector (kafka-connect-couchbase) keeps state info in Kafka. I started from there but found its performance unsatisfactory, so I reimplemented a CB connector using your libraries. The e2e latencies improved 20x right away, to the 100ms level. It would be ideal if you also implemented the Kafka metadata option; otherwise my project couldn't go to production. As said above, CDC is meant to be read-only, so it is not allowed to write into the source database.
Yes, we are aware of the Java connector's behaviour. But on the other hand, the Elasticsearch connector stores checkpoints on Couchbase.
Thanks for your support. What timeline shall we expect?
It will be in our planning this week.
Awesome. Very impressed. Thank you. I will try it out and let you know.
@erayarslan I have tested the kafka metadata feature. It worked perfectly. Thank you! However, after I restarted the dcp-kafka connector pods (k8s statefulset membership) on the fly, the connectors were not able to resume from where they stopped, while their CPUs stayed at 100% utilization. My checkpoint type was
@Du-Li thank you for testing the kafka metadata. I am so glad it worked for you. Or maybe you can share the latest logs of the connector, so we can diagnose it.
The version I tested was already v0.0.35. The logs were not meaningful. All the connectors just stopped there after printing the initialization messages such as "dcp stream connected". They kept spinning the CPUs at 100% without doing anything visible.
In general, what settings are available for checkpointing?
Only manual and auto options are available for checkpointing. I think it's not about the checkpoint.
Gotcha! When taking the CB metadata option, I noticed that the metadata data rate was like 12k RPS, as I noted above. I guess the Kafka metadata option has a similar data rate. I just logged in to one of my Kafka brokers to count the messages and the broker crashed/restarted. It's perhaps too big to process from a single node. I was running 10 dcp-kafka connector pods. When they are restarted, each of them might be reading the entire metadata topic from beginning to end, which is huge. That's perhaps why they all hang there using 100% CPU. If my guess is right, some optimization would be required. Pod restarts are unavoidable in k8s for all kinds of reasons. The system wouldn't be useful if a restart makes the pods hang.
Because of this we are using a compacted topic for offsets. Debezium and Kafka Connect use the same kind of system. You can tune it with these configs:
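As a rough illustration only (the topic name and values here are assumptions, not the connector's defaults), the relevant knobs are standard Kafka topic-level configs such as cleanup.policy, segment.bytes, and min.cleanable.dirty.ratio, which could be applied when creating the metadata topic, e.g. with segmentio/kafka-go:

```go
package main

import (
	"log"
	"net"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func main() {
	conn, err := kafka.Dial("tcp", "localhost:9092")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Topic creation must be sent to the controller broker.
	controller, err := conn.Controller()
	if err != nil {
		log.Fatal(err)
	}
	ctrlConn, err := kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
	if err != nil {
		log.Fatal(err)
	}
	defer ctrlConn.Close()

	// Compacted metadata topic: only the latest checkpoint per key is retained;
	// small segments and a low dirty ratio make compaction kick in early.
	err = ctrlConn.CreateTopics(kafka.TopicConfig{
		Topic:             "connector-metadata", // illustrative name
		NumPartitions:     1,
		ReplicationFactor: 3,
		ConfigEntries: []kafka.ConfigEntry{
			{ConfigName: "cleanup.policy", ConfigValue: "compact"},
			{ConfigName: "segment.bytes", ConfigValue: "2097152"}, // 2 MB
			{ConfigName: "min.cleanable.dirty.ratio", ConfigValue: "0.1"},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```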
On the other hand, I want to improve our offset-consuming logic.
My settings were like
Does it make sense to you?
producerBatchTickerDuration means the connector will flush messages every 50ms and commit offsets. The project is still under development and not stable, so I think it's a documentation issue. So in your case we need to increase these. Currently you need to set these according to your average per-message size, maybe
Meanwhile, we will make the improvements I mentioned above.
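Purely as an illustrative sketch (not the connector's actual code), the interplay described above looks roughly like this: messages accumulate into a batch that is flushed, and offsets committed, either when the batch fills up or when the ticker fires, so a larger batch size or a longer ticker duration directly reduces how often checkpoints are written:

```go
package main

import (
	"fmt"
	"time"
)

// producerLoop models a producer that flushes on "batch full" or "ticker fired".
// Every flush would also commit (checkpoint) the DCP offsets.
func producerLoop(messages <-chan []byte, batchSize int, tickerDuration time.Duration) {
	batch := make([][]byte, 0, batchSize)
	ticker := time.NewTicker(tickerDuration)
	defer ticker.Stop()

	flush := func(reason string) {
		if len(batch) == 0 {
			return
		}
		// Produce the batch to Kafka here, then commit the offsets.
		fmt.Printf("flushed %d messages (%s) and committed offsets\n", len(batch), reason)
		batch = batch[:0]
	}

	for {
		select {
		case msg, ok := <-messages:
			if !ok {
				flush("shutdown")
				return
			}
			batch = append(batch, msg)
			if len(batch) >= batchSize {
				flush("batch full")
			}
		case <-ticker.C:
			flush("ticker")
		}
	}
}

func main() {
	msgs := make(chan []byte)
	go func() {
		for i := 0; i < 10; i++ {
			msgs <- []byte(fmt.Sprintf("event-%d", i))
		}
		close(msgs)
	}()
	// e.g. flush at most every 50ms, or whenever 4 messages are buffered.
	producerLoop(msgs, 4, 50*time.Millisecond)
}
```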
TIL; in kafka connect couchbase default
@erayarslan Thanks for your explanations. Actually I tried different values in load testing but they didn't seem to make any difference.
@Du-Li did you clean the old metadata topic? Or give a different topic name to create a new one?
I reused the old metadata topic without cleaning.
Kafka metadata always consumes the whole metadata topic at the start of the application. We talked about how to keep the topic smaller above.
Do we need to set a short retention for this topic? What happens if the content is auto deleted?
Because of this, our metadata topic uses compaction instead of retention. Also, we are already working on this.
@Du-Li with v0.0.36 we decreased the segment.bytes of the metadata topic.
I have tested the new version. It worked fine although I haven't measured how much difference it makes. Thank you!
Many thanks for your contributions!
@erayarslan Can you help estimate the size of checkpoint messages in Kafka? Suppose there are on average 1000 CB DCP events per second and each event has three copies in Kafka. What's the size of each checkpoint message, and how many checkpoint messages are generated per second?
It's about the compaction trigger, so I cannot estimate it perfectly. But we decreased segment.bytes of the metadata topic to 2 MB here, so there will be nothing to worry about.
I need capacity estimates for my Kafka cluster. Can you give me a rough idea of how much space (message size and data rate) the checkpoint topic will need? It doesn't have to be too accurate. Thanks. @erayarslan
I saw compaction trigger when the metadata topic size was around 3-5 MB in my test cluster.
For the data rate, it depends on your batch size.
1024 is the number of affected vBuckets; I assumed they were all affected.
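To put rough numbers on this, here is a back-of-envelope sketch; the per-message size and the one-checkpoint-message-per-affected-vBucket-per-commit model are assumptions for illustration, not measurements of the connector:

```go
package main

import "fmt"

func main() {
	const (
		affectedVBuckets   = 1024 // worst case: every vBucket sees traffic
		checkpointMsgBytes = 150  // assumed serialized size: key + vbId + seqNo + vbUUID + overhead
		commitsPerSecond   = 20   // e.g. offsets committed every 50ms (producerBatchTickerDuration)
	)

	msgsPerSecond := affectedVBuckets * commitsPerSecond
	bytesPerSecond := msgsPerSecond * checkpointMsgBytes

	fmt.Printf("checkpoint messages/sec: %d\n", msgsPerSecond)                     // ~20k
	fmt.Printf("checkpoint write rate:   %.1f MB/s\n", float64(bytesPerSecond)/1e6) // ~3 MB/s
	// After compaction only the latest message per vBucket key survives, so the
	// retained topic size stays around affectedVBuckets*checkpointMsgBytes (~150 KB)
	// plus the active segment, which is bounded by segment.bytes (~2 MB).
	fmt.Printf("retained after compaction: ~%d KB + active segment\n",
		affectedVBuckets*checkpointMsgBytes/1024)
}
```

Increasing producerBatchTickerDuration (or the batch size) lowers the commits-per-second term proportionally, which is the main lever discussed earlier in this thread.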
Is your feature request related to a problem? Please describe.
Currently the dcp-kafka connector writes the state info (checkpoints) back to Couchbase, which consequently magnifies the workload on CB. In my tests, for example, I generated 4k RPS to CB but observed 15k RPS there, which adds almost 3x extra workload to CB. This may not be acceptable in production. Moreover, conceptually, CDC is meant to be non-intrusive to the source databases. Keeping the state info on CB would cause many problems, not only in terms of capacity but also in breaking that read-only promise or expectation.
Describe the solution you'd like
At least make it an option for the dcp-kafka connector to keep the state info on Kafka instead of CB.
Describe alternatives you've considered
Define an interface so the developer can choose to use Kafka or CB. The two paths will implement the same interface.
Additional context
In my use case, we are definitely not allowed to write to the CB cluster. CDC is meant to be read-only.