
changefeedccl: explore whether we can gauge kafka quota usage #92759

Closed
amruss opened this issue Nov 30, 2022 · 4 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cdc

Comments


amruss commented Nov 30, 2022

Related: #92290

Ideally we would like to slow down before hitting the quota; we should explore Kafka's API and see whether we can gauge this more intelligently.

Jira issue: CRDB-21954

Epic CRDB-21691

@amruss amruss added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Nov 30, 2022
@blathers-crl blathers-crl bot added the T-cdc label Nov 30, 2022

blathers-crl bot commented Nov 30, 2022

cc @cockroachdb/cdc

wenyihu6 (Contributor) commented

As part of the Kafka quota pushback work, I looked into this, and I don't see an easy way to do it.

If we want to decide whether to slow down before emitMessages, one approach we discussed is:

  • decide whether to throttle by comparing the current outgoing byte rate measured in sarama against the target byte rate fetched from DescribeClientQuota.
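A minimal sketch of that comparison, as a hypothetical helper (the function name and the headroom fraction are illustrative, not actual changefeed code; the measured rate would come from sarama's metrics registry and the quota from DescribeClientQuota):

```go
package main

import "fmt"

// shouldThrottle reports whether the client should slow down before
// emitting, given the outgoing byte rate it has measured locally and the
// producer byte-rate quota fetched from the broker. headroom is the
// fraction of the quota we are willing to consume before backing off
// (e.g. 0.8 leaves a 20% safety margin).
func shouldThrottle(measuredBytesPerSec, quotaBytesPerSec, headroom float64) bool {
	if quotaBytesPerSec <= 0 {
		// No quota configured for this client: never self-throttle.
		return false
	}
	return measuredBytesPerSec >= headroom*quotaBytesPerSec
}

func main() {
	fmt.Println(shouldThrottle(900_000, 1_000_000, 0.8)) // true: past 80% of quota
	fmt.Println(shouldThrottle(500_000, 1_000_000, 0.8)) // false: well under quota
}
```

The helper only captures the decision step; the hard part, as discussed below, is obtaining a trustworthy per-client measured rate in the first place.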

I see a few problems with this plan:

  • The outgoing byte rate in the sarama metrics https://github.com/IBM/sarama/blob/56d2b5c239e1c4d77064b89cc176fd304b3baa0a/sarama.go#L30 accounts for traffic from all clients, so we can't use it to decide whether a single client is exceeding the byte rate limit from DescribeClientQuota. We would have to use the producer-side metrics https://github.com/IBM/sarama/blob/56d2b5c239e1c4d77064b89cc176fd304b3baa0a/sarama.go#L54-L67 instead, but those don't include a per-client byte rate metric. record-send-rate exists, but it is measured in records/second, and the compression ratio metric uses compressionRatio := int64(float64(recordBatch.recordsLen) / float64(len(recordBatch.compressedRecords)) * 100), so neither can be used to compute a byte rate.
  • Even if we had byte rates for specific clients, we would need to decide between a byte rate per broker and a byte rate across all brokers. We don't know ahead of time which broker a changefeed will connect to, so in the worst case the broker with the highest outgoing byte rate becomes the bottleneck.
  • Different changefeeds across the cluster could send requests simultaneously, so the byte rate we measure just before sending a request may no longer match the byte rate right after the request is sent.
  • I also don't understand why we would need to fetch DescribeClientQuota periodically. It doesn't change frequently, and it isn't dynamically adjusted as the remaining quota shrinks.

Overall, if these are real problems, it makes more sense to apply backpressure when we observe throttling behavior from Kafka via its metrics, and to relax it once the throttling stops. Trying to stay under the target byte rate ourselves feels too uncertain.
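The reactive approach could look roughly like the AIMD sketch below (hypothetical type, field names, and constants; not CockroachDB's actual implementation). It halves a self-imposed byte budget whenever Kafka reports a nonzero throttle time and recovers it additively while no throttling is observed:

```go
package main

import "fmt"

// aimdLimiter is an illustrative additive-increase/multiplicative-decrease
// limiter: back off when Kafka reports throttling, recover while it doesn't.
type aimdLimiter struct {
	rate    float64 // current self-imposed bytes/sec budget
	minRate float64 // floor so we never stall entirely
	maxRate float64 // ceiling on recovery
	stepUp  float64 // additive increase per quiet observation
	backoff float64 // multiplicative decrease factor, e.g. 0.5
}

// observe feeds in the throttle time reported by the last produce
// response and returns the updated budget.
func (l *aimdLimiter) observe(throttleTimeMs int64) float64 {
	if throttleTimeMs > 0 {
		l.rate *= l.backoff
		if l.rate < l.minRate {
			l.rate = l.minRate
		}
	} else {
		l.rate += l.stepUp
		if l.rate > l.maxRate {
			l.rate = l.maxRate
		}
	}
	return l.rate
}

func main() {
	l := &aimdLimiter{rate: 1_000_000, minRate: 10_000, maxRate: 2_000_000, stepUp: 50_000, backoff: 0.5}
	fmt.Println(l.observe(120)) // throttled: budget halves to 500000
	fmt.Println(l.observe(0))   // quiet: budget recovers to 550000
}
```

The appeal of this shape is that it needs no quota introspection at all: it only reacts to the throttle signal Kafka already sends, which sidesteps every measurement problem listed above.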

wenyihu6 (Contributor) commented

cc: @rharding6373

@wenyihu6 wenyihu6 removed their assignment Mar 12, 2024
@rharding6373 rharding6373 changed the title changefeedccl: explore whether we can gauge kafka qouta usage changefeedccl: explore whether we can gauge kafka quota usage Mar 21, 2024
rharding6373 (Collaborator) commented

This should be complete.


6 participants