-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for splitting metrics-reporter into modules #147
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Split metrics-reporter into client and server modules | ||
|
||
## Current situation | ||
|
||
The metrics-reporter is currently a single artifact (JAR) that can be used with both Apache Kafka clients and servers. | ||
|
||
## Motivation | ||
|
||
Having a single artifact is simple and works well for the basic functionality. We're now trying to add more advanced features and it's proved hard to keep the same logic for both client and server. | ||
|
||
For example [support for dynamic configurations](https://github.com/strimzi/metrics-reporter/issues/55) is a feature only available for servers. Adding support for this feature is difficult without impacting the client logic. I opened a PoC [PR](https://github.com/strimzi/metrics-reporter/pull/64), and while I got it to work, this caused a lot of complexity due to `KafkaPrometheusMetricsReporter` needing to support both clients and servers. | ||
|
||
Another benefit of having separate modules is the ability to enforce dependencies. Depending on where the reporter runs, a Kafka client or a Kafka server, different dependencies are available at runtime. We need to make sure that logic used by clients does not load any classes only available on servers. We hit this issue in the past ([issue #48](https://github.com/strimzi/metrics-reporter/issues/48)). This is not something that can easily be tested so currently it relies on maintainers being diligent when making code changes to ensure we don't hit this issue again in the future. | ||
|
||
I'm also considering adding support for [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) to metrics-reporter. Again this feature is only implemented by server-side metrics reporters. Adding support for this feature will be addressed in a separate proposal. | ||
|
||
## Proposal | ||
|
||
I propose splitting the project into 2 Java modules: | ||
|
||
1. `client-metrics-reporter`: This module will only depend on kafka-clients and the Prometheus libraries. It will be used by Apache Kafka clients (Producer, Admin, Consumer, Connect and Streams) by setting the `metric.reporters` configuration to `io.strimzi.kafka.metrics.ClientMetricsReporter`. All the existing metrics-reporter configurations will stay the same. The differences are: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Happy to change the names if people prefer your suggestions. The original names also included There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we can add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In that case, I would expect a config parameter to configure the output format with a default value set to Prometheus for backwards compatibility. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we are to add extra reporters in the future, I expect users would pick the format by using different classes. For example if you use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It is not only about the output format. It could be for example push-based model instead of pull-based like Prometheus has. So pretty much all the configuration would be different. Instead of listening on some port, you would connect somewhere to send the metrics there. That might be too complicated to just have it all in the same class. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, makes sense. Didn't know Prom had push-based model. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it is Prometheus itself - it would be more in the OpenTelemetry realm. But I think the push-based metrics are becoming more common. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AFAIK, there is Prometheus Agent which is able to use the push-based model so writing metrics remotely as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the suggestion of having a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you please clarify how this will work for Connect / MM2? They use a lot of clients. But they also have their own metrics (connectors and tasks and their state, etc.). So it seems like the client reporter is not enough to provide these, or? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Connect runtime and Streams are considered clients. To collect their metrics you would still follow the instructions from https://github.com/strimzi/metrics-reporter?tab=readme-ov-file#kafka-connect-and-kafka-streams but use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So essentially, when used in Connect, the client reporter will automatically handle the additional Connect metrics as well? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, like |
||
- the reporter class name (it used to be `io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter`) | ||
- the dependency to add to the classpath | ||
|
||
This reporter will use its metric context to validate it runs in a client and will fail at startup if it's not the case. | ||
|
||
2. `server-metrics-reporter`: This module will depend on the client-metrics-reporter module and also on Apache Kafka server JARs required to capture Yammer metrics (as described in [Proposal #64](https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md)). It can be used by Apache Kafka servers (brokers and controllers) by setting the `metric.reporters` configuration to `io.strimzi.kafka.metrics.ServerKafkaMetricsReporter` and the `kafka.metrics.reporters` configuration to `io.strimzi.kafka.metrics.ServerYammerMetricsReporter`. All the existing metrics-reporter configurations will stay the same. The differences are: | ||
scholzj marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- the reporters class names (it used to be `io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter` and `io.strimzi.kafka.metrics.YammerPrometheusMetricsReporter`) | ||
- the dependencies to add to the classpath | ||
|
||
The project will publish 2 artifacts, one per module. The build will also produce archives including all the dependencies: | ||
- The client-metrics-reporter archive will contain the client-metrics-reporter JAR and all its dependencies | ||
- The server-metrics-reporter archive will contain the server-metrics-reporter JAR and all its dependencies (including the client-metrics-reporter JAR) | ||
## Affected/not affected projects | ||
|
||
In addition of `metrics-reporter`, this will also impact: | ||
|
||
- `strimzi-kafka-operator`: There is a PR ongoing to add support for the metrics-reporter. This will need to be updated to use the new class names and dependency. | ||
- `strimzi-kafka-bridge`: The proposal to add support for the metrics-reporter is not impacted, but the implementation will need to be updated to use the new class names and dependency. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fvaleri what should we do with strimzi/strimzi-kafka-bridge#954 then? Should we wait for this proposal being accepted, implemented and metrics reporter released? Or @mimaison this is going to take more time so it's bettere to proceed with a first metrics reporter integration within the bridge? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you are close to merge the PR, then you go ahead. |
||
|
||
## Compatibility | ||
|
||
Upgrading from 0.1.0 to a newer release of the metrics-reporter will require manual changes. Users will have to download the right dependencies depending if they are using the reporter with clients or servers and update their client or servers configurations to use the new class names. | ||
|
||
Since the project is still in early access, now is the time if we want to make breaking changes. It will be much harder to do so once the reporter is supported by other projects (`strimzi-kafka-operator`, `strimzi-kafka-bridge`) and starts being used by Strimzi users. | ||
|
||
## Rejected alternatives | ||
|
||
- Keep a single module: It's _possible_ to implement [support for dynamic configurations](https://github.com/strimzi/metrics-reporter/issues/55) and [support for KIP-714](https://github.com/strimzi/metrics-reporter/issues/72) but this causes a lot of complexity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing for me. Shouldn't this be client-side only as the clients send the metrics through this KIP? Or maybe I misunderstood things? This might anyway deserve clarification (or removal as I think you have enough justification even without it 😉).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Kafka client has all the logic to send its metrics built-in. To support KIP-714, brokers must run a metrics reporter implementation that support the feature (by also implementing the
ClientTelemetry
interface).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in order to support the KIP-714, the server module would in theory read the metrics from the clients and expose them as Prometheus metrics in the Kafka broker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, ok. That makes sense (or it actually does not in the bigger picture, but I get the context for the module split now and understand it :-D).