Proposal for splitting metrics-reporter into modules #147

Open · wants to merge 2 commits into base: main

49 additions & 0 deletions in `093-split-metrics-reporter-into-modules.md`
# Split metrics-reporter into client and server modules

## Current situation

The metrics-reporter is currently a single artifact (JAR) that can be used with both Apache Kafka clients and servers.

## Motivation

Having a single artifact is simple and works well for the basic functionality. We're now trying to add more advanced features, and it has proved hard to keep the same logic for both clients and servers.

For example, [support for dynamic configurations](https://github.com/strimzi/metrics-reporter/issues/55) is a feature only available on servers. Adding support for it is difficult without impacting the client logic. I opened a PoC [PR](https://github.com/strimzi/metrics-reporter/pull/64), and while I got it to work, it introduced a lot of complexity because `KafkaPrometheusMetricsReporter` needed to support both clients and servers.

Another benefit of having separate modules is the ability to enforce dependencies. Depending on where the reporter runs, in a Kafka client or in a Kafka server, different dependencies are available at runtime. We need to make sure that logic used by clients does not load any classes only available on servers. We hit this issue in the past ([issue #48](https://github.com/strimzi/metrics-reporter/issues/48)). This is not something that can easily be tested, so it currently relies on maintainers being diligent when making code changes to ensure we don't hit the issue again in the future.

I'm also considering adding support for [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) to the metrics-reporter. Again, this feature is only implemented by server-side metrics reporters. Adding support for it will be addressed in a separate proposal.
> **Member:** This is a bit confusing for me. Shouldn't this be client-side only, as the clients send the metrics through this KIP? Or maybe I misunderstood things? This might anyway deserve clarification (or removal, as I think you have enough justification even without it 😉).

> **Contributor (author):** The Kafka client has all the logic to send its metrics built in. To support KIP-714, brokers must run a metrics reporter implementation that supports the feature (by also implementing the `ClientTelemetry` interface).

> **Member:** So, in order to support KIP-714, the server module would in theory read the metrics from the clients and expose them as Prometheus metrics in the Kafka broker?

> **Contributor (author):** Yes, exactly.

> **Member:** Ah, OK. That makes sense (or it actually does not in the bigger picture, but I get the context for the module split now and understand it :-D).


## Proposal

I propose splitting the project into 2 Java modules:

1. `client-metrics-reporter`: This module will only depend on kafka-clients and the Prometheus libraries. It will be used by Apache Kafka clients (Producer, Admin, Consumer, Connect and Streams) by setting the `metric.reporters` configuration to `io.strimzi.kafka.metrics.ClientMetricsReporter`. All the existing metrics-reporter configurations will stay the same. The differences are:
> **Contributor:** Why not `KafkaClientMetricsReporter`, `KafkaServerMetricsReporter` and `YammerServerMetricsReporter`? It just reads better.

> **Contributor (author):** Happy to change the names if people prefer your suggestions. The original names also included `Prometheus`; I dropped it to keep the names shorter, but some people may prefer keeping it.

> **Member:** Maybe we can add `prometheus` to the package name? E.g. `io.strimzi.kafka.metrics.prometheus.ClientMetricsReporter`. (I suspect we might need other technologies as well one (maybe not so distant) day :-/)

> **Contributor:**
>
> > we might need other technologies as well
>
> In that case, I would expect a config parameter to configure the output format, with a default value of Prometheus for backwards compatibility.

> **Contributor (author):** If we add extra reporters in the future, I expect users would pick the format by using different classes. For example, if you use `io.strimzi.kafka.metrics.prometheus.ClientMetricsReporter`, the export format is Prometheus. So I wouldn't add a new configuration to the reporter itself.

> **Member:**
>
> > In that case, I would expect a config parameter to configure the output format, with a default value of Prometheus for backwards compatibility.
>
> It is not only about the output format. It could, for example, be a push-based model instead of the pull-based one Prometheus has. So pretty much all the configuration would be different: instead of listening on some port, you would connect somewhere to send the metrics there. That might be too complicated to have all in the same class.

> **Contributor:** OK, makes sense. I didn't know Prometheus had a push-based model.

> **Member:** I don't think it is Prometheus itself; it would be more in the OpenTelemetry realm. But I think push-based metrics are becoming more common.

> **Member (@ppatierno, Jan 28, 2025):** AFAIK, there is the Prometheus Agent, which is able to use the push-based model and write metrics remotely as well.
> Regarding the naming, I am fine with `ClientMetricsReporter`, but maybe within a `prometheus` package as suggested. It would clearly be Kafka-related anyway, because the fully qualified name has `kafka` in the package as well.

> **Contributor (author):** I like the suggestion of having a `prometheus` package. If we have consensus on that point, I'll make the change.

> **Member:** Can you please clarify how this will work for Connect / MM2? They use a lot of clients, but they also have their own metrics (connectors and tasks and their state, etc.). So it seems like the client reporter is not enough to provide these, or is it?

> **Contributor (author):** The Connect runtime and Streams are considered clients. To collect their metrics you would still follow the instructions from https://github.com/strimzi/metrics-reporter?tab=readme-ov-file#kafka-connect-and-kafka-streams, but use `io.strimzi.kafka.metrics.ClientMetricsReporter` as the class name.

> **Member:** So essentially, when used in Connect, the client reporter will automatically handle the additional Connect metrics as well?

> **Contributor (author):** Yes, like `io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter` does at the moment.

- the reporter class name (it used to be `io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter`)
- the dependency to add to the classpath

This reporter will use its metrics context to validate that it is running in a client, and it will fail at startup if that is not the case.
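For illustration only, here is a minimal sketch of how such a startup check could work, assuming the reporter inspects the `_namespace` label exposed by Kafka's `MetricsContext`. The `ContextCheck` class and the exact set of namespace values are assumptions for the sketch, not the project's actual implementation:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a client-context check. Kafka's MetricsContext
// exposes labels such as "_namespace" identifying the process type; the
// namespace values below are assumptions for illustration.
public class ContextCheck {

    // Namespaces assumed to correspond to client processes.
    private static final Set<String> CLIENT_NAMESPACES = Set.of(
            "kafka.producer", "kafka.consumer", "kafka.admin.client",
            "kafka.connect", "kafka.streams");

    // Returns true if the context labels look like a client process.
    public static boolean isClientContext(Map<String, String> contextLabels) {
        String namespace = contextLabels.get("_namespace");
        return namespace != null && CLIENT_NAMESPACES.contains(namespace);
    }

    public static void main(String[] args) {
        System.out.println(isClientContext(Map.of("_namespace", "kafka.producer"))); // true
        System.out.println(isClientContext(Map.of("_namespace", "kafka.broker")));   // false
    }
}
```

In the real reporter, a failed check would translate into throwing a `ConfigException` (or similar) from the reporter's `contextChange`/`configure` path so the process fails fast at startup.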

2. `server-metrics-reporter`: This module will depend on the client-metrics-reporter module and also on Apache Kafka server JARs required to capture Yammer metrics (as described in [Proposal #64](https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md)). It can be used by Apache Kafka servers (brokers and controllers) by setting the `metric.reporters` configuration to `io.strimzi.kafka.metrics.ServerKafkaMetricsReporter` and the `kafka.metrics.reporters` configuration to `io.strimzi.kafka.metrics.ServerYammerMetricsReporter`. All the existing metrics-reporter configurations will stay the same. The differences are:
- the reporters class names (it used to be `io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter` and `io.strimzi.kafka.metrics.YammerPrometheusMetricsReporter`)
- the dependencies to add to the classpath

The project will publish 2 artifacts, one per module. The build will also produce archives including all the dependencies:
- The client-metrics-reporter archive will contain the client-metrics-reporter JAR and all its dependencies
- The server-metrics-reporter archive will contain the server-metrics-reporter JAR and all its dependencies (including the client-metrics-reporter JAR)
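Under this proposal, the wiring would look roughly as follows. This is a sketch: the class names come from this proposal, and it assumes (as stated above) that all other existing reporter settings stay unchanged:

```properties
# Client side (producer/consumer/admin/Connect/Streams configuration),
# with the client-metrics-reporter archive on the classpath:
metric.reporters=io.strimzi.kafka.metrics.ClientMetricsReporter

# Server side (broker/controller server.properties),
# with the server-metrics-reporter archive on the classpath:
metric.reporters=io.strimzi.kafka.metrics.ServerKafkaMetricsReporter
kafka.metrics.reporters=io.strimzi.kafka.metrics.ServerYammerMetricsReporter
```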

## Affected/not affected projects

In addition to `metrics-reporter`, this will also impact:

- `strimzi-kafka-operator`: There is an ongoing PR to add support for the metrics-reporter. It will need to be updated to use the new class names and dependency.
- `strimzi-kafka-bridge`: The proposal to add support for the metrics-reporter is not impacted, but the implementation will need to be updated to use the new class names and dependency.
> **Member:** @fvaleri what should we do with strimzi/strimzi-kafka-bridge#954 then? Should we wait for this proposal to be accepted and implemented, and for a metrics-reporter release? Or, @mimaison, is this going to take more time, so it's better to proceed with a first metrics-reporter integration within the bridge?

> **Contributor (author):** If you are close to merging the PR, then go ahead. This proposal is not accepted yet; after acceptance we would still need to merge the code and do a release.


## Compatibility

Upgrading from 0.1.0 to a newer release of the metrics-reporter will require manual changes. Users will have to download the right dependencies, depending on whether they are using the reporter with clients or servers, and update their client or server configurations to use the new class names.
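As a sketch, for a broker previously configured with the 0.1.0 single-artifact reporter, the migration would amount to swapping the class names (and the JARs on the classpath):

```properties
# Before (0.1.0, single metrics-reporter artifact):
metric.reporters=io.strimzi.kafka.metrics.KafkaPrometheusMetricsReporter
kafka.metrics.reporters=io.strimzi.kafka.metrics.YammerPrometheusMetricsReporter

# After (server-metrics-reporter module):
metric.reporters=io.strimzi.kafka.metrics.ServerKafkaMetricsReporter
kafka.metrics.reporters=io.strimzi.kafka.metrics.ServerYammerMetricsReporter
```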

Since the project is still in early access, now is the time to make breaking changes. It will be much harder to do so once the reporter is supported by other projects (`strimzi-kafka-operator`, `strimzi-kafka-bridge`) and starts being used by Strimzi users.

## Rejected alternatives

- Keep a single module: It's _possible_ to implement [support for dynamic configurations](https://github.com/strimzi/metrics-reporter/issues/55) and [support for KIP-714](https://github.com/strimzi/metrics-reporter/issues/72) in a single module, but doing so adds a lot of complexity.