
feat(processing): Support multiple kafka clusters #1101

Merged
@untitaker merged 10 commits into master from feat/multiple-kafka-clusters on Oct 27, 2021

Conversation

@untitaker (Member) commented Oct 20, 2021:

This adds additional config to Relay to allow using dedicated kafka clusters for, in our use case, the metrics topic. See https://www.notion.so/sentry/Configure-multiple-Kafka-brokers-in-Relay-36fc942b889b429589e87097a02b16e4 for the design doc.

Points of improvement (for the future; not sure they are relevant right now):

  • Unused kafka configs are not warned about in any way (but then we also don't warn about unrecognized keys anywhere in the config)

  • Kafka configs are generally validated very lazily. That already applied to the actual values we pass to rdkafka, but it now also includes the names of the secondary configs. For example, you can reference an unknown kafka config and it does not matter for Relay as long as you don't turn on processing mode.

  • The error messages for invalid topic assignments are atrocious because untagged enums in serde only provide generic error messages.

  • You can't do this:

    topics:
      metrics:
        topic_name: foo
    

    i.e. use the "verbose" form of metrics assignment without explicitly specifying both a topic name and a kafka config name. I think that's okay: if you don't want to specify a custom kafka config, simply write `metrics: foo` like before; if you want a custom kafka config but not a custom topic name, I guess you're out of luck. (Both forms are spelled out in the sketch right after this list.)
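
Putting the two assignment forms into a minimal, self-contained sketch: these are not Relay's actual types; the type names, topic names, and the shape of `secondary_kafka_configs` are illustrative, and the verbose field names follow the snippet above (before the rename discussed in the review below).

```rust
use std::collections::BTreeMap;

use serde::Deserialize;

/// Either a bare topic name on the default kafka config, or a verbose
/// assignment that additionally references a secondary kafka config.
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum TopicAssignment {
    /// Short form, e.g. `metrics: ingest-metrics`.
    Primary(String),
    /// Verbose form; both fields are required, which is why the partial form
    /// from the last bullet above fails to deserialize.
    Secondary {
        topic_name: String,
        kafka_config_name: String,
    },
}

#[derive(Debug, Deserialize)]
struct Processing {
    topics: BTreeMap<String, TopicAssignment>,
    #[serde(default)]
    secondary_kafka_configs: BTreeMap<String, BTreeMap<String, String>>,
}

fn main() {
    // Requires the `serde` (with `derive`) and `serde_yaml` crates.
    let yaml = r#"
topics:
  events: ingest-events
  metrics:
    topic_name: ingest-metrics
    kafka_config_name: metrics
secondary_kafka_configs:
  metrics:
    bootstrap.servers: "kafka-metrics:9092"
"#;
    let config: Processing = serde_yaml::from_str(yaml).expect("valid config");
    println!("{:#?}", config);
}
```

If a value matches neither variant (for example the `topic_name`-only form above), serde reports only the generic "data did not match any variant of untagged enum" error, which is the error-message problem mentioned in the bullet list.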

@untitaker marked this pull request as ready for review October 20, 2021 19:04
@untitaker requested a review from a team October 20, 2021 19:04
@untitaker changed the title from "feat(processing): Support multiple kafka clusters" to "feat(processing): Support multiple kafka clusters [INGEST-421]" on Oct 20, 2021
Comment on lines 799 to 801
topic_name: String,
/// An identifier referencing one of the kafka configurations in `secondary_kafka_configs`.
kafka_config_name: String,

Member commented:

These should be named differently as per spec.

Suggested change:
- topic_name: String,
- /// An identifier referencing one of the kafka configurations in `secondary_kafka_configs`.
- kafka_config_name: String,
+ name: String,
+ /// An identifier referencing one of the kafka configurations in `secondary_kafka_configs`.
+ broker: String,

@untitaker (Member, Author) commented Oct 20, 2021:

I wanted to change this because I found our use of "kafka config" vs "broker" inconsistent. The example in the spec where we write `secondary_kafka_config: some_broker: ...` only looks good because the user named their kafka config "broker", but it can be any string.

Contributor commented:

Makes sense, I prefer @untitaker's proposal (I vote for changing the spec).

Member commented:

Sounds good, thanks. I'm 👍 with this.

In an attempt to keep it short, a last suggestion on the proposal: name and config. Given that this is in a key called topics, I think name is unambiguous.

@untitaker (Member, Author) commented:

Applied.
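
For reference, with the rename applied the verbose variant presumably ends up looking roughly like this (a sketch based on this thread, not a quote of the merged code):

```rust
#[derive(Debug, serde::Deserialize)]
#[serde(untagged)]
enum TopicAssignment {
    /// Short form: just a topic name, produced via the default kafka config.
    Primary(String),
    /// Verbose form under a key in `topics:`.
    Secondary {
        /// The topic name to produce to.
        name: String,
        /// An identifier referencing one of the kafka configurations in
        /// `secondary_kafka_configs`.
        config: String,
    },
}
```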

/// The `kafka_config` is the default producer configuration used for all topics. A secondary
/// kafka config can be referenced in `topics:` like this:
///
/// ```yaml

Member commented:

Kudos for the example!

/// Kafka topic names.
#[serde(default)]
- pub topics: TopicNames,
+ pub topics: TopicMap<TopicAssignment>,

Member commented:

Would add a type alias for TopicMap<TopicAssignment>, and maybe rename the generic type to something that better indicates its generic structure.

@untitaker (Member, Author) commented:

Do you have a suggestion for the rename? I am not sure how to convey that via the name.
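
For illustration, the suggestion amounts to something like the following; type names, alias name, and the field set are guesses here, and per the thread further down the generic map was later removed from the crate interface again:

```rust
/// Generic per-topic container; the field set shown here is illustrative.
#[derive(Debug)]
pub struct TopicMap<T> {
    pub events: T,
    pub attachments: T,
    pub transactions: T,
    pub outcomes: T,
    pub sessions: T,
    pub metrics: T,
}

/// Assignment type as sketched earlier in this conversation.
#[derive(Debug)]
pub enum TopicAssignment {
    Primary(String),
    Secondary { name: String, config: String },
}

/// The alias being suggested, so call sites do not spell out the generic type.
pub type TopicAssignments = TopicMap<TopicAssignment>;
```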

- for config_p in config.kafka_config() {
+ for config_p in config
+     .kafka_config(config.kafka_topics().get(KafkaTopic::Outcomes))
+     .context(ServerErrorKind::KafkaError)?

Member commented:

nit: Please assign this to a variable rather than wrapping the for header.

let mut reused_producers: BTreeMap<_, Arc<_>> = BTreeMap::new();

let producers = config
    .kafka_topics()

Member commented:

This elegantly comes with the advantage that you never connect to a broker that is not referenced by the topic assignments. It's a bit of a stretch that you now need to expose a generic topic map from relay-config just for that, though.

@untitaker (Member, Author) commented:

Yes, but I already expose that map for the purpose of initializing the store actor and its "producer map".

@jan-auer (Member) commented Oct 21, 2021:

Right, that's what I was hinting at -- this is the store actor's initialization function.

It's a bit of a stretch to export such a generic type from the config crate just to create a topic-to-producer map. It's elegant that you then have the producers mapped by topic name, but it would be less code overall and probably easier to follow if you just did a lookup by name here.

@RaduW is raising a similar point in #1101 (comment).

@untitaker (Member, Author) commented:

> Right, that's what I was hinting at -- this is the store actor's initialization function.

Sorry, yes, I thought you were commenting on the wrong line.

To be honest I didn't really understand what you meant by "lookup by name", but I removed TopicMap so that the interface between relay-config and relay-server gets smaller (which I think is what you wanted to achieve).
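
The reuse pattern under discussion, reduced to a self-contained sketch with hypothetical types (this is not the code from the PR): one producer is created per distinct kafka config name and shared, via `Arc`, by every topic assigned to that config.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Stand-in for an rdkafka producer.
struct Producer {
    config_name: String,
}

/// Build one producer per distinct kafka config name and map each topic to it.
fn build_producers(
    assignments: &BTreeMap<String, String>, // topic name -> kafka config name
) -> BTreeMap<String, Arc<Producer>> {
    let mut reused_producers: BTreeMap<String, Arc<Producer>> = BTreeMap::new();
    let mut producers = BTreeMap::new();

    for (topic, config_name) in assignments {
        let producer = reused_producers
            .entry(config_name.clone())
            .or_insert_with(|| {
                // In the real code this is where the broker connection for
                // this kafka config would be created, exactly once.
                Arc::new(Producer {
                    config_name: config_name.clone(),
                })
            })
            .clone();
        producers.insert(topic.clone(), producer);
    }

    producers
}

fn main() {
    let mut assignments = BTreeMap::new();
    assignments.insert("events".to_owned(), "default".to_owned());
    assignments.insert("outcomes".to_owned(), "default".to_owned());
    assignments.insert("metrics".to_owned(), "metrics".to_owned());

    let producers = build_producers(&assignments);
    // "events" and "outcomes" share the same underlying producer.
    assert!(Arc::ptr_eq(&producers["events"], &producers["outcomes"]));
    println!("metrics uses config {:?}", producers["metrics"].config_name);
}
```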

@RaduW (Contributor) left a comment:

Looks good to me.

#[derive(Serialize, Deserialize, Debug)]
#[serde(default)]
- pub struct TopicNames {
+ pub struct TopicMap<T> {

Contributor commented:

Nit: I think we save very little by avoiding redefining/maintaining 6 fields in two places, while paying for the extra complication of making TopicMap generic.


@untitaker requested a review from jan-auer October 22, 2021 17:58
(review thread on relay-config/src/config.rs marked outdated and resolved)
@untitaker changed the title from "feat(processing): Support multiple kafka clusters [INGEST-421]" to "feat(processing): Support multiple kafka clusters" on Oct 27, 2021
@untitaker merged commit e96c7bf into master Oct 27, 2021
@untitaker deleted the feat/multiple-kafka-clusters branch October 27, 2021 13:50
@github-actions (bot) commented:

Fails
🚫 Please consider adding a changelog entry for the next release.
Instructions and example for changelog

For changes exposed to the Python package, please add an entry to py/CHANGELOG.md. This includes, but is not limited to, event normalization, PII scrubbing, and the protocol.

For changes to the Relay server, please add an entry to CHANGELOG.md under the following heading:

  1. Features: For new user-visible functionality.
  2. Bug Fixes: For user-visible bug fixes.
  3. Internal: For features and bug fixes in internal operation, especially processing mode.

To the changelog entry, please add a link to this PR (consider a more descriptive message):

- Support multiple kafka clusters. ([#1101](https://github.com/getsentry/relay/pull/1101))

If none of the above apply, you can opt out by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against b647eb7

self.values.processing.topics.get(topic).topic_name()
}

/// Topic name and list of Kafka configuration parameters for a given topic.

Member commented:

This does not return the topic name but rather the configuration name.

jan-auer added a commit that referenced this pull request Oct 28, 2021
* master:
  meta(vscode): Update python extension settings (#1109)
  ci: Bump sentry-integration python to 3.8 (#1110)
  feat(processing): Support multiple kafka clusters (#1101)