
feat(cardinality): Implement Redis set based cardinality limiter #2745

Merged
merged 22 commits into from
Dec 12, 2023

Conversation

Dav1dde
Member

@Dav1dde Dav1dde commented Nov 20, 2023

Introduces a new relay-cardinality module. Initially it only houses the cardinality limiter, but at some point there may be more cardinality-related functionality (e.g. smart detection of high-cardinality tags).

Implements a very basic cardinality limiter using one Redis set per organization and namespace to track cardinality.
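To illustrate the idea, here is a minimal in-memory sketch of set-based cardinality limiting keyed by (organization, namespace). This is not the PR's implementation (which tracks hashes in one Redis set per organization and namespace); the type and method names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// In-memory stand-in for one Redis set per (organization, namespace).
struct CardinalityLimiter {
    limit: usize,
    sets: HashMap<(u64, String), HashSet<u64>>,
}

impl CardinalityLimiter {
    fn new(limit: usize) -> Self {
        Self { limit, sets: HashMap::new() }
    }

    /// Returns true if the hash is accepted under the cardinality limit.
    fn accept(&mut self, organization_id: u64, namespace: &str, hash: u64) -> bool {
        let set = self
            .sets
            .entry((organization_id, namespace.to_owned()))
            .or_default();
        if set.contains(&hash) {
            return true; // already tracked, adds no new cardinality
        }
        if set.len() >= self.limit {
            return false; // would exceed the limit
        }
        set.insert(hash);
        true
    }
}

fn main() {
    let mut limiter = CardinalityLimiter::new(2);
    assert!(limiter.accept(1, "custom", 100));
    assert!(limiter.accept(1, "custom", 200));
    assert!(limiter.accept(1, "custom", 100)); // duplicate, still accepted
    assert!(!limiter.accept(1, "custom", 300)); // over the limit
    assert!(limiter.accept(2, "custom", 300)); // different org, separate set
    println!("ok");
}
```

Keying the sets per organization and namespace means one noisy organization cannot consume another's budget.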

The API is still very much a work in progress. It works for now but will definitely need to change as the requirements for the cardinality limiter evolve (outcomes, tag erasure, etc.).

Epic: #2717

@Dav1dde Dav1dde force-pushed the feat/cardinality-limiter branch from 4ebfe3c to 1935241 Compare December 5, 2023 16:10
@Dav1dde Dav1dde force-pushed the feat/cardinality-limiter branch from 1935241 to ec47e21 Compare December 5, 2023 16:14
@Dav1dde Dav1dde changed the title feat(cardinality): Introduce new cardinality module feat(cardinality): Implement Redis set based cardinality limiter Dec 6, 2023
@Dav1dde Dav1dde force-pushed the feat/cardinality-limiter branch from b5b77f1 to 8bc9bdb Compare December 7, 2023 09:10
    flush_partitions: Option<u64>,
) -> BTreeMap<Option<u64>, Vec<Bucket>> {
    let flush_partitions = match flush_partitions {
-       None => return BTreeMap::from([(None, buckets)]),
+       None => return BTreeMap::from([(None, buckets.into_iter().collect())]),
Member Author


I checked with godbolt that this is optimized away when passing a vector already.
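For context, the diff above can be sketched as a complete function. This is an assumption-laden reconstruction, not Relay's actual code: the `Bucket` type is reduced to a name field, and hashing the bucket name modulo the partition count is a guess at the real partitioning key. It does show why the `None` arm needs `into_iter().collect()` once the input is a generic iterator.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for relay's Bucket type.
#[derive(Debug)]
struct Bucket {
    name: String,
}

/// With no partition count, everything lands in the `None` slot;
/// otherwise buckets are distributed by hash modulo `flush_partitions`.
fn partition_buckets(
    buckets: impl IntoIterator<Item = Bucket>,
    flush_partitions: Option<u64>,
) -> BTreeMap<Option<u64>, Vec<Bucket>> {
    let flush_partitions = match flush_partitions {
        // On a Vec input, this collect is a no-op after optimization.
        None => return BTreeMap::from([(None, buckets.into_iter().collect())]),
        Some(n) => n.max(1),
    };
    let mut partitions = BTreeMap::new();
    for bucket in buckets {
        let mut hasher = DefaultHasher::new();
        bucket.name.hash(&mut hasher);
        let partition = hasher.finish() % flush_partitions;
        partitions
            .entry(Some(partition))
            .or_insert_with(Vec::new)
            .push(bucket);
    }
    partitions
}

fn main() {
    let buckets = vec![Bucket { name: "a".into() }, Bucket { name: "b".into() }];
    let map = partition_buckets(buckets, None);
    assert_eq!(map.len(), 1);
    assert_eq!(map[&None].len(), 2);
    println!("ok");
}
```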

@Dav1dde Dav1dde marked this pull request as ready for review December 7, 2023 11:05
@Dav1dde Dav1dde requested a review from a team as a code owner December 7, 2023 11:05
Member

@jjbayer jjbayer left a comment


I like the look of this! Just a bunch of random comments for now, will do a more in-depth review later.

Comment on lines +1290 to +1292
if !enable_cardinality_limiter {
return Ok(Box::new(buckets.into_iter()));
}
Member


nit: Why not check the flag outside of this function? That seems more straightforward than passing a boolean that makes the function behave like a no-op (but this might just be my personal preference).

Member Author


I implemented the function pretty much only for this check, so I can early-return; otherwise it becomes very complex with cfg(feature = "processing").
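The early-return pattern being discussed can be sketched as follows. The types are simplified stand-ins (buckets reduced to `u64`), and the body is hypothetical; the point is that the flag check lives inside the function so the caller needs no feature gating.

```rust
/// When the limiter is disabled, buckets pass through untouched; the
/// early return keeps the rest of the body free of cfg branches.
fn check_cardinality_limits(
    enable_cardinality_limiter: bool,
    buckets: Vec<u64>,
) -> Box<dyn Iterator<Item = u64>> {
    if !enable_cardinality_limiter {
        return Box::new(buckets.into_iter());
    }
    // The processing-only limiting logic would live here, e.g. behind
    // cfg(feature = "processing"); this sketch just passes through.
    Box::new(buckets.into_iter())
}

fn main() {
    let out: Vec<u64> = check_cardinality_limits(false, vec![1, 2, 3]).collect();
    assert_eq!(out, vec![1, 2, 3]);
    println!("ok");
}
```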

fn check_cardinality_limits(
&self,
enable_cardinality_limiter: bool,
_organization_id: u64,
Member


nit: Should we make this function #[cfg(feature = "processing")]?

Suggested change
-    _organization_id: u64,
+    organization_id: u64,

Member Author


Unfortunately this is needed for non-processing; the entire function exists just to handle the if case with an early return, to keep the code somewhat readable instead of a pure if/else cfg mess.

Member


Is it still a mess with the if_processing! macro?

@Dav1dde Dav1dde force-pushed the feat/cardinality-limiter branch from 90e2a3e to 9121915 Compare December 11, 2023 14:21
Member

@jjbayer jjbayer left a comment


Let's ship it! Before opting in all orgs, we should transition from copying entire Redis sets to a Lua script, as you suggested.
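The suggested Lua script would do the add-and-check server-side in one atomic step instead of copying sets to the client. Both the Lua in the comment and the Rust helper below are hypothetical sketches of those semantics, here simulated against an in-memory set.

```rust
use std::collections::HashSet;

// A server-side script could roughly do (hypothetical sketch, using the
// standard Redis set commands SCARD / SISMEMBER / SADD):
//
//   if redis.call('SCARD', KEYS[1]) < tonumber(ARGV[2]) or
//      redis.call('SISMEMBER', KEYS[1], ARGV[1]) == 1 then
//     redis.call('SADD', KEYS[1], ARGV[1])
//     return 1
//   end
//   return 0

/// In-memory stand-in for the script's semantics: accept a hash if it is
/// already a member or the set still has room, as one indivisible step.
fn add_if_within_limit(set: &mut HashSet<u64>, hash: u64, limit: usize) -> bool {
    if set.contains(&hash) || set.len() < limit {
        set.insert(hash);
        true
    } else {
        false
    }
}

fn main() {
    let mut set = HashSet::new();
    assert!(add_if_within_limit(&mut set, 10, 1));
    assert!(add_if_within_limit(&mut set, 10, 1)); // already a member
    assert!(!add_if_within_limit(&mut set, 20, 1)); // set is full
    println!("ok");
}
```

Running the check in a script avoids the race between reading the set size and inserting, and keeps the data transfer per check constant instead of proportional to set size.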

@@ -1,4 +1,4 @@
-//! Metrics Cardinality Limiter
+//! Relay Cardinality Module
Member


Not necessarily for this PR, but since this is a new crate, we should take a look at what make doc-rust produces and whether we're happy with the documentation. Other crates have a summary of what they do in the crate-level docs, e.g.

//! Metric protocol, aggregation and processing for Sentry.
//!
//! Metrics are high-volume values sent from Sentry clients, integrations, or extracted from errors
//! and transactions, that can be aggregated and queried over large time windows. As opposed to rich
//! errors and transactions, metrics carry relatively little context information in tags with low
//! cardinality.
//!
//! # Protocol
//!
//! Clients submit metrics in a [text-based protocol](Bucket) based on StatsD. See the [field
//! documentation](Bucket#fields) on `Bucket` for more information on the components. A sample
//! submission looks like this:
//!
//! ```text
#![doc = include_str!("../tests/fixtures/buckets.statsd.txt")]
//! ```
//!
//! The metric type is part of its signature just like the unit. Therefore, it is allowed to reuse a
//! metric name for multiple metric types, which will result in multiple metrics being recorded.
//!
//! # Metric Envelopes
//!
//! To send one or more metrics to Relay, the raw protocol is enclosed in an envelope item of type
//! `metrics`:
//!
//! ```text
//! {}
//! {"type": "statsd", ...}
#![doc = include_str!("../tests/fixtures/buckets.statsd.txt")]
//! ...
//! ```
//!
//! Note that the name format used in the statsd protocol is different from the MRI: Metric names
//! are not prefixed with `<ty>:` as the type is somewhere else in the protocol. If no metric
//! namespace is specified, the `"custom"` namespace is assumed.
//!
//! Optionally, a timestamp can be added to every line of the submitted envelope. The timestamp has
//! to be a valid Unix timestamp (UTC) and must be prefixed with `T`. If it is omitted, the
//! `received` time of the envelope is assumed.
//!
//! # Aggregation
//!
//! Relay accumulates all metrics in [time buckets](Bucket) before sending them onwards. Aggregation
//! is handled by the [`Aggregator`], which should be created once for the entire system. It flushes
//! aggregates in regular intervals, either shortly after their original time window has passed or
//! with a debounce delay for backdated submissions.
//!
//! **Warning**: With chained Relays submission delays accumulate.
//!
//! Aggregate buckets are encoded in JSON with the following schema:
//!
//! ```json
#![doc = include_str!("../tests/fixtures/buckets.json")]
//! ```
//!
//! # Ingestion
//!
//! Processing Relays write aggregate buckets into the ingestion Kafka stream. The schema is similar
//! to the aggregation payload, with the addition of scoping information. Each bucket is sent in a
//! separate message:
//!
//! ```json
#![doc = include_str!("../tests/fixtures/kafka.json")]
//! ```

Member Author


I will definitely do this in a follow-up; I also want to get proper documentation in for how the limiter works!

@Dav1dde Dav1dde enabled auto-merge (squash) December 12, 2023 08:11
@Dav1dde Dav1dde disabled auto-merge December 12, 2023 08:29
@Dav1dde Dav1dde enabled auto-merge (squash) December 12, 2023 09:00
@Dav1dde Dav1dde merged commit 82b8980 into master Dec 12, 2023
20 checks passed
@Dav1dde Dav1dde deleted the feat/cardinality-limiter branch December 12, 2023 09:13