feat(relay): Implement factor based multi-matching #1790

iambriccardo · 2023-01-30T07:51:59Z

This PR implements a new sampling algorithm with two major changes: support for factors and multi-matching.

The rationale behind this new algorithm is to enable combining different matching rules, given the increasing number of biases created on the Sentry side (e.g., we want to mix the latest release and team key transaction biases together). With the current system, this was not possible, because matching was performed in a FIFO fashion.

This algorithm can result in inconsistent trace sampling, which is a drawback we decided to accept. The inconsistency arises because the algorithm takes both transaction and trace rules into consideration when computing the final sample rate.

The algorithm revolves around two configurations:

root config -> contains the rules of the head of the trace to which the incoming event belongs.
non-root config -> contains the rules of the project to which the incoming event belongs.

It also introduces the concept of samplingValue. A samplingValue is simply a kind of heuristic applied to sampling, together with a value. The two types currently implemented are:

factor -> the factor which the algorithm will apply on a matched sample rate.
sampleRate -> the classic sample rate.

The new algorithm works in the following way:

1. It merges the sampling rules of the root and non-root projects, concatenating first the transaction/error rules of the non-root project and then the trace rules of the root project.
2. It performs matching from top to bottom with the following approach:
   - If a factor rule is matched, it will take the factor and multiply it with past accumulated factors (starting with default 1).
   - If a sampleRate rule is matched, it will take the accumulated factors and multiply that by the sampleRate of the matched rule.

The matching process returns the final sampleRate, which is used to decide whether to keep the event. If no rule matches or unsupported rules are found, the algorithm returns a no match, which means that we keep the event (we prefer oversampling to dropping events).
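The matching steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Rule`, its `matches` flag (a stand-in for real condition evaluation), and `match_rules` are hypothetical names, and treating a run of factor-only matches as a no match is an assumption.

```rust
/// Kind of value a rule carries (same shape as the `SamplingValue` enum in the diff).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SamplingValue {
    SampleRate { value: f64 },
    Factor { value: f64 },
}

/// Hypothetical, simplified rule; `matches` replaces real condition evaluation.
#[derive(Clone, Debug)]
pub struct Rule {
    pub id: u32,
    pub matches: bool,
    pub sampling_value: SamplingValue,
}

/// Walks the rules top to bottom, multiplying matched factors together until
/// a sample rate rule matches. Returns the final sample rate and the IDs of
/// all matched rules, or `None` if no sample rate rule matched (assumption:
/// the caller then keeps the event).
pub fn match_rules<'a>(rules: impl IntoIterator<Item = &'a Rule>) -> Option<(f64, Vec<u32>)> {
    let mut accumulated_factors = 1.0;
    let mut matched_rule_ids = Vec::new();

    for rule in rules {
        if !rule.matches {
            continue;
        }
        matched_rule_ids.push(rule.id);
        match rule.sampling_value {
            // Factors accumulate and matching continues.
            SamplingValue::Factor { value } => accumulated_factors *= value,
            // The first matching sample rate ends the search.
            SamplingValue::SampleRate { value } => {
                return Some((accumulated_factors * value, matched_rule_ids));
            }
        }
    }
    None
}
```

For example, a matching factor of 2.0 followed by a matching sample rate of 0.25 yields a final sample rate of 0.5. The sketch does not clamp the result to 1.0; whether the real implementation does is not stated here.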

jjbayer

This looks good, mostly nitpicks at this stage. I did not look at the tests yet.

jjbayer · 2023-02-08T09:45:40Z

py/tests/test_processing.py

@@ -242,7 +242,10 @@ def test_validate_sampling_configuration():
        "rules": [


Shouldn't this be rulesV2 now?

Yep, I didn't catch it as CI was blocked and integration tests weren't run again. Will fix!

jjbayer · 2023-02-08T09:50:57Z

relay-sampling/src/lib.rs

+        match self.sampling_value {
+            SamplingValue::SampleRate { value: _ } => true,
+            SamplingValue::Factor { value: _ } => false,
+        }


Suggested change

-        match self.sampling_value {
-            SamplingValue::SampleRate { value: _ } => true,
-            SamplingValue::Factor { value: _ } => false,
-        }
+        matches!(self.sampling_value, SamplingValue::SampleRate { .. })
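In a struct-variant pattern, fields are ignored with `..` rather than a bare `_`. A small sketch of the one-line form, using a local copy of the enum and a hypothetical free function `is_sample_rate` for illustration:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SamplingValue {
    SampleRate { value: f64 },
    Factor { value: f64 },
}

/// Returns true when the value is a sample rate, ignoring the payload.
pub fn is_sample_rate(sampling_value: &SamplingValue) -> bool {
    // `..` skips all fields of the struct variant.
    matches!(sampling_value, SamplingValue::SampleRate { .. })
}
```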

jjbayer · 2023-02-08T09:51:57Z

relay-sampling/src/lib.rs

+    fn get_sampling_base_value(&self) -> f64 {
+        match self.sampling_value {
+            SamplingValue::SampleRate { value: sample_rate } => sample_rate,
+            SamplingValue::Factor { value: factor } => factor,
+        }
+    }


nit: Can we put this helper in impl SamplingValue?

jjbayer · 2023-02-08T10:01:04Z

relay-sampling/src/lib.rs

+    /// The ordered sampling rules for the project.
+    #[serde(default, skip_deserializing)]


Suggested change

-    /// The ordered sampling rules for the project.
-    #[serde(default, skip_deserializing)]
+    /// Legacy sampling rules.
+    ///
+    /// We need to serialize an empty list of rules such that outdated downstream Relays
+    /// accept the config, but at the same time, we do not want to actually use these rules
+    /// in the current Relay.
+    #[serde(default, skip_deserializing)]

jjbayer · 2023-02-08T10:33:45Z

relay-sampling/src/lib.rs

+    #[serde(default, skip_deserializing)]
    pub rules: Vec<SamplingRule>,
+    /// The ordered sampling rules v2 for the project.
+    pub rules_v2: Vec<SamplingRule>,


This might make the protocol a bit confusing, but I think it makes the rest of the code / diff easier to read. Feel free to disagree.

Suggested change

-    #[serde(default, skip_deserializing)]
-    pub rules: Vec<SamplingRule>,
-    /// The ordered sampling rules v2 for the project.
-    pub rules_v2: Vec<SamplingRule>,
+    #[serde(default, skip_deserializing, rename = "rules")]
+    pub legacy_rules: Vec<SamplingRule>,
+    /// The ordered sampling rules v2 for the project.
+    #[serde(rename = "rulesV2")]
+    pub rules: Vec<SamplingRule>,

I would prefer to keep the rules_v2 schema, to be more explicit about the newer version, as is typically done in APIs. The reason for avoiding legacy is this: if we upgrade the schema again, would we have to define legacy_legacy? I envision a simpler naming scheme with rules_v*. Of course, the best solution would be to implement proper versioning.

jjbayer · 2023-02-08T10:38:45Z

relay-sampling/src/lib.rs

+
+/// Represents the specification for sampling an incoming event.
+#[derive(Clone, Debug, PartialEq)]
+pub struct SamplingConfigMatchResult {


nit: I would rename this to SamplingMatch.

jjbayer · 2023-02-08T10:41:49Z

relay-sampling/src/lib.rs

+        if let Some(event_id) = event_id {
+            return Some(event_id.0);
+        }


Suggested change

-        if let Some(event_id) = event_id {
-            return Some(event_id.0);
-        }
+        event_id.map(|id| id.0)
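The two forms are equivalent as long as the fall-through case returns `None`. A small sketch showing both side by side; `EventId` here is a hypothetical stand-in that wraps a `u128` instead of a real UUID, and the function names are made up for illustration:

```rust
/// Hypothetical stand-in for the event ID newtype.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EventId(pub u128);

/// Explicit early return, as in the original hunk.
pub fn seed_verbose(event_id: Option<EventId>) -> Option<u128> {
    if let Some(event_id) = event_id {
        return Some(event_id.0);
    }
    None
}

/// The same logic expressed with `Option::map`.
pub fn seed_with_map(event_id: Option<EventId>) -> Option<u128> {
    event_id.map(|id| id.0)
}
```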

jjbayer · 2023-02-08T10:44:47Z

relay-sampling/src/lib.rs

+        None
+    }
+
+    pub fn has_unsupported_rules(&self) -> bool {


Can we move this function back to the top? Just to make the diff easier to read.

jjbayer · 2023-02-08T10:52:53Z

relay-sampling/src/lib.rs

-        };
+    ) -> Option<SamplingConfigMatchResult> {
+        let mut matched_rule_ids = vec![];
+        let mut has_matched_trace_rule = false;


Instead of storing a boolean and using get_seed, I would just

let mut seed = event.id.value();

and then overwrite it with the DSC seed further down if necessary.

I made my implementation under the assumption of having trace and transaction rules mixed, as shown in the tests. For this reason I use a boolean to keep track of any seen trace.

What I mean is that on line 1000, instead of flipping a boolean, you could overwrite the seed directly.

Oh yeah sorry, I get now what you mean. Will do that, sorry for the confusion.
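A rough sketch of the seed-overwrite idea discussed in this thread, under simplifying assumptions: `Rule`, `pick_seed`, and the use of plain `u128` values in place of UUIDs are all hypothetical, and the DSC seed is assumed to be a trace-level ID passed in by the caller.

```rust
/// Hypothetical, simplified rule with a precomputed match result.
pub struct Rule {
    pub is_trace_rule: bool,
    pub matches: bool,
}

/// Starts from the event ID and overwrites the seed as soon as a trace rule
/// matches, instead of tracking a separate boolean flag.
pub fn pick_seed(event_id: u128, trace_seed: u128, rules: &[Rule]) -> u128 {
    let mut seed = event_id;
    for rule in rules {
        if rule.matches && rule.is_trace_rule {
            // Overwrite directly; no `has_matched_trace_rule` flag needed.
            seed = trace_seed;
        }
    }
    seed
}
```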

jjbayer · 2023-02-08T15:11:22Z

relay-sampling/src/lib.rs

@@ -374,6 +374,15 @@ pub enum SamplingValue {
    Factor { value: f64 },
 }

+impl SamplingValue {
+    fn get_sampling_base_value(&self) -> f64 {


nit: Our getters usually do not have a get_ prefix (see also the Rust docs). Should we just call this value()?
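A sketch of what the renamed helper could look like when it lives on the enum itself. The enum is copied locally so the snippet stands alone; the or-pattern is one possible way to write it, not necessarily what was merged.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SamplingValue {
    SampleRate { value: f64 },
    Factor { value: f64 },
}

impl SamplingValue {
    /// Returns the inner number regardless of the variant.
    pub fn value(&self) -> f64 {
        match *self {
            // Both variants carry an `f64` named `value`, so one arm suffices.
            SamplingValue::SampleRate { value } | SamplingValue::Factor { value } => value,
        }
    }
}
```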

jjbayer · 2023-02-08T15:15:29Z

relay-sampling/src/lib.rs

-        };
+    ) -> Option<SamplingConfigMatchResult> {
+        let mut matched_rule_ids = vec![];
+        let mut has_matched_trace_rule = false;


What I mean is that on line 1000, instead of flipping a boolean, you could overwrite the seed directly.

jjbayer · 2023-02-08T15:36:34Z

relay-server/src/utils/dynamic_sampling.rs

+    let merged_config = SamplingConfigs::new(sampling_config)
+        .add_root_config(root_project_state)
+        .get_merged_config();


We should not clone configs / create new vectors for every event here. Ideally, we operate on a get_merged_rules() function that returns an iterator (not a vector) of rules.

I wanted to discuss this indeed, I can try to implement the approach with iterators.
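One way the iterator-based approach might look, as a sketch only. `SamplingRule`, `SamplingConfig`, the `is_trace_rule` flag, and the free function `merged_rules` are simplified, hypothetical stand-ins; the ordering follows the PR description (event rules of the project first, then trace rules of the root project).

```rust
/// Hypothetical, simplified rule.
pub struct SamplingRule {
    pub id: u32,
    pub is_trace_rule: bool,
}

/// Hypothetical, simplified config holding an ordered rule list.
pub struct SamplingConfig {
    pub rules: Vec<SamplingRule>,
}

/// Lazily yields the merged rules by borrowing from both configs.
/// Nothing is cloned and no intermediate vector is allocated.
pub fn merged_rules<'a>(
    project: &'a SamplingConfig,
    root: Option<&'a SamplingConfig>,
) -> impl Iterator<Item = &'a SamplingRule> {
    let event_rules = project.rules.iter().filter(|rule| !rule.is_trace_rule);
    let trace_rules = root
        .into_iter()
        .flat_map(|config| config.rules.iter())
        .filter(|rule| rule.is_trace_rule);
    event_rules.chain(trace_rules)
}
```

Because the root config is optional, `Option::into_iter` lets the missing case collapse into an empty iterator without a separate branch.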

jjbayer · 2023-02-08T15:38:11Z

relay-server/src/utils/dynamic_sampling.rs

+    Match {
+        sample_rate: f64,
+        matched_rule_ids: MatchedRuleIds,
+        seed: Uuid,
+    },


This looks very similar to the SamplingMatch defined in relay-sampling. Can we just use an Option<SamplingMatch> instead of this enum?

They look similar, but we actually change the sample rate. I would prefer to keep Match and NoMatch for expressiveness, and I can use the SamplingMatch inside the Match variant of the enum.
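A sketch of the compromise described in this reply, with the library-level match wrapped inside the server-level enum. All names besides `SamplingMatch`, `Match`, and `NoMatch` are hypothetical, field types are simplified (`u32` rule IDs, a `u128` seed), and `adjust` stands in for whatever sample rate adjustment the server applies.

```rust
/// Simplified stand-in for the match type defined in relay-sampling.
#[derive(Clone, Debug, PartialEq)]
pub struct SamplingMatch {
    pub sample_rate: f64,
    pub matched_rule_ids: Vec<u32>,
    pub seed: u128,
}

/// Server-side result that keeps `Match` / `NoMatch` explicit.
#[derive(Clone, Debug, PartialEq)]
pub enum SamplingResult {
    Match(SamplingMatch),
    NoMatch,
}

/// Converts the optional library match, adjusting only the sample rate.
pub fn to_result(
    sampling_match: Option<SamplingMatch>,
    adjust: impl Fn(f64) -> f64,
) -> SamplingResult {
    match sampling_match {
        Some(m) => SamplingResult::Match(SamplingMatch {
            sample_rate: adjust(m.sample_rate),
            // Remaining fields are moved over unchanged.
            ..m
        }),
        None => SamplingResult::NoMatch,
    }
}
```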

jjbayer

I think we are close to merging this. Please take a look at the remaining comments. Could you also write a PR description that includes the motivation for this change and a summary of the change itself?

jjbayer

Thanks for contributing! Final notes:

Do not merge until feat(dynamic-sampling): Add support for rules v2 in Sentry [TET-666] sentry#44229 has landed (as discussed). Otherwise, Relay will fail to parse project configs because of the missing rulesV2 field.
Please update the PR description (see docs).
There's some new deprecation warnings, but I assume they are easy to fix.

iambriccardo · 2023-02-13T09:26:02Z

Thanks for contributing! Final notes:

Do not merge until getsentry/sentry#44229 has landed (as discussed). Otherwise, Relay will fail to parse project configs because of the missing rulesV2 field.

Please update the PR description (see docs).

There's some new deprecation warnings, but I assume they are easy to fix.

I will also have to write the changelog.

* master:
  feat(relay): Implement factor based multi-matching [TET-665] (#1790)
  fix(rate-limits): Rate limit profiles attached to transactions (#1843)
  fix(txnames): Avoid replacing url encoded transaction names (#1839)
  meta: Remove cargo vet (#1841)

iambriccardo added 21 commits January 30, 2023 08:51

feat(relay): Implement factor based multi-matching

ee1e1d0

Improve code

f3fc373

Add new sampling algorithm

b731c30

Fix clippy

b2a8c02

Add first implementation of configuration merging

a341eae

Add tests and improve logic

3686e7f

Improve tests

bc4b05d

Fix warnings

8f4e0f7

Improve tests

46a5756

Remove override

b31d545

Improve code

6d45744

Merge branch 'master' into riccardo/feat/multi-matching-and-factors

9025a30

Add clarified todos

a308493

Improve code

de3b00f

Add rule ids match

bd94d6e

Improve code

1987baa

Cleanup

85b3dee

Cleanup

c2e8dc7

Fix some integration tests

412ba23

Fix tests

1d60c9a

Fix tests

03237db

iambriccardo requested review from jjbayer and jan-auer February 3, 2023 12:40

iambriccardo marked this pull request as ready for review February 3, 2023 12:40

iambriccardo requested a review from a team February 3, 2023 12:40

iambriccardo added 5 commits February 3, 2023 13:45

Use assert_eq from similar asserts

306e53d

Improve tests

c492ac4

Improve tests

5805180

Improve tests

cc571c8

Improve tests

cf19980

iambriccardo added 5 commits February 7, 2023 09:42

Add support for rules v2

6810e04

Fix integration tests

a3281e0

Fix deserialization

9d9eb07

Add comment

80fa369

Add test

4ea39be

jjbayer requested changes Feb 8, 2023

iambriccardo changed the title ~~feat(relay): Implement factor based multi-matching~~ feat(relay): Implement factor based multi-matching [TET-665] Feb 8, 2023

Fix PR comments

053a08d

iambriccardo requested a review from jjbayer February 8, 2023 14:53

jjbayer requested changes Feb 8, 2023

iambriccardo and others added 5 commits February 9, 2023 08:38

Improve code

5ac2cff

Improve code

f5d626f

Refactor

32d2377

Merge branch 'master' into riccardo/feat/multi-matching-and-factors

8f5636b

# Conflicts: # relay-server/src/utils/dynamic_sampling.rs

ref: simplify

14522ac

jjbayer requested changes Feb 10, 2023

iambriccardo added 3 commits February 10, 2023 15:07

Improve pr

1784ba7

Merge branch 'master' into riccardo/feat/multi-matching-and-factors

fb1faea

# Conflicts: # Cargo.lock # relay-sampling/src/lib.rs

Fix old dependency

1a7c78c

jjbayer approved these changes Feb 10, 2023

Fix tests

f2c7859

iambriccardo added 2 commits February 13, 2023 10:32

Add changelog

599f571

Improve logging

e33d6db

iambriccardo mentioned this pull request Feb 13, 2023

feat(dynamic-sampling): Implement dynamic factor function and new factor-based rules [TET-702] getsentry/sentry#44500

Merged

Add test

cee2258

iambriccardo merged commit f92c968 into master Feb 14, 2023

iambriccardo deleted the riccardo/feat/multi-matching-and-factors branch February 14, 2023 08:18

iambriccardo changed the title ~~feat(relay): Implement factor based multi-matching [TET-665]~~ feat(relay): Implement factor based multi-matching Feb 14, 2023
