
feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] #1287

Merged
merged 18 commits into from
Jun 7, 2022

Conversation

untitaker
Member

Add new settings to enforce cost limits per project and process.

@untitaker untitaker requested a review from a team June 3, 2022 12:58
relay-metrics/src/aggregation.rs
// XXX: This is not a great implementation of cost enforcement.
//
// * it takes two lookups of the project key in the cost tracker to merge a bucket: once in
// `check_limits_exceeded` and once in `add_cost`.
Member

Could you try returning the error from add_cost and doing the diff computation inline? The .entry() function below doesn't create the entry until you call insert on it, at least. That just leaves the merge into the bucket, which could also easily return whether or not it inserted.

It's excellent that you're calling out caveats of enforcement below, though in practice an off-by-one is not an issue.
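The suggestion above could look roughly like this; a minimal sketch, assuming hypothetical names (`CostTracker`, `LimitExceeded`, a `u64` project key) rather than Relay's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the reviewer's suggestion; the type and field
// names here are assumptions, not Relay's actual implementation.
#[derive(Default)]
struct CostTracker {
    cost_per_project: HashMap<u64, usize>,
    max_project_cost: usize,
}

#[derive(Debug, PartialEq)]
struct LimitExceeded;

impl CostTracker {
    // Check the limit and record the cost with a single map lookup:
    // the caller merges into the bucket only when this returns Ok.
    fn add_cost(&mut self, project_key: u64, cost: usize) -> Result<(), LimitExceeded> {
        let tracked = self.cost_per_project.entry(project_key).or_insert(0);
        if *tracked + cost > self.max_project_cost {
            return Err(LimitExceeded);
        }
        *tracked += cost;
        Ok(())
    }
}
```

Returning a `Result` from `add_cost` folds `check_limits_exceeded` and the cost update into one `entry` lookup.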

Comment on lines +1330 to +1332
// Another consequence is that a MergeValue that adds zero cost (such as an existing
// counter bucket being incremented) is currently rejected even though it doesn't have to
// be.
Member

@jan-auer jan-auer Jun 3, 2022

That is actually problematic and we should fix that. Same as in my previous comment -- if merge_into returns what happened, you should be able to enforce more consistently. See also #1284 (comment).

Member Author

By the time merge_into is called, it's already too late to enforce anything, since I can't undo a merge. I would need to special-case counters, or add a method to MergeValue that returns the projected added cost for both the occupied and the vacant scenario.

Let's check in on Monday.

Member

That's OK: in the worst case you're off by one bucket, so depending on the value you're inserting you're less than 48 bytes over. For a memory limit of 16 GB, that is insignificant. The other option is to implement cost on MetricValue.
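The bound claimed here checks out; a quick sanity calculation (the 48-byte and 16 GiB figures are the review's assumptions, not measured values):

```rust
// Relative overshoot when the tracker is one entry (~48 bytes) over a
// 16 GiB limit: roughly 3e-9, i.e. negligible.
fn overshoot_ratio(overshoot_bytes: f64, limit_bytes: f64) -> f64 {
    overshoot_bytes / limit_bytes
}
```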

Member Author

@untitaker untitaker Jun 3, 2022

Let's talk on Monday. Everything you're saying is true, but I don't think it implies that this case can be fixed (without implementing cost on MetricValue).

Member

I'm not sure implementing cost on MetricValue solves the problem. Whether a set bucket grows, for example, depends on the pre-existing set entries, not just on the incoming value. So to be 100% consistent we would need a function that answers the question "would this value increase my size?", which would be overkill IMO.
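The set-bucket point can be illustrated with a toy sketch (hypothetical types, not Relay's bucket implementation): the cost a merge adds is only known after consulting the existing entries.

```rust
use std::collections::BTreeSet;

// Toy set bucket: merging a value only adds cost when the value was not
// already present, so a per-value cost() cannot predict it up front.
struct SetBucket(BTreeSet<u32>);

impl SetBucket {
    // Returns the cost this merge actually added: the size of one entry
    // for a new value, zero for a duplicate.
    fn merge(&mut self, value: u32) -> usize {
        if self.0.insert(value) {
            std::mem::size_of::<u32>()
        } else {
            0
        }
    }
}
```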

/// Maximum amount of bytes used for metrics aggregation per project.
///
/// Similar measuring technique to `max_total_bucket_bytes`, but instead of a
/// global/process-wide limit, it is enforced per project id.
Member

Suggested change
/// global/process-wide limit, it is enforced per project id.
/// global/process-wide limit, it is enforced per project key.

Member

Can we still fix this and also use a better config name?

Member

What name would you suggest? max_bucket_bytes_total and max_bucket_bytes_per_project_key?
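Under the names floated here, the two limits could be sketched as below; the field layout and the Option-based "no limit" semantics are assumptions for illustration, not the final config:

```rust
// Sketch of the two aggregator limits using the names proposed in the
// review (assumed, not final).
#[derive(Debug, Default)]
struct AggregatorLimits {
    /// Maximum bytes used for metrics aggregation across the whole process.
    max_bucket_bytes_total: Option<usize>,
    /// Same measuring technique, but enforced per project key.
    max_bucket_bytes_per_project_key: Option<usize>,
}

impl AggregatorLimits {
    // A missing limit means unlimited.
    fn exceeds(limit: Option<usize>, cost: usize) -> bool {
        limit.map_or(false, |max| cost > max)
    }
}
```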

relay-metrics/src/aggregation.rs
@untitaker untitaker requested review from jan-auer and jjbayer June 7, 2022 12:58
@untitaker
Member Author

I think all comments should be addressed now. We talked it through async and determined that fully consistent enforcement, and accepting counter increments because they add no cost, are not worth the effort.

@untitaker untitaker changed the base branch from master to ref/track-metrics-footprint-2 June 7, 2022 14:38
Base automatically changed from ref/track-metrics-footprint-2 to master June 7, 2022 14:41
@untitaker untitaker enabled auto-merge (squash) June 7, 2022 14:49
@untitaker untitaker merged commit b59b6d5 into master Jun 7, 2022
@untitaker untitaker deleted the feat/metric-cost-enforcement branch June 7, 2022 14:59
jan-auer added a commit that referenced this pull request Jun 9, 2022
* master:
  ref(metrics): Stop logging relative bucket size (#1302)
  fix(metrics): Rename misnamed aggregator option (#1298)
  fix(server): Avoid a panic in the Sentry middleware (#1301)
  build: Update dependencies with known vulnerabilities (#1294)
  fix(metrics): Stop logging statsd metric per project key (#1295)
  feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] (#1287)
  fix(metrics): Track memory footprint more accurately (#1288)
  build(deps): Bump dependencies (#1293)
  feat(aws): Add relay-aws-extension crate which implements AWS extension as an actor (#1277)
  fix(meta): Update codeowners for the release actions (#1286)
  feat(metrics): Track memory footprint of metrics buckets (#1284)
3 participants