
feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] #1287

Merged
merged 18 commits into from
Jun 7, 2022

Conversation

untitaker
Member

Add new settings to enforce cost limits per project and process.

@untitaker untitaker requested a review from a team June 3, 2022 12:58
relay-metrics/src/aggregation.rs
// XXX: This is not a great implementation of cost enforcement.
//
// * it takes two lookups of the project key in the cost tracker to merge a bucket: once in
// `check_limits_exceeded` and once in `add_cost`.
Member

Could you try returning the error from add_cost and doing the diff computation inline? The .entry() function below doesn't create the entry until you call insert on it, at least. That just leaves the merge into the bucket, which could also easily return whether or not it inserted.

It's excellent that you're calling out caveats of enforcement below, though in practice an off-by-one is not an issue.
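The suggestion above could look roughly like this; a minimal sketch, assuming hypothetical names (`CostTracker`, `LimitExceeded`, a `u64` project key) rather than Relay's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the reviewer's suggestion; the type and field
// names here are assumptions, not Relay's actual implementation.
#[derive(Default)]
struct CostTracker {
    cost_per_project: HashMap<u64, usize>,
    max_project_cost: usize,
}

#[derive(Debug, PartialEq)]
struct LimitExceeded;

impl CostTracker {
    // Check the limit and record the cost with a single map lookup:
    // the caller merges into the bucket only when this returns Ok.
    fn add_cost(&mut self, project_key: u64, cost: usize) -> Result<(), LimitExceeded> {
        let tracked = self.cost_per_project.entry(project_key).or_insert(0);
        if *tracked + cost > self.max_project_cost {
            return Err(LimitExceeded);
        }
        *tracked += cost;
        Ok(())
    }
}
```

Returning a `Result` from `add_cost` folds `check_limits_exceeded` and the cost update into one `entry` lookup.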

Comment on lines +1330 to +1332
// Another consequence is that a MergeValue that adds zero cost (such as an existing
// counter bucket being incremented) is currently rejected even though it doesn't have to
// be.
Member

@jan-auer jan-auer Jun 3, 2022

That is actually problematic and we should fix that. Same as in my previous comment -- if merge_into returns what happened, you should be able to enforce more consistently. See also #1284 (comment).

Member Author

By the time merge_into is called, it's already too late to enforce anything, since I can't undo a merge. I would need to special-case counters, or add a method to MergeValue that returns the projected added cost for both the occupied and the vacant scenario.

Let's check in on Monday.

Member

That's OK: in the worst case you're off by one bucket, so depending on the value you're inserting you're less than 48 bytes over. For a memory limit of 16 GB, that is insignificant. The other option is to implement cost on MetricValue.
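The bound claimed here checks out; a quick sanity calculation (the 48-byte and 16 GiB figures are the review's assumptions, not measured values):

```rust
// Relative overshoot when the tracker is one entry (~48 bytes) over a
// 16 GiB limit: roughly 3e-9, i.e. negligible.
fn overshoot_ratio(overshoot_bytes: f64, limit_bytes: f64) -> f64 {
    overshoot_bytes / limit_bytes
}
```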

Member Author

@untitaker untitaker Jun 3, 2022

Let's talk on Monday. Everything you're saying is true, but I don't think it implies that this case can be fixed (without implementing cost on MetricValue).

Member

I'm not sure implementing cost on MetricValue solves the problem. Whether a set bucket grows, for example, depends on the pre-existing set entries, not just on the incoming value. So to be 100% consistent we would need a function that answers the question "would this value increase my size?", which would be overkill IMO.
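The set-bucket point can be illustrated with a toy sketch (hypothetical types, not Relay's bucket implementation): the cost a merge adds is only known after consulting the existing entries.

```rust
use std::collections::BTreeSet;

// Toy set bucket: merging a value only adds cost when the value was not
// already present, so a per-value cost() cannot predict it up front.
struct SetBucket(BTreeSet<u32>);

impl SetBucket {
    // Returns the cost this merge actually added: the size of one entry
    // for a new value, zero for a duplicate.
    fn merge(&mut self, value: u32) -> usize {
        if self.0.insert(value) {
            std::mem::size_of::<u32>()
        } else {
            0
        }
    }
}
```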

/// Maximum amount of bytes used for metrics aggregation per project.
///
/// Similar measuring technique to `max_total_bucket_bytes`, but instead of a
/// global/process-wide limit, it is enforced per project id.
Member

Suggested change
/// global/process-wide limit, it is enforced per project id.
/// global/process-wide limit, it is enforced per project key.

Member

Can we still fix this and also use a better config name?

Member

What name would you suggest? max_bucket_bytes_total and max_bucket_bytes_per_project_key?
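Under the names floated here, the two limits could be sketched as below; the field layout and the Option-based "no limit" semantics are assumptions for illustration, not the final config:

```rust
// Sketch of the two aggregator limits using the names proposed in the
// review (assumed, not final).
#[derive(Debug, Default)]
struct AggregatorLimits {
    /// Maximum bytes used for metrics aggregation across the whole process.
    max_bucket_bytes_total: Option<usize>,
    /// Same measuring technique, but enforced per project key.
    max_bucket_bytes_per_project_key: Option<usize>,
}

impl AggregatorLimits {
    // A missing limit means unlimited.
    fn exceeds(limit: Option<usize>, cost: usize) -> bool {
        limit.map_or(false, |max| cost > max)
    }
}
```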

relay-metrics/src/aggregation.rs
@untitaker untitaker requested review from jan-auer and jjbayer June 7, 2022 12:58
@untitaker
Member Author

I think all comments should be addressed now. We talked it through async and determined that fully consistent enforcement, and accepting counter increments because they add no cost, are not worth the effort.

@untitaker untitaker changed the base branch from master to ref/track-metrics-footprint-2 June 7, 2022 14:38
Base automatically changed from ref/track-metrics-footprint-2 to master June 7, 2022 14:41
@untitaker untitaker enabled auto-merge (squash) June 7, 2022 14:49
@untitaker untitaker merged commit b59b6d5 into master Jun 7, 2022
@untitaker untitaker deleted the feat/metric-cost-enforcement branch June 7, 2022 14:59
jan-auer added a commit that referenced this pull request Jun 9, 2022
* master:
  ref(metrics): Stop logging relative bucket size (#1302)
  fix(metrics): Rename misnamed aggregator option (#1298)
  fix(server): Avoid a panic in the Sentry middleware (#1301)
  build: Update dependencies with known vulnerabilities (#1294)
  fix(metrics): Stop logging statsd metric per project key (#1295)
  feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] (#1287)
  fix(metrics): Track memory footprint more accurately (#1288)
  build(deps): Bump dependencies (#1293)
  feat(aws): Add relay-aws-extension crate which implements AWS extension as an actor (#1277)
  fix(meta): Update codeowners for the release actions (#1286)
  feat(metrics): Track memory footprint of metrics buckets (#1284)
3 participants