Global ratelimiter, part 1: core algorithm for computing weights #5689
Conversation
So the assumption of this algorithm is that the requests may not be evenly distributed among the hosts, but the distribution is somewhat stable and does not fluctuate. Is that correct?
I would add to your comment / update it: the implicit assumption we're making here is that it's the history service and the DB which are the vulnerable things, and the frontend hosts themselves rarely need as much protection, so by sharing information around we are attempting to achieve fairness to the customer while protecting the vulnerable thing.
"does not continually fluctuate significantly on a second-by-second basis", yeah. It's assumed to change while our servers are are running (callers come and go and change locations), but more on the order of "minutes or longer" rather than seconds. which matches how load balancers and network planes generally work - they don't generally send a flood to one host one second, then a flood to a second host the next, etc. most attempt to evenly distribute load and are closer to round-robin or random, and at best you might see something like "send to least-requests-in-flight". but even least-requests won't send all long requests to one host and all short requests to a different one, it should still be roughly randomly distributed and so roughly stable over a longish period of time. load balancers dealing with large numbers do tend to only know about a subset of hosts though (forming a tree or incomplete graph between caller/callees, basically) which is essentially what's happening to us internally. to some degree all along, and to a MUCH greater degree recently. and internally, ^ that kind of load balance change really only occurs at around the rate of deploys or datacenter re-balancing, which is very roughly: less than hourly and takes multiple minutes to complete, as minimum bounds (many are much less frequent and much slower). as long as it takes something like a minute or less to restore ideal behavior, that should be fine, and I expect this to be a few times faster in practice (some change in ~3s, near-ideal in ~10s). if someone is using a routing system that doesn't act like ^ this, then yea IMO it's out of scope for this implementation. ideally, handling that would be "come up with a way to handle it, and plug it in as a different algorithm / different communicated data", but no doubt we'll discover we need some small changes. but that'll be entirely outside this PR's scope, as this is just the one set of assumptions. ^ and thanks for asking, clearly I should put something like this in the files, it's definitely relevant for the long term. |
Not sure if you've tried/looked, but I've recently found https://github.com/mennanov/limiters. It comes with some limitations, like requiring an external DB, but it might be easier.
> update cycle, so this is mostly intended as a tool for reducing incorrectly-rejected
> requests when a ratelimit's usage is well below its allowed limit.
>
> # Dealing with expired data
It is obvious after discussing in person, but I was initially confused about host-level granularity being persisted because - in the back of my mind - the problem you were solving was zonal imbalances.
Obviously since you don't aggregate at the zone level you need to do it at the host level, and therefore you have this GC problem, but that didn't go into my brain the first time I looked at it.
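For concreteness, the GC problem here amounts to something like the following sketch: per-(limit key, host) data is persisted in the aggregator, so hosts that stop reporting must eventually be dropped. The names and the `maxIdle` threshold are assumptions for illustration, not the PR's actual mechanism:

```go
package main

import "time"

// hostEntry holds the smoothed usage data for one (limit key, host) pair,
// plus when that host last reported.
type hostEntry struct {
	usage      float64   // smoothed usage reported by this host
	lastUpdate time.Time // when this host last sent an update
}

// gc removes hosts that have not reported within maxIdle (e.g. hosts removed
// during a deploy or datacenter re-balance), so their stale usage stops
// influencing the computed weights.
func gc(hosts map[string]hostEntry, maxIdle time.Duration, now time.Time) {
	for host, e := range hosts {
		if now.Sub(e.lastUpdate) > maxIdle {
			delete(hosts, host)
		}
	}
}
```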
yeah, the final goal for all this is to let us experiment internally while reusing the overall sharding / request flow / etc, since internally we have information that doesn't exist in other environments.
which is probably true of other environments too. "customized" seems quite likely in general, though I do expect a majority will just have round-robin or random or something and won't need this.
Changes LGTM and we will probably iterate further as the rest of the layers that depend on this get implemented. The only leftover comment from my side is `MaxDelayDuration` vs `UpdateRate`, which would improve the reasonability of the algo implementation.
I did finally check that + the code, and AFAICT they all require a request per limit check / they're synchronous checks, not asynchronous. Might be worth checking in more detail if we decide to toss this, use external storage, and hammer it super hard, but that's not really the goal at the moment.
A high-level overview is covered in `common/quotas/global/doc.go`, and this specific piece is further covered in the `./algorithm/requestweighted.go` file.

At a very high level though, this is an isolated piece of a "deployment-wide load-balance-aware ratelimiter", intended to solve problems in our internal clusters with our current high-level frontend ratelimiters around client-triggered domain actions (start workflow, poll, etc).
This is the "logical core" of the whole system, with as few dependencies and concerns as I can manage. All frontend hosts that will be imposing limits eventually send the update arguments to an aggregating host, and the aggregating host largely just delegates to a single instance of this algorithm. Essentially all the aggregating host needs to do to respond is multiply the weight by each configured ratelimit (from dynamic config).
I've left the concurrency fairly coarse, both for simplicity and for computational speed. I don't believe this will be a bottleneck in even our largest cluster, but even if it is it should be fairly trivial to split each batch's ratelimit keys into N different sharded instances, which can be completely independent.
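If that splitting ever becomes necessary, it could be as simple as hashing each ratelimit key to pick one of N independent algorithm instances. A hypothetical sketch:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex deterministically assigns a ratelimit key to one of n independent
// algorithm instances, each with its own lock and state, so batches can be
// split without any cross-shard coordination.
func shardIndex(key string, n int) int {
	h := fnv.New32a()
	_, _ = h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	for _, key := range []string{"domain-a:StartWorkflowExecution", "domain-b:PollForDecisionTask"} {
		fmt.Printf("%s -> shard %d\n", key, shardIndex(key, 4))
	}
}
```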
The problem
All of our current ratelimiters fall into two conceptual buckets:

- a plain "N rps" limit, applied by each host independently, and
- an "N rps divided by the number of hosts" limit, which only adds up to ~N cluster-wide when load is evenly balanced.
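For illustration, the two buckets map onto something like the following (a sketch using `golang.org/x/time/rate`; the actual limiter wrappers in the codebase differ):

```go
package main

import "golang.org/x/time/rate"

// Bucket 1: a plain "N rps" limit, applied by each host regardless of how
// many hosts exist. Burst is sized to the rate for simplicity.
func fixedPerHost(n float64) *rate.Limiter {
	return rate.NewLimiter(rate.Limit(n), int(n))
}

// Bucket 2: "N rps divided by the number of hosts", which only adds up to ~N
// cluster-wide when traffic is spread evenly across those hosts.
func dividedByHosts(n float64, numHosts int) *rate.Limiter {
	return rate.NewLimiter(rate.Limit(n/float64(numHosts)), 1)
}
```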
While these are very straightforward and have served us well, they're proving inadequate in our internal clusters due to imbalanced load on our frontend instances. Sometimes wildly imbalanced, e.g. 10% of hosts receiving all traffic for a domain.
What this leads to is a lower effective ratelimit than is intended, e.g. with an "N rps divided by hosts" limit and 10 hosts, each host only allows N/10 rps, so a domain whose traffic lands entirely on one host can only ever use about a tenth of its intended limit.
This is leading to a nasty scenario where we tell our users they get N rps, but they can't use it all, so we raise their limits to give them an effective N rps (actually like 3N configured)... but if their requests shift in balance we might allow 3N requests through, possibly risking cluster health. This is particularly risky for our largest users, as we lose our protection against significant un-planned load increases.
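As a toy illustration of that failure mode (the numbers are made up; only the shape of the problem matters):

```go
package main

import "fmt"

// admitted sums how much traffic gets through when every host enforces the
// same per-host slice of the configured limit.
func admitted(perHostLimit float64, trafficPerHost []float64) float64 {
	total := 0.0
	for _, t := range trafficPerHost {
		if t > perHostLimit {
			t = perHostLimit // each host rejects anything above its own slice
		}
		total += t
	}
	return total
}

func main() {
	const intended = 100.0 // what we tell the user: N rps
	hosts := 10

	// All of the domain's traffic lands on one host.
	imbalanced := make([]float64, hosts)
	imbalanced[0] = intended

	fmt.Println(admitted(intended/float64(hosts), imbalanced)) // 10: effective N/10
	// Operators compensate by configuring ~3N, as described above...
	fmt.Println(admitted(3*intended/float64(hosts), imbalanced)) // 30: still short of N
	// ...but if traffic later spreads out, up to 3N can be admitted cluster-wide.
	balanced := make([]float64, hosts)
	for i := range balanced {
		balanced[i] = 3 * intended / float64(hosts)
	}
	fmt.Println(admitted(3*intended/float64(hosts), balanced)) // 300: the 3N risk
}
```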
So we need to do something.
The solution
We can rather clearly divide solutions into either "don't send imbalanced load" or "share information about load", because it's not possible to make correct purely-local decisions. The first is not an option for us and the second has a variety of tradeoffs.
Within "share information" side we can also divide out two options:
For just our frontend requests, we might be willing to go synchronous because the rates are relatively low and the proportional added latency would be very small... but we have imbalanced load in many other places, e.g. due to hot shards causing more requests against the database in a History host, which can reach 100,000 rps. Having a general tool seems useful, especially if it could cover all of these.
So this is part 1 of building option 2. Other parts, like "actually share data" and "configure all this", are coming in later PRs.
The end goal
Ultimately the goal is to have a system which will be ~90% accurate within ~10s regardless of changes, and "idle" at better than 99% for relatively stable / normal usage regardless of distribution. For this implementation, that implies roughly a 3s update cycle at 0.5 weight (each frontend host pushes aggregate data every 3s, and each update moves data to the average of the previous data and the new data).
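Rough arithmetic behind those numbers, assuming a 3s push interval and the 0.5 update weight described above:

```go
package main

import "fmt"

// With a 0.5 weight, each update halves the influence of the old distribution,
// so after k updates the old data contributes 0.5^k of the total.
func main() {
	const pushInterval = 3.0 // seconds, per the description above
	remaining := 1.0
	for k := 1; k <= 4; k++ {
		remaining *= 0.5
		fmt.Printf("after %d updates (~%2.0fs): %.1f%% old data, %.1f%% new\n",
			k, float64(k)*pushInterval, remaining*100, (1-remaining)*100)
	}
	// After 3 updates (~9s) only 12.5% of the weight is stale, which lines up
	// with the "~90% accurate within ~10s" target.
}
```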
Brief, large spikes are essentially out of scope - they will quite possibly perform worse with this system, particularly if they jump between hosts every couple seconds. These have caused fairly major load spike related issues in the past though, so we have no real interest in allowing them... and we have some async APIs being built that will queue and handle this more smoothly, which will hopefully be a better solution anyway.
Whether this exact implementation survives our internal use remains to be seen. The plan is to make different algorithms (and different data exchanged) relatively pluggable so we can experiment easily, once the framework is fully in place.
And last but not least, the existing N-rps and N-rps-divided-by-hosts ratelimiters are not going away, and this async-load-balancing limiter will be fully disable-able to go back to using them (and may even forever remain disabled by default). If your clusters do not have imbalanced request load, you might prefer to use the simpler ones, and they might even behave better. They are also useful as fallbacks for the global system, in case communication breaks down for some reason.