-
Notifications
You must be signed in to change notification settings - Fork 805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ratelimiter usage-counting bugfix: rejected reservations were not counted #6158
Conversation
After noticing that we had rejected requests but no rejected global-limit numbers, I noticed this flaw. Basically it's just that not-allowed requests were not being counted in the fallback-limiter, so it only affected the global system. Since "counted" and "fallback" have separate implementations, and reserve is easier to mess up (as demonstrated) but rather important because it's heavily used in the frontend per-domain requests: I've added a test to check all three things to make sure they track calls like they should.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
... and 7 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
@@ -74,21 +74,10 @@ func (c CountedLimiter) Wait(ctx context.Context) error { | |||
} | |||
|
|||
func (c CountedLimiter) Reserve() clock.Reservation { | |||
c.usage.idle.Store(0) // not idle regardless of the result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh I think "immediately not-idle" is more technically correct (that's when the token is reserved / it might not be returned by Used(false)
), but our use shouldn't really make a difference, and this way all Reserve()
impls are completely trivial.
res := b.both().Reserve() | ||
return countedReservation{ | ||
wrapped: res, | ||
wrapped: b.both().Reserve(), | ||
usage: &b.usage, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main flaw was here: it didn't pass through the Used(false) call when rejected, and did not count rejections.
it should have matched the counted-limiter's implementation. now they're both simple enough that it's fairly unlikely to diverge like this.
package internal_test | ||
package internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly to share the allow-limiter. in principle I still like having this external because it doesn't need to be internal, but meh. it's rather minor.
After noticing that we had rejected requests but no rejected global-limit numbers, I noticed this flaw.
Basically it's just that not-allowed requests were not being counted in the fallback-limiter, so it only affected the global system.
Since "counted" and "fallback" have separate implementations, and reserve is easier to mess up (as demonstrated) but rather important because it's heavily used in the frontend per-domain requests: I've added a test to check all three things to make sure they track calls like they should.
As part of this, to share some types a bit more and to just normalize the tests a bit, I dropped the external-test-package part of the fallback-limiter test. AFAICT it was working fine, but it's definitely a bit of an abnormality in this codebase, and I'm not confident that it works correctly with coverage systems (I just haven't checked). Might as well simplify.
In terms of concrete changes, there are really only two here:
Used(..)
time, notReserve()
The rest is tests and minor refactoring to simplify it as much as possible.