Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ratelimiter usage-counting bugfix: rejected reservations were not counted #6158

Merged
merged 2 commits into from
Jul 5, 2024

Conversation

Groxx
Copy link
Member

@Groxx Groxx commented Jul 3, 2024

After noticing that we had rejected requests but no rejected global-limit numbers, I noticed this flaw.
Basically it's just that not-allowed requests were not being counted in the fallback-limiter, so it only affected the global system.

Since "counted" and "fallback" have separate implementations, and reserve is easier to mess up (as demonstrated) but rather important because it's heavily used in the frontend per-domain requests: I've added a test to check all three things to make sure they track calls like they should.

As part of this, to share some types a bit more and to just normalize the tests a bit, I dropped the external-test-package part of the fallback-limiter test. AFAICT it was working fine, but it's definitely a bit of an abnormality in this codebase, and I'm not confident that it works correctly with coverage systems (I just haven't checked). Might as well simplify.


In terms of concrete changes, there are really only two here:

  • fallback-limiter's Reserve should have called Used(false) and counted the rejection.
  • "not idle" is now done at Used(..) time, not Reserve()

The rest is tests and minor refactoring to simplify it as much as possible.

After noticing that we had rejected requests but no rejected global-limit numbers, I noticed this flaw.
Basically it's just that not-allowed requests were not being counted in the fallback-limiter, so it only affected the global system.

Since "counted" and "fallback" have separate implementations, and reserve is easier to mess up (as demonstrated) but rather important because it's heavily used in the frontend per-domain requests: I've added a test to check all three things to make sure they track calls like they should.
Copy link

codecov bot commented Jul 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.64%. Comparing base (1a227c1) to head (000ddee).

Additional details and impacted files
Files Coverage Δ
...ommon/quotas/global/collection/internal/counted.go 100.00% <100.00%> (ø)
...mmon/quotas/global/collection/internal/fallback.go 96.07% <100.00%> (ø)

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a227c1...000ddee. Read the comment docs.

@@ -74,21 +74,10 @@ func (c CountedLimiter) Wait(ctx context.Context) error {
}

func (c CountedLimiter) Reserve() clock.Reservation {
c.usage.idle.Store(0) // not idle regardless of the result
Copy link
Member Author

@Groxx Groxx Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I think "immediately not-idle" is more technically correct (that's when the token is reserved / it might not be returned by Used(false)), but our use shouldn't really make a difference, and this way all Reserve() impls are completely trivial.

Comment on lines -206 to 209
res := b.both().Reserve()
return countedReservation{
wrapped: res,
wrapped: b.both().Reserve(),
usage: &b.usage,
}
Copy link
Member Author

@Groxx Groxx Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main flaw was here: it didn't pass through the Used(false) call when rejected, and did not count rejections.

it should have matched the counted-limiter's implementation. now they're both simple enough that it's fairly unlikely to diverge like this.

package internal_test
package internal
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly to share the allow-limiter. in principle I still like having this external because it doesn't need to be internal, but meh. it's rather minor.

@Groxx Groxx changed the title Ratelimiter usage-counting bugfix: rejected reservations were skipped Ratelimiter usage-counting bugfix: rejected reservations were not counted Jul 5, 2024
@Groxx Groxx merged commit a6935a9 into cadence-workflow:master Jul 5, 2024
20 checks passed
@Groxx Groxx deleted the fix-counts branch July 5, 2024 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants