-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return rate limit error according to ingestion rate strategy #3485
Conversation
When running main on my local, I noticed that When we fetch the overrides here I believe because the overrides hadn’t start up yet this function would return the defaults override which does have the ingestion rate strategy. But then we call the same function here the overrides is up and running, using “” here actually returns a tenant specific overrides which does not contain the ingestion rate strategy. @kvrhdn could you please double check my logic |
@ie-pham What do you think about including both the local and global rate limits in the error message? My concern is the cases where a tenant exceeds the local limit, but overall traffic is still below the global limit. Catching and explaining these can be tricky. The current error message with the local limit makes it more apparent, and also captures the number of distributor replicas at the time (indirectly). |
It seems the IngestionRateStrategy has always been a defaults overrides. It isn't part of the distributor config at all. Not sure if we want to spend time adding it or leave it as is. |
I would be in favour of adding this setting to distributor config and mark the override as deprecated. It's currently really not working as you would expect from other runtime overrides. Issues:
# overrides configmap
overrides:
"*":
ingestion:
rate_strategy: global
"my-tenant":
ingestion:
rate_strategy: global
# ...
Deprecating the override imo would mean:
And then at some point in the future we just remove the override completely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this will be a nice usability improvement!
Regarding the config change, if we want to do it might be better to do it in a separate PR to not block this one.
What this PR does: Currently, for all rate_limited errors regardless of the ingestion rate strategy, Tempo returns the local strategy limit. This causes confusion for setups that use the global strategy. This PR correctly returns the limit according to the ingestion rate strategy.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]