Add InstanceMultiple config to lower per instance series limit #3516
Conversation
Previously the per query series limit was passed directly to every database instance as the series limit. For large clusters this can result in a lot of wasted series read from the database. For example, a cluster with 30 instances per replica could read 30x the time series, only to truncate the results in the coordinators. Theoretically, if the data is perfectly sharded, the per instance limit is equal to the query series limit divided by the number of instances per replica. To allow for unbalanced data, the InstanceMultiple config increases this theoretical calculation by a factor to add a buffer. By default this option is off, but the hope is to add a sane default for the next major release. Additionally, if this config is set, this change fixes a bug where no error was returned when RequireExhaustive=true and the query series limit was exceeded by summing the individual instance results in the coordinator. This bug, coupled with the larger per instance series limit, would result in the coordinator processing far more results than necessary and then silently truncating them.
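The per-instance limit described above can be sketched as follows. This is a minimal illustration, not the actual M3 implementation; the function name and the treatment of `InstanceMultiple == 0` as "feature disabled" follow the description in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// perInstanceSeriesLimit sketches the calculation described above: with
// perfectly sharded data each instance only needs seriesLimit /
// instancesPerReplica series, and instanceMultiple pads that for skew.
// A multiple of 0 disables the feature and forwards the full query limit,
// matching the previous behavior.
func perInstanceSeriesLimit(seriesLimit, instancesPerReplica int, instanceMultiple float32) int {
	if instanceMultiple <= 0 || instancesPerReplica <= 1 {
		return seriesLimit
	}
	ideal := float64(seriesLimit) / float64(instancesPerReplica)
	return int(math.Ceil(ideal * float64(instanceMultiple)))
}

func main() {
	// 30 instances per replica, 10,000 series limit, 2x buffer.
	fmt.Println(perInstanceSeriesLimit(10000, 30, 2.0)) // 667
	// Feature disabled: the full limit goes to every instance.
	fmt.Println(perInstanceSeriesLimit(10000, 30, 0)) // 10000
}
```

With the feature off, every one of the 30 instances may read up to 10,000 series (up to 300,000 fetched to return 10,000); with a 2x multiple each instance reads at most 667.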
}

if shouldFilter {
	// NB: skip here, the closer will free the series iterator regardless.
	continue
}

if err := r.dedupeMap.add(iter, attrs); err != nil {
	if r.dedupeMap.len() == r.limitOpts.Limit {
		updated, err := r.dedupeMap.update(iter, attrs)
This was rather tricky. Needed to handle the case of a "better result" from one namespace overwriting an existing result.
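The overwrite-at-limit behavior being discussed can be illustrated with a toy dedupe map. The type and method names here are illustrative, not the actual M3 types: once the map is at the limit, brand-new series are rejected, but a series already present can still be replaced by a better result from another namespace.

```go
package main

import "fmt"

// dedupeMap is a hypothetical sketch of the pattern above. It maps series IDs
// to a "resolution rank" where a higher rank means a better result.
type dedupeMap struct {
	limit  int
	series map[string]int
}

// addOrUpdate reports whether the entry was stored. At the limit, new series
// are skipped, but an existing series can still be upgraded in place.
func (m *dedupeMap) addOrUpdate(id string, rank int) bool {
	if existing, ok := m.series[id]; ok {
		if rank > existing {
			m.series[id] = rank // better result overwrites in place
		}
		return true
	}
	if len(m.series) >= m.limit {
		return false // at the limit: a genuinely new series is rejected
	}
	m.series[id] = rank
	return true
}

func main() {
	m := &dedupeMap{limit: 2, series: map[string]int{}}
	fmt.Println(m.addOrUpdate("a", 1)) // true
	fmt.Println(m.addOrUpdate("b", 1)) // true
	fmt.Println(m.addOrUpdate("c", 1)) // false: limit reached
	fmt.Println(m.addOrUpdate("a", 2)) // true: existing series updated
}
```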
@@ -190,11 +202,22 @@ func (r *multiResult) Add(
	return
}

// the series limit was reached within this namespace.
if !metadata.Exhaustive && r.limitOpts.RequireExhaustive {
This is the case where the query series limit was exceeded by a single namespace (i.e. summing up the results from all the instances in a replica).
}

// Now de-duplicate
r.addOrUpdateDedupeMap(attrs, newIterators)
// the series limit was reached by adding the results of this namespace to the existing results.
this is the case where the query series limit is exhausted across multiple namespaces.
Unfortunately, since results from one namespace can overwrite results from another namespace, we have to iterate through the results to see if the limit was exceeded, i.e. we can't rely on metadata.FetchSeriesCount.
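Why the summed count over-reports can be shown with a small example. This is purely illustrative (the series names and helper are invented): when two namespaces return an overlapping series, summing per-namespace counts (what `metadata.FetchSeriesCount` would reflect) exceeds the limit even though the deduped result set does not.

```go
package main

import "fmt"

// dedupedCount returns the number of distinct series across namespaces,
// mimicking what the dedupe map ends up holding after overwrites.
func dedupedCount(namespaces ...[]string) int {
	seen := map[string]struct{}{}
	for _, ns := range namespaces {
		for _, s := range ns {
			seen[s] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	nsA := []string{"cpu{host=a}", "cpu{host=b}"}
	nsB := []string{"cpu{host=b}", "cpu{host=c}"} // host=b overwrites A's result

	const limit = 3
	summed := len(nsA) + len(nsB) // naive sum: 4

	fmt.Println(summed > limit)                 // true: naive sum flags the limit
	fmt.Println(dedupedCount(nsA, nsB) > limit) // false: deduped size is within it
}
```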
added can be false in the error case, is that alright?
Good catch. Probably need to guard this with r.err == nil as well. Not a big fan of APIs that don't return errors.
LG, should we update the per query limit integration tests?
// If set to 0 the feature is disabled and the MaxFetchedSeries is used as the limit for each database instance.
// For large clusters, enabling this feature can dramatically decrease the amount of wasted series read from a
// single database instance.
InstanceMultiple float32 `yaml:"instanceMultiple"`
should we prohibit a value between 0 and 1?
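A validation along the lines suggested above could look like this. The function name is hypothetical; the idea is that a multiple strictly between 0 and 1 would shrink the theoretical per-instance limit instead of padding it, while 0 stays valid as the "feature disabled" sentinel.

```go
package main

import "fmt"

// validateInstanceMultiple is a sketch of the suggested check: reject values
// strictly between 0 and 1, since they would lower the per-instance limit
// below the perfectly-sharded ideal rather than buffering it.
func validateInstanceMultiple(m float32) error {
	if m < 0 {
		return fmt.Errorf("instanceMultiple must be >= 0, got %v", m)
	}
	if m > 0 && m < 1 {
		return fmt.Errorf("instanceMultiple must be 0 (disabled) or >= 1, got %v", m)
	}
	return nil
}

func main() {
	fmt.Println(validateInstanceMultiple(0))   // valid: feature disabled
	fmt.Println(validateInstanceMultiple(0.5)) // rejected
	fmt.Println(validateInstanceMultiple(2))   // valid: 2x buffer
}
```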
src/query/storage/types.go
Outdated
@@ -114,6 +114,8 @@ type FetchOptions struct {
	Remote bool
	// SeriesLimit is the maximum number of series to return.
	SeriesLimit int
	// InstanceMultiple is how much to increase the per database instance series limit.src/dbnode/client/session.go
rm src/dbnode/client/session.go
?
weird lol
// Piggy back on the new InstanceMultiple option to enable checking require exhaustive. This preserves the
// existing buggy behavior of the coordinators not requiring exhaustive. Once InstanceMultiple is enabled by
// default, this can be removed.
RequireExhaustive: queryOptions.InstanceMultiple > 0 && options.RequireExhaustive,
not really following why this will be set to false if queryOptions.InstanceMultiple == 0
To preserve backwards compatibility with the buggy logic that did not check RequireExhaustive in the coordinator, I'm piggybacking on the rollout of InstanceMultiple to also enable checking RequireExhaustive in the coordinator. Will remove queryOptions.InstanceMultiple > 0 once we verify this all works. I'll do that in a followup. I did verify this works as expected.
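The gating described in this exchange reduces to a single boolean expression; the sketch below (with an illustrative function name) shows the rollout behavior: configs that have not enabled InstanceMultiple keep the old lenient behavior, and only migrated configs get RequireExhaustive enforced in the coordinator.

```go
package main

import "fmt"

// effectiveRequireExhaustive sketches the gating above: the coordinator only
// enforces RequireExhaustive when InstanceMultiple is enabled, preserving the
// old (buggy) lenient behavior for unmigrated configs.
func effectiveRequireExhaustive(instanceMultiple float32, requireExhaustive bool) bool {
	return instanceMultiple > 0 && requireExhaustive
}

func main() {
	fmt.Println(effectiveRequireExhaustive(0, true))  // false: old behavior preserved
	fmt.Println(effectiveRequireExhaustive(2, true))  // true: enforced once rolled out
	fmt.Println(effectiveRequireExhaustive(2, false)) // false: not requested
}
```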
Codecov Report
@@ Coverage Diff @@
## master #3516 +/- ##
========================================
- Coverage 56.1% 55.9% -0.3%
========================================
Files 548 548
Lines 61372 61378 +6
========================================
- Hits 34483 34352 -131
- Misses 23802 23923 +121
- Partials 3087 3103 +16
Flags with carried forward coverage won't be shown.
What this PR does / why we need it:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: