Add InstanceMultiple config to lower per instance series limit #3516
Conversation
Previously the per query series limit was passed directly to every database instance as the series limit. For large clusters this can result in a lot of wasted series read from the database. For example, a cluster with 30 instances per replica could read 30x the time series, only to truncate the results in the coordinators. Theoretically, if the data is perfectly sharded, the per instance limit is equal to the query series limit divided by the number of instances per replica. To allow for unbalanced data, the InstanceMultiple config increases this theoretical calculation by a factor to add a buffer. By default this option is off, but the hope is to add a sane default for the next major release. Additionally, if this config is set, this change fixes a bug where no error was returned when RequireExhaustive=true and the query series limit was exceeded by summing the individual instance results in the coordinator. This bug, coupled with the larger per instance series limit, would result in the coordinator processing far more results than necessary and then silently truncating them.
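The per-instance limit described above can be sketched as follows. This is a minimal illustration, not the actual M3 implementation; the function name and the treatment of `InstanceMultiple == 0` as "feature disabled" follow the description in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// perInstanceSeriesLimit sketches the calculation described above: with
// perfectly sharded data each instance only needs seriesLimit /
// instancesPerReplica series, and instanceMultiple pads that for skew.
// A multiple of 0 disables the feature and forwards the full query limit,
// matching the previous behavior.
func perInstanceSeriesLimit(seriesLimit, instancesPerReplica int, instanceMultiple float32) int {
	if instanceMultiple <= 0 || instancesPerReplica <= 1 {
		return seriesLimit
	}
	ideal := float64(seriesLimit) / float64(instancesPerReplica)
	return int(math.Ceil(ideal * float64(instanceMultiple)))
}

func main() {
	// 30 instances per replica, 10,000 series limit, 2x buffer.
	fmt.Println(perInstanceSeriesLimit(10000, 30, 2.0)) // 667
	// Feature disabled: the full limit goes to every instance.
	fmt.Println(perInstanceSeriesLimit(10000, 30, 0)) // 10000
}
```

With the feature off, every one of the 30 instances may read up to 10,000 series (up to 300,000 fetched to return 10,000); with a 2x multiple each instance reads at most 667.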
}

if shouldFilter {
	// NB: skip here, the closer will free the series iterator regardless.
	continue
}

if err := r.dedupeMap.add(iter, attrs); err != nil {
	if r.dedupeMap.len() == r.limitOpts.Limit {
		updated, err := r.dedupeMap.update(iter, attrs)
This was rather tricky. Needed to handle the case of a "better result" from one namespace overwriting an existing result.
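The overwrite-at-limit behavior being discussed can be illustrated with a toy dedupe map. The type and method names here are illustrative, not the actual M3 types: once the map is at the limit, brand-new series are rejected, but a series already present can still be replaced by a better result from another namespace.

```go
package main

import "fmt"

// dedupeMap is a hypothetical sketch of the pattern above. It maps series IDs
// to a "resolution rank" where a higher rank means a better result.
type dedupeMap struct {
	limit  int
	series map[string]int
}

// addOrUpdate reports whether the entry was stored. At the limit, new series
// are skipped, but an existing series can still be upgraded in place.
func (m *dedupeMap) addOrUpdate(id string, rank int) bool {
	if existing, ok := m.series[id]; ok {
		if rank > existing {
			m.series[id] = rank // better result overwrites in place
		}
		return true
	}
	if len(m.series) >= m.limit {
		return false // at the limit: a genuinely new series is rejected
	}
	m.series[id] = rank
	return true
}

func main() {
	m := &dedupeMap{limit: 2, series: map[string]int{}}
	fmt.Println(m.addOrUpdate("a", 1)) // true
	fmt.Println(m.addOrUpdate("b", 1)) // true
	fmt.Println(m.addOrUpdate("c", 1)) // false: limit reached
	fmt.Println(m.addOrUpdate("a", 2)) // true: existing series updated
}
```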
@@ -190,11 +202,22 @@ func (r *multiResult) Add(
	return
}

// the series limit was reached within this namespace.
if !metadata.Exhaustive && r.limitOpts.RequireExhaustive {
This is the case where the query series limit was exceeded by a single namespace (i.e. summing up the results from all the instances in a replica).
}

// Now de-duplicate
r.addOrUpdateDedupeMap(attrs, newIterators)
// the series limit was reached by adding the results of this namespace to the existing results.
this is the case where the query series limit is exhausted across multiple namespaces.
Unfortunately, since results from one namespace can overwrite results from another namespace, we have to iterate through the results to see if the limit was exceeded, i.e. we can't rely on metadata.FetchSeriesCount.
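Why the summed count over-reports can be shown with a small example. This is purely illustrative (the series names and helper are invented): when two namespaces return an overlapping series, summing per-namespace counts (what `metadata.FetchSeriesCount` would reflect) exceeds the limit even though the deduped result set does not.

```go
package main

import "fmt"

// dedupedCount returns the number of distinct series across namespaces,
// mimicking what the dedupe map ends up holding after overwrites.
func dedupedCount(namespaces ...[]string) int {
	seen := map[string]struct{}{}
	for _, ns := range namespaces {
		for _, s := range ns {
			seen[s] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	nsA := []string{"cpu{host=a}", "cpu{host=b}"}
	nsB := []string{"cpu{host=b}", "cpu{host=c}"} // host=b overwrites A's result

	const limit = 3
	summed := len(nsA) + len(nsB) // naive sum: 4

	fmt.Println(summed > limit)                 // true: naive sum flags the limit
	fmt.Println(dedupedCount(nsA, nsB) > limit) // false: deduped size is within it
}
```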
added can be false in the error case, is that alright?
Good catch. Probably need to guard this with r.err == nil as well. Not a big fan of APIs that don't return errors.
LG, should we update the per query limit integration tests?
// If set to 0 the feature is disabled and the MaxFetchedSeries is used as the limit for each database instance.
// For large clusters, enabling this feature can dramatically decrease the amount of wasted series read from a
// single database instance.
InstanceMultiple float32 `yaml:"instanceMultiple"`
should we prohibit a value between 0 and 1?
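A validation along the lines suggested above could look like this. The function name is hypothetical; the idea is that a multiple strictly between 0 and 1 would shrink the theoretical per-instance limit instead of padding it, while 0 stays valid as the "feature disabled" sentinel.

```go
package main

import "fmt"

// validateInstanceMultiple is a sketch of the suggested check: reject values
// strictly between 0 and 1, since they would lower the per-instance limit
// below the perfectly-sharded ideal rather than buffering it.
func validateInstanceMultiple(m float32) error {
	if m < 0 {
		return fmt.Errorf("instanceMultiple must be >= 0, got %v", m)
	}
	if m > 0 && m < 1 {
		return fmt.Errorf("instanceMultiple must be 0 (disabled) or >= 1, got %v", m)
	}
	return nil
}

func main() {
	fmt.Println(validateInstanceMultiple(0))   // valid: feature disabled
	fmt.Println(validateInstanceMultiple(0.5)) // rejected
	fmt.Println(validateInstanceMultiple(2))   // valid: 2x buffer
}
```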
src/query/storage/types.go
Outdated
@@ -114,6 +114,8 @@ type FetchOptions struct {
	Remote bool
	// SeriesLimit is the maximum number of series to return.
	SeriesLimit int
	// InstanceMultiple is how much to increase the per database instance series limit.src/dbnode/client/session.go
rm src/dbnode/client/session.go
?
weird lol
// Piggy back on the new InstanceMultiple option to enable checking require exhaustive. This preserves the
// existing buggy behavior of the coordinators not requiring exhaustive. Once InstanceMultiple is enabled by
// default, this can be removed.
RequireExhaustive: queryOptions.InstanceMultiple > 0 && options.RequireExhaustive,
not really following why this will be set to false if queryOptions.InstanceMultiple == 0
To preserve backwards compatibility with the buggy logic that did not check RequireExhaustive in the coordinator, I'm piggybacking on the rollout of InstanceMultiple to also enable checking RequireExhaustive in the coordinator. Will remove queryOptions.InstanceMultiple > 0 once we verify this all works. I'll do that in a followup. I did verify this works as expected.
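The gating described in this exchange reduces to a single boolean expression; the sketch below (with an illustrative function name) shows the rollout behavior: configs that have not enabled InstanceMultiple keep the old lenient behavior, and only migrated configs get RequireExhaustive enforced in the coordinator.

```go
package main

import "fmt"

// effectiveRequireExhaustive sketches the gating above: the coordinator only
// enforces RequireExhaustive when InstanceMultiple is enabled, preserving the
// old (buggy) lenient behavior for unmigrated configs.
func effectiveRequireExhaustive(instanceMultiple float32, requireExhaustive bool) bool {
	return instanceMultiple > 0 && requireExhaustive
}

func main() {
	fmt.Println(effectiveRequireExhaustive(0, true))  // false: old behavior preserved
	fmt.Println(effectiveRequireExhaustive(2, true))  // true: enforced once rolled out
	fmt.Println(effectiveRequireExhaustive(2, false)) // false: not requested
}
```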
Codecov Report
@@ Coverage Diff @@
## master #3516 +/- ##
========================================
- Coverage 56.1% 55.9% -0.3%
========================================
Files 548 548
Lines 61372 61378 +6
========================================
- Hits 34483 34352 -131
- Misses 23802 23923 +121
- Partials 3087 3103 +16
Flags with carried forward coverage won't be shown.
What this PR does / why we need it:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: