roachtest: 2023/04/30 kv0/nodes=3/cpu=96 regression [optimizer_use_histograms in internal executor] #102954

Open
kvoli opened this issue May 9, 2023 · 85 comments
Labels
A-roachperf-investigation Issue opened as a result of roachperf triage branch-master Failures and bugs on the master branch. regression Regression from a release. T-sql-queries SQL Queries Team

Comments

@kvoli
Collaborator

kvoli commented May 9, 2023

kv0/nodes=3/cpu=96 had a serious (66%) regression in terms of ops/s since the Apr 30th run. The regression occurs on both GCE and AWS.

[roachperf graphs showing the ops/s drop on GCE and AWS]

Jira issue: CRDB-27754

@kvoli kvoli added the T-testeng TestEng Team label May 9, 2023
@blathers-crl

blathers-crl bot commented May 9, 2023

Hi @kvoli, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl

blathers-crl bot commented May 9, 2023

cc @cockroachdb/test-eng

@kvoli kvoli added the regression Regression from a release. label May 9, 2023
@smg260 smg260 self-assigned this May 9, 2023
@smg260 smg260 added the A-roachperf-investigation Issue opened as a result of roachperf triage label May 9, 2023
@smg260
Contributor

smg260 commented May 9, 2023

Bisection results, PR #101486 (cc: @rafiss )

2023-05-09T16:46:45Z   Bisecting regression in [kv0/enc=false/nodes=3/cpu=96] using commit range [4619767bb0b (known good),7d02e061759 (known bad)]
2023-05-09T16:46:45Z   Thresholds [good >= 43000, bad <= 27000]
2023-05-09T17:04:12Z   [1ee30a1fe7e] median ops/s: [26590]. Auto marked as bad.
2023-05-09T17:33:17Z   [2f03212bd1f] median ops/s: [24438]. Auto marked as bad.
2023-05-09T18:02:09Z   [608f03a999a] median ops/s: [23428]. Auto marked as bad.
2023-05-09T18:32:17Z   [da48b19bcc2] median ops/s: [25457]. Auto marked as bad.
2023-05-09T18:32:17Z   Bisection complete. Suspect commit:
commit da48b19bcc2f962dda531d075502682b64326bf4
Author: Rafi Shamim <[email protected]>
Date:   Thu Apr 13 15:12:45 2023 -0400

    sql: use global defaults for internal session data rather than zero-values

    Release note: None

@rafiss
Collaborator

rafiss commented May 9, 2023

Thanks for tracking that down. I was wondering if my PR would turn anything up. If you didn't get a chance to look at it, the summary is that we were previously using zero-valued defaults for all session variables used in internal operations (jobs, leases, delegated queries for some builtin functions, etc). After #101486, the default values for session variables will be used instead.

One notable change is that previously, internal operations would use distsql=off but now they will use distsql=auto. Another one is distsql_workmem, which previously was unbounded, but now uses the default limit of 64 MiB.

To track this down further, I'd be curious to see if there are any specific queries that are slower. Then we can look at their traces.
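
For anyone comparing behavior before and after the change, here is a minimal sketch of inspecting and overriding these variables in a regular SQL session (assuming the standard session variable names):

-- Inspect the current session values.
SHOW distsql;
SHOW distsql_workmem;

-- Force the old internal-executor behavior for this session, e.g. to compare plans.
SET distsql = off;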

@smg260
Contributor

smg260 commented May 22, 2023

@rafiss Thanks for the change overview. I might be of limited help here, but can you link me any documentation on obtaining query stats for this particular workload during a roachtest run?

@irfansharif
Contributor

This regression also appears for kv0/enc=false/nodes=3/cpu=32: https://roachperf.crdb.dev/?filter=&view=kv0%2Fenc%3Dfalse%2Fnodes%3D3%2Fcpu%3D32&tab=aws.

[roachperf graph for kv0/enc=false/nodes=3/cpu=32 on AWS]

@irfansharif irfansharif assigned irfansharif and unassigned smg260 May 30, 2023
@irfansharif
Contributor

kv0-traces.zip

stmt-bundle-870525437835771905/ - slow trace
stmt-bundle-870526672146366465/ - fast trace
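
For reference, a sketch of how such a bundle can be captured by hand for the workload's statement (the kv.kv schema is shown later in the thread; the placeholder values are arbitrary):

EXPLAIN ANALYZE (DEBUG) UPSERT INTO kv (k, v) VALUES (1, '\x01'::BYTES);

This produces a downloadable zip containing the plan, trace, schema, and table statistics.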

@irfansharif
Contributor

Partway through a full kv0 run, there's a step change in latency distributions. The distribution becomes bi-modal, and then throughput suffers:

[latency distribution graph showing the bi-modal step change]


@irfansharif
Contributor

Another one is distsql_workmem, which previously was unbounded, but now uses the default limit of 64 MiB.

It wasn't previously unbounded, I don't think:

if flowCtx.EvalCtx.SessionData().WorkMemLimit <= 0 {
	// If for some reason workmem limit is not set, use the default value.
	return DefaultMemoryLimit
}

@irfansharif
Contributor

irfansharif commented Jun 4, 2023

Here's another slow trace that doesn't straddle multiple nodes. But it's still the higher KV exec latencies (around proposal replication/acknowledgement) that are affecting top-line throughput.

[slow trace screenshots: elevated KV exec latency around proposal replication/acknowledgement]

Here's a fast trace that straddles two nodes (captured in the first 5 minutes of kv0/nodes=3/cpu=96, when throughput is still high and latencies are low) that also has low replication latency. Why the step change a few minutes in?

[fast trace screenshot spanning two nodes]

@irfansharif
Contributor

I also kicked off two manual runs using 1ee30a1 (which includes the merge commit from #101486) and 2c9bdbb (the merge commit from before). The bisection is indeed correct. Though I don't understand why it's caused the regression we're seeing.

@irfansharif irfansharif added branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 4, 2023
@irfansharif
Contributor

Two runs of kv0/nodes=3/cpu=96 from a recent master and then a master with all commits from #101486 reverted. Just to sanity-check that it really is this PR that's causing the change.

[throughput comparison: recent master vs. master with #101486 reverted]

@irfansharif
Contributor

irfansharif commented Jun 4, 2023

Of the commits in that PR, it's da48b19 in particular where the regression appears. Confirmed this by isolating the SHA (reverting everything else from master and re-running the kv0 experiments, which still showed the degradation).

@irfansharif
Contributor

irfansharif commented Jun 4, 2023

I looked through the debug zips for clues in the logs around when one of these step changes happened. I saw things like:

I230604 16:01:34.939333 5256528 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 424 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:02:34.945126 5721242 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 425 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:03:34.952456 6151338 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 426 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:04:34.960182 6524605 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 427 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:05:27.405226 96 util/admission/work_queue.go:512 ⋮ [T1,n3] 428 ‹sql-kv-response›: FIFO threshold for tenant 1 is -128
I230604 16:05:27.505068 94 util/admission/work_queue.go:512 ⋮ [T1,n3] 429 ‹kv›: FIFO threshold for tenant 1 is -128
I230604 16:05:27.605032 395 util/admission/work_queue.go:512 ⋮ [T1,n3] 430 ‹kv›: FIFO threshold for tenant 1 is -128
I230604 16:05:35.038644 6855831 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 431 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:06:35.044587 7161519 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 432 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:07:35.051411 7464761 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 433 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:08:35.058832 7760751 sql/stats/automatic_stats.go:833 ⋮ [T1,n3] 434 automatically executing ‹"CREATE STATISTICS __auto__ FROM [106] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
I230604 16:08:35.067411 7910897 jobs/registry.go:617 ⋮ [T1,n3,intExec=‹create-stats›] 435 active version is 23.1-6

Table ID 106 corresponds to kv.kv, the table this workload is upserting into. I next tried disabling the automatic stats collection mechanism:

SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;

It's made all the difference. Here are two runs of kv0/nodes=3/cpu=96 with and without automatic SQL stats collection.

[throughput for kv0/nodes=3/cpu=96 with vs. without automatic SQL stats collection]

I did the same at 1ee30a1 but running kv0/enc=false/nodes=3/cpu=32. Here are the runs with and without automatic SQL stats collection, respectively.

[throughput for kv0/enc=false/nodes=3/cpu=32 with vs. without automatic SQL stats collection]

(All the pink Grafana annotations correspond to jobs running, including the auto stats one.)
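
As a less invasive variant of the cluster-wide toggle above, the same experiment could be scoped to just the workload table via a storage parameter; a sketch, assuming the standard parameter name:

-- Disable automatic stats collection only for the table this workload upserts into.
ALTER TABLE kv.kv SET (sql_stats_automatic_collection_enabled = false);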

@irfansharif
Contributor

stmt-bundle-871053367241342979.zip is the statement bundle for the trace in #102954 (comment). I can look further (I'm generally interested in understanding how the PR above + automatic table statistics could cause an increase in what looks like KV replication latencies), but I'd like some pointers from @cockroachdb/sql-queries first. It's a simple UPSERT INTO kv (k, v) VALUES ($1:::INT8, $2:::BYTES) into the following schema:

CREATE TABLE public.kv (
	k INT8 NOT NULL,
	v BYTES NOT NULL,
	CONSTRAINT kv_pkey PRIMARY KEY (k ASC)
);
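
A sketch of inspecting the plan for that statement directly (placeholder values are arbitrary); given the primary-key-only schema, anything other than a trivial single-table upsert plan would itself be a clue:

EXPLAIN UPSERT INTO kv (k, v) VALUES (1, '\x01'::BYTES);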

@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jun 4, 2023
@michae2
Collaborator

michae2 commented Sep 19, 2023

I see, interesting. Thank you for digging into this so deeply. I can see why this has proved so elusive.

@erikgrinaker
Contributor

doubling the COCKROACH_SCHEDULER_SHARD_SIZE to 32

I wonder if this is because it increases Raft scheduler mutex contention, which alleviates pressure on some pathologically contended resource elsewhere in the system. Did Raft scheduler latency increase or decrease with this change?

@srosenberg
Member

This is quite a puzzle! Admission control kicks in at 14:45 (see KV Admission Slots); shortly thereafter, concurrency (as measured by runnable goroutines) drops.

Any insights from the goroutine profiles? Also, was the workload running on n4? I couldn't find any system metrics (node_exporter) for it or the cluster nodes. We should enable node_exporter if only to rule out anything lower in the stack.

@cucaroach
Contributor

I wonder if this is because it increases Raft scheduler mutex contention, which alleviates pressure on some pathologically contended resource elsewhere in the system. Did Raft scheduler latency increase or decrease with this change?

Raft scheduler latency is high when performance is degraded and low when it isn't. With histograms disabled, P99 is around 25ms; when they are enabled, it's >300ms. COCKROACH_SCHEDULER_SHARD_SIZE doesn't seem to change that.

Here are the time bounds for the 32-shard experiment I ran this morning:

http://treilly-32-0001.roachprod.crdb.io:26258/#/metrics/overview/cluster?start=1695132979&end=1695134779

@DrewKimball
Collaborator

@cucaroach If we want to make sure that this regression is due to plan changes and not something more exotic like mutex contention, maybe we could try running a build that reads/builds the histogram, but otherwise ignores it when calculating stats.

@mgartner
Collaborator

@cucaroach If we want to make sure that this regression is due to plan changes and not something more exotic like mutex contention, maybe we could try running a build that reads/builds the histogram, but otherwise ignores it when calculating stats.

My understanding is that this is exactly what optimizer_use_histograms is controlling. When it is off, we are still collecting histograms for system tables, but we're not considering those histograms when calculating memo statistics.

Looking at where that setting is queried, though, it's not immediately clear whether this would prove or disprove query plan changes vs. some exotic mutex contention as the culprit here.
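
A minimal sketch of toggling that variable by hand for comparison, assuming the standard session variable name:

SHOW optimizer_use_histograms;      -- defaults to on for regular sessions
SET optimizer_use_histograms = off; -- histograms are still collected, just not used for memo statistics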

@cucaroach
Contributor

I don't think mutex contention is the issue, but I could be wrong. I see roughly the same amount of mutex contention in the good and bad cases, but it is squarely centered around RWMutexes, with a lot related to the ReplicaMutex. Here's the good case:

[mutex profile, good case]

Here's the bad case:

[mutex profile, bad case]

I feel like with 32 CPUs and 64 client sessions, 3-4s of contention isn't terrible; these are 3s mutex profile deltas. An experiment with DRWMutex is certainly warranted, but I'm focused on examining plans right now; the changing plans must change the KV request demographics somehow. As an experiment to gauge whether this amount of contention was problematic, I completely commented out the statsCollector Start/EndTransaction calls and it made no difference.

@mgartner
Collaborator

One thing that's suspicious in the trace from the slow statement bundle here is that there's a significant amount of time spent between steps that usually take <1ms, from what I've seen in typical statement bundles (maybe I'm just not very worldly w.r.t. traces?):

[trace screenshot showing the ~30ms gap]

Is the ~30ms spent here a clue?

There's still a big unexplained gap in the trace—maybe it makes sense to insert additional tracing logs and try to pin down where the majority of that time is spent?

@srosenberg
Member

Is the ~30ms spent here a clue?

I'd say the first ~7ms is suspect. It suggests the request is being held in some queue(s).

@DrewKimball
Collaborator

I'd say the first ~7ms is suspect. It suggests the request is being held in some queue(s).

Admission control is the obvious suspect there. It happens right after the first line.

@cucaroach
Contributor

I tried turning off admission control because I suspected it was letting in too much work, but when we're in degraded mode, disabling admission control seems to make things worse.

One weird thing about this is that if you turn histogram disabling for internal queries back on, performance doesn't return to normal; we're stuck in degraded mode. But I discovered that if I call reset_sql_stats and reset_activity_tables, the throughput collapse goes away. So there's some bad interaction between how those tables work and histograms. TRUNCATE on the statement_statistics and transaction_statistics tables does nothing, so it must be the in-memory cache for these things that is the issue (the reset functions also reset the in-memory state).
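
For reference, a sketch of the reset calls referred to above (builtin names as used in this thread); per the observation above, these clear the in-memory state as well as the persisted rows:

SELECT crdb_internal.reset_sql_stats();       -- statement/transaction statistics
SELECT crdb_internal.reset_activity_tables(); -- statement/transaction activity tables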

So the next thing I tried was removing this lock: https://github.com/cockroachdb/cockroach/blob/release-23.1/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go#L359

Not positive that's completely sound, but it seems to remove over half the throughput collapse; we bottom out around 30k instead of 20k. Progress!

[throughput graph: bottoming out around 30k ops/s with the lock removed]

Making all those heap allocations while holding a lock is silly, but I need to draw a line between the histogram setting change and the cache performance; I suspect the cardinality of the maps goes way up. Interestingly, the Raft replication P99 is cut in half with this change; I still have no idea how that's connected to these front-end happenings.

@cucaroach
Contributor

Not positive that's completely sound, but it seems to remove over half the throughput collapse; we bottom out around 30k instead of 20k. Progress!

This result didn't replicate: across 3 runs, only the first looked like this; the last two fell into the <20k QPS hole.

@srosenberg
Member

It's probably been considered already... what about just logging EXPLAIN (OPT) for every issued query in both runs? We'd expect no material difference, but what if there is?
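
A sketch of what that looks like for a single statement, using the workload's UPSERT as an example (placeholder values are arbitrary):

EXPLAIN (OPT, VERBOSE) UPSERT INTO kv (k, v) VALUES (1, '\x01'::BYTES);

Comparing this output between the good and bad runs should make any plan difference obvious.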

@cucaroach
Contributor

By saving off the statement_statistics between runs and doing some dogfooding I was able to narrow it down:

https://docs.google.com/spreadsheets/d/16M8MeaLlWr1UWyGlkn9IXa5fUC8g9Bfy9QVfsCWZ5TI

By disabling optimizer histogram usage for just the first 10 queries, I was able to get the good performance. Now I need to analyze what goes wrong with these queries/plans and narrow it down further.
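
For anyone repeating that comparison, a hedged sketch of pulling per-statement execution counts and latencies out of the persisted stats table (exact JSON paths may vary by version):

SELECT
    metadata->>'query'                                    AS query,
    (statistics->'statistics'->>'cnt')::INT               AS exec_count,
    (statistics->'statistics'->'runLat'->>'mean')::FLOAT  AS mean_run_lat_s
FROM system.statement_statistics
ORDER BY exec_count DESC
LIMIT 10;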

@yuzefovich yuzefovich moved this from Active to Triage in SQL Queries Oct 12, 2023
@michae2 michae2 moved this from Triage to 24.1 Release in SQL Queries Oct 17, 2023
@michae2
Collaborator

michae2 commented Oct 17, 2023

[triage] Our plan is to leave histograms disabled for internal queries in 23.2 and pick this back up during 24.1 development.

@mgartner mgartner removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Oct 25, 2023
@mgartner mgartner moved this from 24.1 Release to 24.2 Release in SQL Queries Nov 28, 2023
@michae2
Collaborator

michae2 commented Jan 25, 2024

[queries collab session] Based on Tommy's analysis, one idea is to use a table-level setting to disable histograms for system.statement_statistics, system.transaction_statistics, and system.statement_activity.
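
A sketch of what that could look like; the storage parameter name here is illustrative/hypothetical (and altering system tables may need special handling), so treat it as a sketch of the idea rather than a concrete proposal:

-- Hypothetical parameter name: keep histograms for user tables, drop them for the
-- SQL stats system tables implicated in this regression.
ALTER TABLE system.statement_statistics SET (sql_stats_histogram_collection_enabled = false);
ALTER TABLE system.transaction_statistics SET (sql_stats_histogram_collection_enabled = false);
ALTER TABLE system.statement_activity SET (sql_stats_histogram_collection_enabled = false);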

craig bot pushed a commit that referenced this issue Jan 30, 2024
118289: sql/tests: remove stale skipped error r=yuzefovich a=yuzefovich

We now have key-encoding of JSONs.

Epic: None

Release note: None

118330: ttl: use better defaults for some session variables r=yuzefovich a=yuzefovich

This commit fixes some problems that we've seen around internally-executed queries issued by the TTL jobs, present before 23.2. In particular, on 23.1 and prior releases we used Go defaults for the session data parameters for internally-executed queries, which can differ from the defaults set by CRDB. As a result, query plans could be suboptimal as we've seen in a couple of recent escalations. This bug has been fixed on 23.2 proper (we now use CRDB defaults for all session variables except for one), and on 23.1 and prior this commit applies a targeted fix only to the TTL queries. In particular, the following overrides are now set:
- `reorder_joins_limit` to the default value of 8
- `optimizer_use_histograms` and `optimizer_use_multi_col_stats` are both set to `true`.

On 23.2 and later only `optimizer_use_histograms` needs to be updated since it's the only exception mentioned above. However, I chose to keep all 3 in the same commit so that it can be backported more easily to 23.1, and the following commit will revert the other 2 on 23.2 and later.

Touches: #102954.
Fixes: #118129.

Release note (bug fix): Internal queries issued by the TTL jobs should now use optimal plans. The bug has been present since at least version 22.2.

118479: sql: enable read committed cluster setting by default r=rafiss a=rafiss

Epic CRDB-34172

Release note (sql change): The sql.txn.read_committed_isolation.enabled cluster setting is now true by default. This means that any syntax and settings that configure the READ COMMITTED isolation level will now cause the transaction to use that isolation level, rather than automatically upgrading the transaction to SERIALIZABLE.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
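
For reference, the three overrides applied by the TTL fix in #118330 above, expressed as plain session settings (a sketch using the standard session variable names):

SET reorder_joins_limit = 8;
SET optimizer_use_histograms = on;
SET optimizer_use_multi_col_stats = on;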
@mgartner mgartner moved this from 24.2 Release to 24.3 Release in SQL Queries May 16, 2024
@mgartner
Collaborator

mgartner commented Aug 1, 2024

The most likely explanation is that this makes some internal query plans worse, but we were unable to confirm that last time we looked at it.

This is currently preventing histograms from being used by internal executor queries. However, there's not a strong motivation to enable them—we're not aware of any query plans that would improve with the usage of histograms—so we're going to backlog this for now.

@mgartner mgartner moved this from 24.3 Release to Backlog in SQL Queries Aug 1, 2024