kvserver: lease maintenance scheduler #98433
Once #100430 lands, we'll be eagerly extending expiration-based leases during Raft ticks. That solves one of the objectives here. However, the replicate queue is still responsible for acquiring leases for ranges that don't have one, and for switching the lease type (e.g. when …).

We should move this eager lease acquisition and type switching into the Raft tick loop as well, so they're more responsive. However, this requires the ranges to be unquiesced. It's possible that we'll only want to do this for expiration leases, and not for epoch leases, to maintain the same performance profile. We also need to make sure we don't slam the cluster with a ton of concurrent lease requests, but instead stagger them. Maybe #100426 is the solution here, or maybe we need some other policy. We also need to try to honor lease preferences.
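To make the direction above concrete, here is a rough sketch of how a Raft tick hook could drive eager lease acquisition and type switching while staggering the resulting requests. This is not the actual implementation: `leaseState`, `maybeMaintainLeaseOnTick`, and the one-second jitter are all hypothetical stand-ins.

```go
package main

import (
	"math/rand"
	"time"
)

// leaseState is a hypothetical summary of a replica's lease; the real replica
// state is far richer.
type leaseState struct {
	hasLease     bool
	isExpiration bool
	wantEpoch    bool // the lease type the range ought to have
}

// maybeMaintainLeaseOnTick sketches the kind of check a Raft tick hook could
// run: acquire a missing lease or switch the lease type, staggering the
// resulting lease requests with a random delay so thousands of ranges don't
// propose at once.
func maybeMaintainLeaseOnTick(ls leaseState, enqueueLeaseRequest func(delay time.Duration)) {
	jitter := time.Duration(rand.Int63n(int64(time.Second))) // crude stagger policy

	switch {
	case !ls.hasLease:
		enqueueLeaseRequest(jitter) // eager acquisition, not driven by a client request
	case ls.wantEpoch && ls.isExpiration:
		enqueueLeaseRequest(jitter) // lease type switch
	}
}

func main() {
	maybeMaintainLeaseOnTick(
		leaseState{hasLease: false},
		func(delay time.Duration) { _ = delay /* would submit a RequestLease here */ },
	)
}
```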
100430: kvserver: eagerly extend expiration leases in Raft scheduler r=erikgrinaker a=erikgrinaker

This is submitted with a 23.1 backport in mind, and is therefore limited in scope. More work is needed to fully address #98433 in the 23.2 timeframe.

---

**kvserver: don't quiesce ranges with expiration-based leases**

This patch prevents quiescence with expiration-based leases, since we'll have to propose a lease extension shortly anyway. Out of caution, we only do this when `kv.expiration_leases_only.enabled` is `true`, with an eye towards 23.1 backports. For 23.2, we will always do this.

Touches #94592.

Epic: none
Release note: None

**kvserver: eagerly extend expiration leases in Raft scheduler**

This patch eagerly extends expiration leases in the Raft scheduler, during Raft ticks, but only if `kv.expiration_leases_only.enabled` is `true`. This is possible because we no longer allow ranges with expiration leases to quiesce in this case. The old store lease renewer is disabled in this case, but still tracks local system ranges with expiration leases in case the setting is later changed. The store lease renewer is still retained for use by default, to allow backporting this change to 23.1 while keeping the existing default behavior.

Doing this during Raft scheduling is compelling, since it already does efficient goroutine scheduling (including liveness range prioritization) and holds the replica mutex. Compared to an alternative implementation that relied on a separate, improved store lease renewer which also allowed ranges to quiesce between extensions, this has significantly better performance. The main reason for this is that unquiescing involves a Raft append to wake the leader. On an idle cluster with 20,000 ranges using only expiration leases, CPU usage was 26% vs. 30%, write IOPS was 500 vs. 900, and network traffic was 0.8 MB/s vs. 1.0 MB/s.

KV benchmark results:

```
name                    old ops/sec  new ops/sec  delta
kv0/enc=false/nodes=3   14.6k ± 3%   15.2k ± 4%   +3.72%  (p=0.003 n=8+8)
kv95/enc=false/nodes=3  30.6k ± 3%   32.2k ± 3%   +5.22%  (p=0.000 n=8+8)

name                    old p50      new p50      delta
kv0/enc=false/nodes=3   11.3 ± 3%    11.0 ± 0%    -2.76%  (p=0.006 n=8+6)
kv95/enc=false/nodes=3   4.59 ± 8%    4.55 ± 3%    ~      (p=0.315 n=8+8)

name                    old p95      new p95      delta
kv0/enc=false/nodes=3   31.5 ± 0%    30.2 ± 4%    -4.25%  (p=0.003 n=6+8)
kv95/enc=false/nodes=3  19.5 ± 7%    18.1 ± 4%    -7.28%  (p=0.000 n=8+7)

name                    old p99      new p99      delta
kv0/enc=false/nodes=3   48.7 ± 3%    47.4 ± 6%    ~       (p=0.059 n=8+8)
kv95/enc=false/nodes=3  32.8 ± 9%    30.2 ± 4%    -8.04%  (p=0.001 n=8+8)
```

Resolves #99812.
Touches #98433.

Epic: none
Release note: None

**kvserver: also acquire expiration leases in replicate queue**

Previously, the replicate queue would only reacquire epoch leases, and allow expiration leases to expire. It now reacquires all leases, since they are eagerly extended. In the future, when quiescence is removed entirely, this responsibility should be moved to the Raft tick loop.

Epic: none
Release note: None

**kvserver: conditionally extend expiration leases during requests**

If `kv.expiration_leases_only.enabled` is `true`, this patch disables expiration lease extension during request processing, since the Raft scheduler will eagerly extend leases in this case. This will eventually be removed completely, but is kept for now with an eye towards a 23.1 backport.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
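As a rough illustration of the gating this PR describes (hypothetical helper name and parameters; the real logic lives in the replica's quiescence path):

```go
package kvsketch

// canQuiesce sketches the quiescence change in the first commit above: when
// kv.expiration_leases_only.enabled is set, a range holding an
// expiration-based lease stays unquiesced so the Raft tick loop can keep
// extending its lease; otherwise the usual quiescence conditions apply.
func canQuiesce(expirationLeasesOnly, leaseIsExpiration, otherConditionsOK bool) bool {
	if expirationLeasesOnly && leaseIsExpiration {
		return false
	}
	return otherConditionsOK
}
```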
100434: kvserver: grow goroutine stack for `requestLease` r=erikgrinaker a=erikgrinaker

We pre-grow the goroutine stack when processing batch requests via RPC, to avoid frequent stack copying. However, this did not apply to lease requests, which are submitted directly to the local replica. When eagerly extending 20,000 expiration leases, `requestLease` stack growth made up 4% of total CPU usage -- this patch reduces that to 1%.

Touches #98433.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
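The pre-growth trick itself is small. Here is a hedged, standalone version, not the actual cockroach helper; the 16 KiB size is an illustrative guess.

```go
package main

// sink keeps the compiler from optimizing the buffer below away.
var sink byte

// growStack forces the goroutine stack to grow once, up front, by placing a
// large array on the stack. Calling it before a deep call chain (such as a
// lease request) avoids repeated incremental stack copies.
//
//go:noinline
func growStack() {
	var buf [16 << 10]byte
	sink = buf[len(buf)-1]
}

func main() {
	growStack()
	// ... deep call chain would follow here ...
}
```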
101050: kvserver: move eager expiration lease extension to Raft scheduler r=erikgrinaker a=erikgrinaker

**kvserver: never quiesce expiration leases**

Previously, this was contingent on `kv.expiration_leases_only.enabled`. This patch therefore, in practice, disables quiescence for the meta/liveness ranges and for any ranges that are transferring an expiration lease before upgrading it to an epoch lease. Quiescing these will tend to result in more work rather than less, because we'll have to wake the ranges up again shortly, which incurs an additional Raft append.

Epic: none
Release note: None

**kvserver: move eager expiration lease extension to Raft scheduler**

This patch makes the Raft scheduler responsible for eagerly extending all expiration-based leases, replacing the store lease renewer. Previously, this was contingent on `kv.expiration_leases_only.enabled`, and by default only the meta/liveness ranges were eagerly extended by the store lease renewer.

Touches #98433.

Epic: none
Release note: None

**kvserver: don't extend leases during request processing**

This patch removes expiration lease extension during request processing, since the Raft scheduler now always eagerly extends expiration leases. Previously, this was contingent on `kv.expiration_leases_only.enabled`.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
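A minimal sketch of the end state after this PR, with hypothetical names (`onRaftTick`, `renewAfter`): extension is owned by the tick path, the request path no longer renews, and expiration-leased ranges never quiesce.

```go
package kvsketch

import "time"

// onRaftTick sketches the post-PR responsibility split: during a Raft tick,
// any expiration lease that has entered its renewal window is extended
// eagerly, with no cluster setting involved.
func onRaftTick(leaseIsExpiration bool, now, renewAfter time.Time, extendLease func()) {
	if leaseIsExpiration && !now.Before(renewAfter) {
		extendLease()
	}
}

// onRequest sketches the request path: it simply evaluates the request and no
// longer triggers lease extension, since the tick path above owns renewal.
func onRequest(evaluate func()) {
	evaluate()
}
```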
100432: kvserver: don't add range info for lease requests r=erikgrinaker a=erikgrinaker

Lease requests are submitted directly to the local replica, bypassing the DistSender. They therefore don't set `ClientRangeInfo`, and don't need range info returned. However, `maybeAddRangeInfoToResponse()` would interpret this as stale range info and always populate the response with range info, which can be expensive -- especially with expiration leases that are frequently extended. In an idle cluster with 20,000 eagerly extended expiration leases, this was responsible for 1.4% of total CPU usage.

Touches #98433.

Epic: none
Release note: None

102416: githubpost: add team label to issues r=erikgrinaker a=erikgrinaker

**TEAMS: add `T-kv` label for KV**

Epic: none
Release note: None

**githubpost: add team label to issues**

This patch automatically adds team labels to created issues. Previously, this was done by Blathers based on the project board, but not all teams use project boards -- and we may as well just set the correct label to begin with.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
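Returning to #100432 above: a minimal sketch of the check it describes, using a simplified stand-in for `ClientRangeInfo` rather than the real `kvpb` types.

```go
package kvsketch

// clientRangeInfo mimics the relevant parts of a batch request header: a
// request routed through DistSender carries the client's view of the range,
// while a locally submitted lease request leaves this empty.
type clientRangeInfo struct {
	descriptorGeneration int64
	leaseSequence        int64
	explicitlyRequested  bool
}

// maybeAddRangeInfo sketches the fix: only compare against the server's view
// (and attach updated range info) when the client actually supplied one.
// Previously, an absent clientRangeInfo was treated as stale, so every eager
// lease extension paid for assembling range info it would never use.
func maybeAddRangeInfo(ci clientRangeInfo, serverDescGen, serverLeaseSeq int64, attach func()) {
	if ci == (clientRangeInfo{}) {
		return // local lease request: nothing to refresh
	}
	if ci.explicitlyRequested ||
		ci.descriptorGeneration < serverDescGen ||
		ci.leaseSequence < serverLeaseSeq {
		attach()
	}
}
```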
100433: batcheval: don't declare latches for `RequestLease` r=erikgrinaker a=erikgrinaker

`RequestLease` does not take out latches. However, it still declared latches to appease the spanset asserter. This has a non-negligible cost with eagerly extended expiration leases, about 1.4% of CPU usage with 20,000 idle ranges. This patch removes latch declaration and disables spanset assertions.

Touches #98433.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
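To illustrate the shape of the change (purely hypothetical types; the real batcheval registration API differs):

```go
package kvsketch

// commandSketch is a hypothetical stand-in for a batcheval command
// registration: which latch spans it declares and whether the spanset
// asserter checks its key accesses.
type commandSketch struct {
	declareKeys         func() // latch spans to declare; nil means none
	checkSpanAssertions bool
}

// requestLeaseCommand sketches the end state described above: RequestLease is
// evaluated without latches, so it declares nothing and opts out of spanset
// assertions rather than declaring spans purely to satisfy the asserter.
var requestLeaseCommand = commandSketch{
	declareKeys:         nil,
	checkSpanAssertions: false,
}
```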
100435: kvserver: avoid unnecessary tracing span in `requestLeaseAsync()` r=erikgrinaker a=erikgrinaker

There's no need to create a tracing span when the client isn't running a trace.

Touches #98433.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
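A sketch of the idea, with a hypothetical tracer interface standing in for cockroach's tracing package:

```go
package kvsketch

import "context"

// span and tracer are hypothetical stand-ins for the real tracing types.
type span interface{ Finish() }

type tracer interface {
	// spanFromContext returns the caller's span, if the client is tracing.
	spanFromContext(ctx context.Context) (span, bool)
	// childSpan creates a child span for the given operation.
	childSpan(ctx context.Context, op string) (context.Context, span)
}

// maybeTracedLeaseRequest sketches the change: only create a child span for
// the async lease request when the client is actually tracing; otherwise skip
// the span (and its allocations) entirely.
func maybeTracedLeaseRequest(ctx context.Context, tr tracer, run func(context.Context)) {
	if _, ok := tr.spanFromContext(ctx); ok {
		childCtx, sp := tr.childSpan(ctx, "request-lease-async")
		defer sp.Finish()
		run(childCtx)
		return
	}
	run(ctx) // no client trace: no span
}
```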
We need a scheduler that eagerly maintains range leases. This should:

- Acquire leases for ranges that don't have one, even when there is no traffic.
- Eagerly extend expiration-based leases before they expire.
There are two motivations for this:

- We don't want lease acquisition to be lazily driven by client requests.
- When only using expiration leases, we want to ensure ranges have a lease even when there is no traffic on them, to avoid lease acquisition latencies on the next request to the range.
A few important aspects to consider:

- We must not slam the cluster with a ton of concurrent lease requests; acquisitions and extensions should be staggered.
- We should try to honor lease preferences.
We have a few similar mechanisms already that should mostly be replaced by this scheduler:

- The replicate queue acquires leases for ranges that don't have one. (`cockroach/pkg/kv/kvserver/replicate_queue.go`, lines 989 to 996 at `736a67e`)
- `Store.startLeaseRenewer()` eagerly renews the expiration leases on the meta and liveness ranges to avoid high tail latencies. (`cockroach/pkg/kv/kvserver/store.go`, lines 2089 to 2098 at `8add7c5`)
- `Replica.maybeExtendLeaseAsyncLocked()` will extend an expiration lease when processing a request in the last half of the lease interval; see the sketch below. (`cockroach/pkg/kv/kvserver/replica_range_lease.go`, lines 1437 to 1454 at `736a67e`)
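A sketch of that last-half-of-the-interval check (illustrative names only, not the actual code):

```go
package kvsketch

import "time"

// pastRenewalPoint sketches the rule used by the request-time extension
// mechanism above: once the current time is past the midpoint between the
// lease start and its expiration, processing a request kicks off an
// asynchronous lease extension.
func pastRenewalPoint(start, expiration, now time.Time) bool {
	renewAt := start.Add(expiration.Sub(start) / 2)
	return !now.Before(renewAt)
}
```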
Jira issue: CRDB-25265
Epic: CRDB-25207