Benchmark concurrent Cassandra LWTs #6186
Merged
Cassandra LWT overview
Cadence uses Cassandra's LWT (a.k.a. compare-and-swap) queries to ensure consistent, atomic updates in various scenarios. LWTs are implemented with Paxos-based coordination between nodes and are scoped to the partition key. Concurrent LWTs against the same partition can impact each other because of the coordination/conflict-resolution mechanisms inherent in the Paxos algorithm and its adaptations for LWTs:
- When multiple coordinators attempt writes to the same partition simultaneously, they can clash, leading to contention. Cassandra mitigates this with randomized exponential backoff, meaning conflicting LWTs experience delays as they retry their operations with new ballot numbers.
- Each LWT must reach a quorum of nodes to proceed through its stages (prepare, propose, commit). Concurrent LWTs compete for the same set of nodes to form a quorum, potentially causing delays or retries.
- In-progress Paxos sessions add latency: if an LWT detects an uncommitted value, it must complete or supersede it before proceeding.
- Cassandra ensures that once a value is decided (accepted by a quorum), no earlier proposal can be re-proposed. Concurrent LWTs must navigate this rule, meaning operations started later see the effects of earlier ones; this preserves linearizability but can require additional rounds of coordination.
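For reference, here is a minimal sketch of what a single LWT looks like through gocql. The keyspace, `executions` table, columns, and literal values are placeholders for illustration, not Cadence's actual schema:

```go
// Minimal gocql sketch of one LWT (compare-and-swap) update.
// Keyspace, table, and column names below are placeholders.
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "cadence"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// The IF clause makes this an LWT: Cassandra runs the Paxos
	// prepare/propose/commit rounds scoped to the shard_id partition.
	previous := map[string]interface{}{}
	applied, err := session.Query(
		`UPDATE executions SET range_id = ? WHERE shard_id = ? IF range_id = ?`,
		int64(42), 1, int64(41),
	).MapScanCAS(previous)
	if err != nil {
		log.Fatal(err)
	}
	if !applied {
		// The condition did not match; `previous` holds the current values,
		// so the caller can decide whether to retry with fresh inputs.
		fmt.Println("CAS lost, current values:", previous)
	}
}
```

Two such updates racing on the same shard_id serialize through Paxos, which is exactly the contention this benchmark measures.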
This PR benchmarks the impact of concurrency by generating 1k `UpdateWorkflow` LWT queries against the same partition with different concurrency limits. The goal is to measure whether tuning LWT concurrency is worth it, and potentially to help prevent the hot-partition problems we face in some environments under high load.

Benchmark setup
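The benchmark loop is roughly shaped like the sketch below (a hedged approximation, not the exact code in this PR): a channel-based semaphore caps in-flight LWTs at a configurable limit while every query targets the same partition, and contended queries surface as write timeouts like the error quoted after the sketch. The schema and values are the same placeholders as above.

```go
// Sketch of the benchmark shape: fire `total` LWTs against one partition
// with at most `limit` of them in flight, then report elapsed time and
// failure count. Schema and names are placeholders.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	"github.com/gocql/gocql"
)

func runBenchmark(session *gocql.Session, limit, total int) {
	var (
		wg       sync.WaitGroup
		failures int64
		sem      = make(chan struct{}, limit) // caps concurrent LWTs
		start    = time.Now()
	)

	for i := 0; i < total; i++ {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()

			// Every query targets the same partition (shard_id = 1),
			// so all of them compete in the same Paxos rounds.
			previous := map[string]interface{}{}
			_, err := session.Query(
				`UPDATE executions SET range_id = ? WHERE shard_id = ? IF range_id = ?`,
				i+1, 1, i,
			).MapScanCAS(previous)
			if err != nil {
				// Under heavy contention these are mostly write timeouts.
				atomic.AddInt64(&failures, 1)
			}
		}(i)
	}
	wg.Wait()

	fmt.Printf("limit=%d total=%d elapsed=%v failures=%d\n",
		limit, total, time.Since(start), atomic.LoadInt64(&failures))
}

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "cadence"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// Re-run the same load with different concurrency limits.
	for _, limit := range []int{1, 5, 10, 100, 1000} {
		runBenchmark(session, limit, 1000)
	}
}
```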
`Operation timed out - received only 0 responses`

Benchmark with single node Cassandra
Benchmark with a 2-node Cassandra cluster and replication_factor: 2
Notes on results
What is next?
`UpdateWorkflow` queries for a given partition originate from a single Cadence History service host (the history engine instance of the corresponding shard owner). This means we can easily control the concurrency by introducing a similar implementation. A concurrency limit between 1 and 10 seems to promise the best latency and throughput. To decide on the exact concurrency limit we can try two approaches:
- Benchmark this in a prod-like environment and come up with a good-enough number that handles the volume of queries with minimal contention.
- Introduce an adaptive concurrency limiter that dynamically increases/decreases concurrency based on the ratio of timeouts (see the sketch below).
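For the second approach, here is a minimal sketch of what such an adaptive limiter could look like, assuming an AIMD-style adjustment driven by the timeout ratio observed over a fixed window. All names, thresholds, and the window size are hypothetical; this is not an existing Cadence component.

```go
// Hypothetical sketch of an adaptive per-partition LWT limiter: additive
// increase / multiplicative decrease driven by the observed timeout ratio.
package limiter

import "sync"

type AdaptiveLimiter struct {
	mu       sync.Mutex
	cond     *sync.Cond
	limit    int // current max concurrent LWTs
	inFlight int
	minLimit int
	maxLimit int

	window   int // completed requests per adjustment window
	requests int
	timeouts int
}

func NewAdaptiveLimiter(initial, min, max, window int) *AdaptiveLimiter {
	l := &AdaptiveLimiter{limit: initial, minLimit: min, maxLimit: max, window: window}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Acquire blocks until an in-flight slot is available under the current limit.
func (l *AdaptiveLimiter) Acquire() {
	l.mu.Lock()
	defer l.mu.Unlock()
	for l.inFlight >= l.limit {
		l.cond.Wait()
	}
	l.inFlight++
}

// Release frees the slot, records whether the LWT timed out, and once per
// window halves the limit if timeouts are common or bumps it by one if rare.
func (l *AdaptiveLimiter) Release(timedOut bool) {
	l.mu.Lock()
	defer l.mu.Unlock()

	l.inFlight--
	l.requests++
	if timedOut {
		l.timeouts++
	}

	if l.requests >= l.window {
		ratio := float64(l.timeouts) / float64(l.requests)
		switch {
		case ratio > 0.05: // too many timeouts: back off multiplicatively
			l.limit /= 2
			if l.limit < l.minLimit {
				l.limit = l.minLimit
			}
		case ratio < 0.01 && l.limit < l.maxLimit: // healthy: probe upward
			l.limit++
		}
		l.requests, l.timeouts = 0, 0
	}

	l.cond.Broadcast()
}
```

Callers would wrap each `UpdateWorkflow` LWT in Acquire/Release; whether the example 5%/1% thresholds and window size are reasonable would still need to be validated with the prod-like benchmark from the first bullet.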