
Benchmark concurrent Cassandra LWTs #6186

Conversation

@taylanisikdemir (Member) commented Jul 23, 2024

Cassandra LWT overview

Cadence uses Cassandra's LWT (a.k.a. compare-and-swap) queries to ensure consistent atomic updates in various scenarios. LWTs are implemented with Paxos-based coordination between nodes and are scoped to the partition key. Running multiple LWTs concurrently against the same partition can cause them to interfere with each other due to the coordination and conflict-resolution mechanisms inherent in the Paxos algorithm and its adaptation for LWTs (see the sketch after the list below):

  1. **Contention and backoff:** When multiple coordinators attempt to write to the same partition simultaneously, they may clash, leading to contention. Cassandra employs randomized exponential backoff to mitigate this, so conflicting LWTs experience delays as they retry their operations with new ballot numbers.
  2. **Quorum requirements:** Each LWT must reach a quorum of nodes to proceed through its stages (prepare, propose, commit). If multiple LWTs run concurrently, they compete for the same set of nodes to form a quorum, potentially causing delays or retries.
  3. **Proposals and commits:** The need to re-propose values when a Paxos session is already in progress can delay LWTs. If an LWT detects an uncommitted value, it must complete or supersede it before proceeding, adding latency.
  4. **Consistency and order:** Cassandra ensures that once a value is decided (accepted by a quorum), no earlier proposal can be re-proposed. Concurrent LWTs must navigate this rule: operations started later might see the effects of earlier ones, ensuring linearizability but potentially requiring additional rounds of coordination.
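For illustration, here is a minimal Go sketch of what a single LWT looks like through gocql. The `executions` table, its columns, and the query text are hypothetical placeholders, not Cadence's actual persistence schema:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

// casUpdate performs a conditional (compare-and-swap) update. The IF clause
// turns the statement into a lightweight transaction: Cassandra runs a Paxos
// round scoped to the partition (shard_id) before applying the write.
func casUpdate(session *gocql.Session, shardID int, prevVersion, newVersion int64) (bool, error) {
	var currentVersion int64
	applied, err := session.Query(
		`UPDATE executions SET version = ? WHERE shard_id = ? IF version = ?`,
		newVersion, shardID, prevVersion,
	).ScanCAS(&currentVersion) // currentVersion is only populated when the CAS is not applied
	return applied, err
}

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "cadence"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	applied, err := casUpdate(session, 1, 1, 2)
	fmt.Println("applied:", applied, "err:", err)
}
```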

This PR benchmarks the impact of concurrency by generating 1k UpdateWorkflow LWT queries against the same partition with different concurrency limits (a sketch of the harness shape follows the setup notes below). The goal is to measure whether tuning LWT concurrency is worth it and could help prevent the hot-partition problems we face in some environments under high load.

Benchmark setup

  • LWT queries are given a 1s timeout.
  • All failures are timeouts: `Operation timed out - received only 0 responses`.
  • Failed queries are not retried, so "Elapsed" should be read as "time to attempt 1k queries at the given concurrency".
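The benchmark loop has roughly the following shape in Go: a buffered channel acts as a counting semaphore to cap in-flight LWTs, and each query's duration and outcome are recorded. The query, schema, and aggregation are simplified placeholders rather than the exact code in this PR; the 1s per-query timeout would come from `ClusterConfig.Timeout`.

```go
package main

import (
	"log"
	"sync"
	"time"

	"github.com/gocql/gocql"
)

// runBenchmark issues totalUpdates LWT updates against a single partition,
// allowing at most `concurrency` queries in flight at any time.
func runBenchmark(session *gocql.Session, totalUpdates, concurrency int) {
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	success, failed := 0, 0
	var durations []time.Duration

	start := time.Now()
	for i := 0; i < totalUpdates; i++ {
		wg.Add(1)
		sem <- struct{}{} // block until an in-flight slot is free
		go func() {
			defer wg.Done()
			defer func() { <-sem }()

			t0 := time.Now()
			prev := map[string]interface{}{}
			_, err := session.Query(
				`UPDATE executions SET version = ? WHERE shard_id = ? IF EXISTS`,
				1, 1,
			).Consistency(gocql.LocalQuorum).
				SerialConsistency(gocql.LocalSerial).
				MapScanCAS(prev)
			d := time.Since(t0)

			mu.Lock()
			defer mu.Unlock()
			durations = append(durations, d)
			if err != nil { // in this benchmark, errors are the 1s timeouts
				failed++
				return
			}
			success++
		}()
	}
	wg.Wait()
	log.Printf("success=%d failed=%d elapsed=%s", success, failed, time.Since(start))
	_ = durations // avg/max/min aggregation omitted for brevity
}
```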

Benchmark with single-node Cassandra

| Total Updates | Concurrency | Success | Failed | Avg Duration (ms) | Max Duration (ms) | Min Duration (ms) | Elapsed (s) |
|---|---|---|---|---|---|---|---|
| 1000 | 1 | 1000 | 0 | 7.41 | 26.9 | 4.8 | 7.42 |
| 1000 | 10 | 1000 | 0 | 54.54 | 808.2 | 2.8 | 5.16 |
| 1000 | 20 | 999 | 1 | 90.61 | 1075 | 2.8 | 4.63 |
| 1000 | 40 | 993 | 7 | 145.95 | 1071 | 2.2 | 3.78 |
| 1000 | 80 | 976 | 24 | 254.02 | 1093 | 1.4 | 3.36 |

Benchmark with 2-node Cassandra and `replication_factor: 2`

| Total Updates | Concurrency | Success | Failed | Avg Duration (ms) | Max Duration (ms) | Min Duration (ms) | Elapsed (s) |
|---|---|---|---|---|---|---|---|
| 1000 | 1 | 1000 | 0 | 11.75 | 50.09 | 6.69 | 11.76 |
| 1000 | 10 | 900 | 100 | 370.96 | 1120 | 7.92 | 37.31 |
| 1000 | 20 | 642 | 358 | 602.63 | 1120 | 5.95 | 30.40 |
| 1000 | 40 | 365 | 635 | 826.45 | 1130 | 7.83 | 21.12 |
| 1000 | 80 | 64 | 936 | 1000 | 1120 | 6.87 | 12.93 |

Notes on results

  • The ratio of queries that time out is proportional to the concurrency limit.
  • The impact of high concurrency is more drastic with replication factor > 1, as expected.
  • Increasing concurrency gives the impression of higher throughput, but average query latency increases drastically.

What is next?

UpdateWorkflow queries for a given partition originate from a single Cadence History service host (the history engine instance that owns the corresponding shard). This means we can easily control concurrency by introducing a similar limiter there. A concurrency limit between 1 and 10 seems to promise the best latency/throughput trade-off. To decide on the exact limit we can try two approaches:
- Benchmark this in a prod-like environment and pick a good-enough number that handles the query volume with minimal contention.
- Introduce an adaptive concurrency limiter that dynamically increases/decreases concurrency based on the ratio of timeouts (see the sketch below).
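As a rough sketch of the second option (not an implementation from this PR), an adaptive limiter could track the timeout ratio over a window of recent results and nudge the concurrency limit up or down; the window size, thresholds, and names below are arbitrary illustrative choices. A caller would `Acquire()` before issuing the LWT and `Release(timedOut)` after observing its outcome.

```go
package limiter

import "sync"

// AdaptiveLimiter caps the number of in-flight LWTs and adjusts that cap
// based on the ratio of timeouts observed over a fixed-size window of results.
type AdaptiveLimiter struct {
	mu       sync.Mutex
	cond     *sync.Cond
	limit    int // current max in-flight queries
	inFlight int
	minLimit int
	maxLimit int
	window   int // number of results per adjustment decision
	results  int
	timeouts int
}

func NewAdaptiveLimiter(initial, minLimit, maxLimit, window int) *AdaptiveLimiter {
	l := &AdaptiveLimiter{limit: initial, minLimit: minLimit, maxLimit: maxLimit, window: window}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Acquire blocks until an in-flight slot is available under the current limit.
func (l *AdaptiveLimiter) Acquire() {
	l.mu.Lock()
	for l.inFlight >= l.limit {
		l.cond.Wait()
	}
	l.inFlight++
	l.mu.Unlock()
}

// Release records the query outcome and, once a full window of results has
// been observed, shrinks the limit if timeouts are frequent or grows it if
// they are rare. The 5% / 1% thresholds are illustrative.
func (l *AdaptiveLimiter) Release(timedOut bool) {
	l.mu.Lock()
	l.inFlight--
	l.results++
	if timedOut {
		l.timeouts++
	}
	if l.results >= l.window {
		ratio := float64(l.timeouts) / float64(l.results)
		switch {
		case ratio > 0.05 && l.limit > l.minLimit:
			l.limit-- // back off under contention
		case ratio < 0.01 && l.limit < l.maxLimit:
			l.limit++ // probe for more throughput while healthy
		}
		l.results, l.timeouts = 0, 0
	}
	l.mu.Unlock()
	l.cond.Broadcast()
}
```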


codecov bot commented Jul 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.83%. Comparing base (736d66b) to head (a290151).
Report is 2 commits behind head on master.


@taylanisikdemir taylanisikdemir merged commit f9996a1 into cadence-workflow:master Jul 24, 2024
20 checks passed
@taylanisikdemir taylanisikdemir deleted the taylan/lwt_pool_benchmark branch July 24, 2024 22:55