kv: set sane default for kv.transaction.write_pipelining_max_batch_size #32606

nvanbenschoten · 2018-11-26T19:53:44Z

Informs #32522.

There is a tradeoff here between the overhead of waiting for consensus for a batch if we don't pipeline and proving that all of the writes in the batch succeed if we do pipeline. We set this default to a value which experimentally strikes a balance between the two costs.

To determine the best value for this setting, I ran a three-node single-AZ AWS cluster with 4 vCPU nodes (m5d.xlarge). I modified KV to perform writes in an explicit txn and to run multiple statements. I then ran kv0 with 8 DML statements per txn (a reasonable estimate for the average number of statements that an explicit txn runs) and adjusted the batch size of these statements from 1 to 256. This resulted in the following graph:

We can see that the cross-over point where txn pipelining stops being beneficial is with batch sizes somewhere between 128 and 256 rows. Given this information, I set the default for `kv.transaction.write_pipelining_max_batch_size` to 128.
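
As a rough illustration of what the setting controls, the sketch below models the decision as a simple threshold check. The helper `shouldPipeline` and the constant `defaultMaxBatchSize` are hypothetical names made up for this example, not the actual CockroachDB code, and treating a value of 0 as "no limit" is an assumption.

```go
package main

import "fmt"

// defaultMaxBatchSize mirrors the default value chosen in this PR.
const defaultMaxBatchSize = 128

// shouldPipeline is a hypothetical helper: it reports whether a batch with
// numWrites writes would still use write pipelining under the given limit.
func shouldPipeline(numWrites, maxBatchSize int) bool {
	if maxBatchSize == 0 {
		// Assumption for this sketch: a zero limit disables the size check.
		return true
	}
	return numWrites <= maxBatchSize
}

func main() {
	for _, n := range []int{1, 128, 129, 256} {
		fmt.Printf("writes=%d pipelined=%t\n", n, shouldPipeline(n, defaultMaxBatchSize))
	}
}
```

Under this model, batches of up to 128 writes are pipelined and larger ones wait for consensus synchronously. On a running cluster the value itself would be changed with a `SET CLUSTER SETTING` statement.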

Of course, there are a lot of variables at play here: storage throughput, replication latency, node size, etc. I think the setup I used hits a reasonable middle ground with these.
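
For a sense of scale, the sweep described above (8 statements per txn, batch sizes from 1 to 256) can be enumerated as follows. This is only a sketch: the doubling step between batch sizes is an assumption, since the description does not list the exact sizes tested, and `sweepBatchSizes` and `rowsPerTxn` are hypothetical helpers.

```go
package main

import "fmt"

// stmtsPerTxn is the number of DML statements per explicit txn used in the benchmark.
const stmtsPerTxn = 8

// sweepBatchSizes returns batch sizes from lo to hi inclusive, doubling each
// step. The doubling is an assumption; the exact sizes tested are not stated.
func sweepBatchSizes(lo, hi int) []int {
	if lo < 1 {
		return nil // guard against a non-terminating loop
	}
	var sizes []int
	for b := lo; b <= hi; b *= 2 {
		sizes = append(sizes, b)
	}
	return sizes
}

// rowsPerTxn returns how many rows one txn writes when every statement
// writes batchSize rows.
func rowsPerTxn(batchSize int) int {
	return stmtsPerTxn * batchSize
}

func main() {
	for _, b := range sweepBatchSizes(1, 256) {
		fmt.Printf("batch=%d rows/txn=%d\n", b, rowsPerTxn(b))
	}
}
```

At the chosen default of 128 rows per statement, a txn in this setup writes 1024 rows in total.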

Release note: None

tbg

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained

nvanbenschoten · 2018-11-26T22:47:08Z

bors r+

Planning on backporting to 2.1.

32606: kv: set sane default for kv.transaction.write_pipelining_max_batch_size r=nvanbenschoten a=nvanbenschoten

Informs #32522.

There is a tradeoff here between the overhead of waiting for consensus for a batch if we don't pipeline and proving that all of the writes in the batch succeed if we do pipeline. We set this default to a value which experimentally strikes a balance between the two costs.

To determine the best value for this setting, I ran a three-node single-AZ AWS cluster with 4 vCPU nodes (`m5d.xlarge`). I modified KV to perform writes in an explicit txn and to run multiple statements. I then ran `kv0` with 8 DML statements per txn (a reasonable estimate for the average number of statements that an **explicit** txn runs) and adjusted the batch size of these statements from 1 to 256. This resulted in the following graph:

![image](https://user-images.githubusercontent.com/5438456/49038443-fc91e200-f18a-11e8-810d-1172821e63ea.png)

We can see that the cross-over point where txn pipelining stops being beneficial is with batch sizes somewhere between 128 and 256 rows. Given this information, I set the default for `kv.transaction.write_pipelining_max_batch_size` to 128.

Of course, there are a lot of variables at play here: storage throughput, replication latency, node size, etc. I think the setup I used hits a reasonable middle ground with these.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>

craig · 2018-11-26T23:00:58Z

Build succeeded

GitHub CI (Cockroach)

nvanbenschoten requested review from tbg and a team November 26, 2018 19:53

nvanbenschoten mentioned this pull request Nov 26, 2018

kv: bound the size of in-flight write tracking in txnPipeliner #32522

Closed

tbg approved these changes Nov 26, 2018

nvanbenschoten force-pushed the nvanbenschoten/txnPipelineBatch branch from 9d7db78 to 52242a7 Compare November 26, 2018 22:28

craig bot merged commit 52242a7 into cockroachdb:master Nov 26, 2018

nvanbenschoten mentioned this pull request Nov 26, 2018

release-2.1: kv: set sane default for kv.transaction.write_pipelining_max_batch_size #32621

Merged

nvanbenschoten deleted the nvanbenschoten/txnPipelineBatch branch November 27, 2018 19:43
