kv: set sane default for kv.transaction.write_pipelining_max_batch_size
Informs cockroachdb#32522.

There is a tradeoff here between the overhead of synchronously waiting for
consensus on a batch if we don't pipeline and the overhead of proving that all
of the writes in the batch succeeded if we do pipeline. We set this default to
a value that experimentally strikes a balance between the two costs.

To determine the best value for this setting, I ran a three-node, single-AZ AWS
cluster with 4 vCPU nodes (`m5d.xlarge`). I modified the `kv` workload to perform
its writes in an explicit txn and to run multiple statements per txn. I then ran
`kv0` with 8 DML statements per txn (a reasonable estimate for the average number
of statements that an **explicit** txn runs) and varied the batch size of these
statements from 1 to 256. This produced the following graph:

<see graph in PR>

We can see that the cross-over point, where txn pipelining stops being
beneficial, lies at batch sizes somewhere between 128 and 256 rows. Given this,
I set the default for `kv.transaction.write_pipelining_max_batch_size` to 128.

Of course, there are a lot of variables at play here: storage throughput,
replication latency, node size, etc. I think the setup I used hits a reasonable
middle ground with these.

Release note: None
nvanbenschoten committed Nov 26, 2018
1 parent c767737 commit 9d7db78
Showing 1 changed file with 9 additions and 1 deletion.
pkg/kv/txn_interceptor_pipeliner.go (9 additions, 1 deletion):

```diff
@@ -36,7 +36,15 @@ var pipelinedWritesEnabled = settings.RegisterBoolSetting(
 var pipelinedWritesMaxBatchSize = settings.RegisterNonNegativeIntSetting(
 	"kv.transaction.write_pipelining_max_batch_size",
 	"if non-zero, defines the maximum size batch that will be pipelined through Raft consensus",
-	0,
+	// NB: there is a tradeoff between the overhead of synchronously waiting for
+	// consensus for a batch if we don't pipeline and proving that all of the
+	// writes in the batch succeed if we do pipeline. We set this default to a
+	// value which experimentally strikes a balance between the two costs.
+	//
+	// Notably, this is well below sql.max{Insert/Update/Upsert/Delete}BatchSize,
+	// so implicit SQL txns should never pipeline their writes - they should either
+	// hit the 1PC fast-path or should have batches which exceed this limit.
+	128,
 )

 // txnPipeliner is a txnInterceptor that pipelines transactional writes by using
```
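As a rough illustration of the policy this default encodes (a hedged sketch, not the actual CockroachDB code: `shouldPipeline` and `maxPipelinedBatchSize` are hypothetical names standing in for the real setting plumbing), the gating decision amounts to:

```go
package main

import "fmt"

// maxPipelinedBatchSize mirrors the new default for
// kv.transaction.write_pipelining_max_batch_size. Per the setting's
// description, a value of zero means "no limit" (the old default).
const maxPipelinedBatchSize = 128

// shouldPipeline is a hypothetical helper sketching the decision the
// txnPipeliner makes: pipeline small batches, where asynchronously
// proving each write later is cheaper than synchronously waiting for
// consensus now; fall back to synchronous consensus for large batches.
func shouldPipeline(numWrites int) bool {
	if maxPipelinedBatchSize == 0 {
		return true // zero disables the limit: pipeline everything
	}
	return numWrites <= maxPipelinedBatchSize
}

func main() {
	fmt.Println(shouldPipeline(8))   // small explicit-txn batch: pipelined
	fmt.Println(shouldPipeline(256)) // past the measured crossover: not pipelined
}
```

Note how this interacts with the `sql.max{Insert/Update/Upsert/Delete}BatchSize` point in the diff's comment: implicit SQL txns batch well above 128 rows, so they either commit via the 1PC fast-path or fall on the non-pipelined side of this check.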
