Fix for #2593 #3074

mrtracy · 2015-11-10T00:50:18Z

Finally got to the bottom of this, with special thanks to Tobias, Tamir and Ben.

The test which exercises this most consistently is TestStoreRangeRebalance; however, it only reproduced the bug quickly on the multicpu branch, where it failed reliably.

A six-node cluster ran stably under load for well over an hour with this fix; the bug appears to be completely gone.

tamird · 2015-11-10T01:19:03Z

LGTM. Seems reasonable to me that testing for this is Hard™.

tbg · 2015-11-10T02:48:47Z

multiraft/multiraft.go

+		// itself is removed, the node is re-added to the group
+		// (which will trigger another re-proposal that will succeed), or the
+		// proposal is committed.
+		if initialProposal {


Isn't initialProposal equal to the ok in _, ok := g.pending[p.commandID], and thus not required?
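A minimal sketch of the point in the question above, assuming a heavily simplified model rather than the real multiraft code: whether a command is already in the pending map tells you whether this is its first proposal, so a separate flag carries no extra information. `pendingCmd`, `group` and `propose` are hypothetical names; only `pending` and `commandID` come from the snippet.

```go
package main

import "fmt"

// pendingCmd is a hypothetical stand-in for a proposed command that has not
// yet been committed and applied.
type pendingCmd struct {
	commandID string
}

// group is a hypothetical, heavily simplified multiraft group that tracks
// its outstanding proposals by command ID.
type group struct {
	pending map[string]*pendingCmd
}

// propose records p as pending and reports whether this is its first
// proposal. The answer comes straight from the map lookup, so callers do not
// need to carry a separate "initial proposal" flag alongside it.
func (g *group) propose(p *pendingCmd) bool {
	if _, ok := g.pending[p.commandID]; ok {
		// Already pending: this call is a re-proposal.
		return false
	}
	g.pending[p.commandID] = p
	return true
}

func main() {
	g := &group{pending: map[string]*pendingCmd{}}
	cmd := &pendingCmd{commandID: "cmd-1"}
	fmt.Println(g.propose(cmd)) // true: first proposal
	fmt.Println(g.propose(cmd)) // false: re-proposal of the same command
}
```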

tbg · 2015-11-10T02:50:32Z

LGTM mod my question.

bdarnell · 2015-11-10T03:46:21Z

LGTM mod @tschottdorf's question (I think he's right)

The issue occurred as follows:

+ An incoming client request to a node creates a trace context.
+ The context is attached to a raft command, which is proposed. The command is added to the 'pending' map in multiraft before being proposed. The client request will be answered once the proposed command is committed and applied.
+ Concurrently, another raft command changes the configuration of the range's raft group, removing this node's replica. In the existing code, all pending commands on that node which target that replica are synchronously dismissed with an error; the trace is therefore finalized.
+ However, while the replica has been removed from the group, the group itself has not yet been removed from the node; the proposed command can still commit, it just commits after the configuration change.
+ When the committed command is applied, it attempts to use the trace, but the trace has already been finalized.

The fix is to no longer abort pending commands on a replica just because that replica has been removed from the group; it is not yet safe to abort pending requests immediately, because they may still complete. Instead, pending commands are not aborted until the group itself is removed from the node (by the range GC queue).
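To make the race and the fix concrete, here is a small, self-contained Go model. It is a sketch under stated assumptions, not multiraft's actual code: `trace`, `command`, `group`, `abortPending`, `removeGroup`, `apply` and `run` are all hypothetical. What it takes from the description above is only that the command carries its own trace, that proposals sit in a pending map, and that the fix defers aborting until the group is removed.

```go
package main

import (
	"errors"
	"fmt"
)

// errFinalized is returned when something tries to use a finished trace.
var errFinalized = errors.New("trace already finalized")

// trace is a hypothetical stand-in for the trace context created by an
// incoming client request. It must not be used after finalize.
type trace struct {
	finalized bool
	events    []string
}

func (t *trace) event(msg string) error {
	if t.finalized {
		return errFinalized
	}
	t.events = append(t.events, msg)
	return nil
}

func (t *trace) finalize() { t.finalized = true }

// command is a proposed raft command; the trace travels with it.
type command struct {
	id string
	tr *trace
}

// group holds the commands this node has proposed and not yet answered.
type group struct {
	pending map[string]*command
}

// abortPending dismisses every pending command with an error, which
// finalizes its trace. Pre-fix, this ran as soon as this node's replica was
// removed from the group; post-fix, only removeGroup calls it.
func (g *group) abortPending() {
	for id, c := range g.pending {
		c.tr.finalize()
		delete(g.pending, id)
	}
}

// removeGroup models the range GC queue removing the group from the node.
func (g *group) removeGroup() { g.abortPending() }

// apply models applying a committed command: it records to the command's
// own trace, answers the client, and finalizes the trace.
func (g *group) apply(c *command) error {
	if err := c.tr.event("applied " + c.id); err != nil {
		return err
	}
	delete(g.pending, c.id)
	c.tr.finalize()
	return nil
}

// run replays the problematic ordering: propose, then the config change
// removing this replica, then the original command commits and applies.
func run(fixed bool) error {
	g := &group{pending: map[string]*command{}}
	c := &command{id: "cmd-1", tr: &trace{}}
	g.pending[c.id] = c
	if !fixed {
		g.abortPending() // old behavior on replica removal
	}
	// With the fix, replica removal leaves pending commands untouched.
	return g.apply(c)
}

func main() {
	fmt.Println("before fix:", run(false)) // before fix: trace already finalized
	fmt.Println("after fix:", run(true))   // after fix: <nil>
}
```

In this model the bug comes from the command and the pending entry sharing one trace: dismissing the pending entry early does not stop the command from committing, so the later apply touches a finalized trace. Deferring the abort to group removal means anything still pending at that point genuinely can no longer commit on this node.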

Replica.Quiesce() was added in an attempt to fix cockroachdb#2593; however, it did not fix that issue, and was not necessary to fix it in the first place. This commit removes Replica.Quiesce().

mrtracy · 2015-11-10T17:46:10Z

Yes, you are correct that initialProposal was not necessary. I have updated this with Tobi's suggestion; everything still appears to be in working order.

tamird · 2015-11-10T17:49:57Z

Fix for #2593

tbg reviewed Nov 10, 2015

Matt Tracy added 2 commits November 10, 2015 12:45

Remove Replica.Quiesce()

30953c1


mrtracy force-pushed the mtracy/fix_finalized_trace branch from bd39581 to 30953c1 November 10, 2015 17:45

mrtracy added a commit that referenced this pull request Nov 10, 2015

Merge pull request #3074 from mrtracy/mtracy/fix_finalized_trace

09df162

Fix for #2593

mrtracy merged commit 09df162 into cockroachdb:master Nov 10, 2015

mrtracy deleted the mtracy/fix_finalized_trace branch November 10, 2015 18:13
