Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #2593 #3074

Merged
merged 2 commits into from
Nov 10, 2015
Merged

Conversation

mrtracy
Copy link
Contributor

@mrtracy mrtracy commented Nov 10, 2015

Finally got to the bottom of this, with special thanks to Tobias, Tamir and Ben.

The test which exercises this most consistently is TestStoreRangeRebalance; however, it was only able to quickly reproduce this in the multicpu branch, where it was reproduced with reliability.

A six node cluster was able to run stably under load for well over an hour with this fix; the bug appears to be completely gone.

Review on Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 10, 2015

LGTM. Seems reasonable to me that testing for this is Hard™.

// itself is removed, the node is re-added to the group
// (which will trigger another re-proposal that will succeed), or the
// proposal is committed.
if initialProposal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't initialProposal equal to the ok in _, ok := g.pending[p.commandID] and thus not required?

@tbg
Copy link
Member

tbg commented Nov 10, 2015

LGTM mod my question.

@bdarnell
Copy link
Contributor

LGTM mod @tschottdorf's question (I think he's right)

Matt Tracy added 2 commits November 10, 2015 12:45
The cause of the issue occurred as such:

+ Incoming client request to a node creates a trace context.
+ The context is attached to a raft command which is proposed. The command is
added to the 'pending' map in multiraft before being proposed. The client
request will be answered once the proposed command is committed and applied.
+ Concurrently, another raft command changes the configuration of the range's
raft group, removing this node's replica. In existing code, all pending commands
on that node which target that replica are synchronously dismissed with an
error; the trace is therefore finalized.
+ However, while the replica has been removed from the group, the group itself
has not yet been removed from the node; the proposed command can actually commit,
it just commits after the configuration change.
+ When the committed change is applied, it attempts to use the trace, but the
trace has already been finalized.

The fix is to no longer abort pending commands on a replica just because that
replica has been removed from the group; it is not yet safe to immediately
abort pending requests, because they may actually complete. Instead, we do not
abort commands until the group itself is removed (by the range GC queue).
Replica.Quiesce() was added in an attempt to fix cockroachdb#2593; however, it did not fix
that issue, and was not necessary to fix it in the first place. This commit
removes Replica.Quiesce().
@mrtracy mrtracy force-pushed the mtracy/fix_finalized_trace branch from bd39581 to 30953c1 Compare November 10, 2015 17:45
@mrtracy
Copy link
Contributor Author

mrtracy commented Nov 10, 2015

Yes, you are correct that initialProposal was not necessary. I have updated this with Tobi's suggestion, everything appears to still be in working order.

@tamird
Copy link
Contributor

tamird commented Nov 10, 2015

:shipit:

mrtracy added a commit that referenced this pull request Nov 10, 2015
@mrtracy mrtracy merged commit 09df162 into cockroachdb:master Nov 10, 2015
@mrtracy mrtracy deleted the mtracy/fix_finalized_trace branch November 10, 2015 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants