An entity could stick at WaitForReplication when a Raft log entry is truncated by conflict #155

Closed
xirc opened this issue Jul 5, 2022 · 0 comments · Fixed by #162

xirc commented Jul 5, 2022

Situation

  1. An entity on node A sends a Replicate message to its RaftActor (call it RaftActor A) and then waits for a replication result (ReplicationSucceeded, ReplicationFailed, or Replica).
  2. RaftActor A becomes a Follower for some reason.
    • At this point there is another leader (say, on node B).
    • Suppose the Raft log entry for that replication (started by RaftActor A) conflicts for some reason, and the new leader truncates the entry.
    • The replication for the entity makes no further progress: the old leader on node A never sends a replication result to the entity.
  3. Suppose the leader on node B has not received any command for the entity. The entity stays in WaitForReplication because it never receives a Replica.
  4. RaftActor A becomes the Leader again for some reason.
  5. The entity keeps stashing every new incoming ProcessCommand because its state is still WaitForReplication.
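
The stuck state described in the steps above can be reproduced with a toy model. This is only an illustrative sketch: the `Entity` class, its fields, and its method names are hypothetical and are not the library's API. It assumes, for simplicity, that any replication result releases the entity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy model of an entity (hypothetical, not the library's class)."""
    state: str = "Ready"
    stash: list = field(default_factory=list)
    processed: list = field(default_factory=list)

    def process_command(self, command):
        if self.state == "WaitForReplication":
            # While waiting, every incoming command is stashed.
            self.stash.append(command)
            return
        # Handling a command starts a replication and blocks later commands.
        self.processed.append(command)
        self.state = "WaitForReplication"

    def on_replication_result(self):
        # Assumption of this sketch: any result releases the entity,
        # after which the stashed commands are replayed in order.
        self.state = "Ready"
        pending, self.stash = self.stash, []
        for command in pending:
            self.process_command(command)


entity = Entity()
entity.process_command("cmd-1")  # starts a replication whose entry is later truncated
entity.process_command("cmd-2")  # stashed
entity.process_command("cmd-3")  # stashed
```

If no replication result is ever delivered, `entity.state` stays `"WaitForReplication"` and the stash only grows, which matches step 5.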

Possible Solution

RaftActor should send a ReplicationFailed message to an entity if that entity is waiting for a Raft log entry that was truncated by a conflict. This might be achievable in the AppendEntries handling:

Conflicted entries are existing entries of the Raft log (called ReplicatedLog) with indices greater than or equal to the index of the head of newEntries.
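
A minimal sketch of this idea follows, under stated assumptions. `LogEntry`, `find_truncated_entries`, `failures_to_send`, and the `waiting` mapping (entity ID to the index of the log entry it awaits) are all hypothetical names introduced for illustration. The conflict check uses the standard Raft rule (the first existing entry whose term differs from the new entry at the same index, plus everything after it), which may differ in detail from what ReplicatedLog actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    index: int
    term: int
    entity_id: str


def find_truncated_entries(log, new_entries):
    """Return the existing entries a follower would drop when accepting new_entries."""
    by_index = {entry.index: entry for entry in log}
    for new in new_entries:
        existing = by_index.get(new.index)
        if existing is not None and existing.term != new.term:
            # The first conflicting entry and all entries after it are truncated.
            return [entry for entry in log if entry.index >= new.index]
    return []


def failures_to_send(truncated, waiting):
    """Return IDs of entities whose awaited log entry was truncated."""
    return sorted(
        {e.entity_id for e in truncated if waiting.get(e.entity_id) == e.index}
    )


log = [LogEntry(1, 1, "a"), LogEntry(2, 1, "b"), LogEntry(3, 1, "c")]
new_entries = [LogEntry(2, 2, "x"), LogEntry(3, 2, "y")]
waiting = {"a": 1, "b": 2}  # entity "b" waits for the entry at index 2

truncated = find_truncated_entries(log, new_entries)
to_notify = failures_to_send(truncated, waiting)
```

Here `to_notify` contains only `"b"`: its awaited entry at index 2 is truncated, so it would receive ReplicationFailed and leave WaitForReplication, while entity `"a"` is unaffected because the entry at index 1 survives.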

@xirc xirc changed the title An entity might stick at WaitForReplication when a Raft log entry is truncated by conflict An entity could stick at WaitForReplication when a Raft log entry is truncated by conflict Jul 13, 2022
@xirc xirc added this to the v2.1.1 milestone Jul 15, 2022
@xirc xirc added the bug Something isn't working label Oct 28, 2022