Two Leaders Elected in the Same Term #47
Comments
Excellent work Colin, looks really good. I need to find the time to go over your discovery!
I was able to minimize this test case further, by decreasing the cluster size. Repro steps for smaller case:
Console output. Messages delivered:
At the end of this, raft-member-7 and raft-member-8 are both elected as leader in the same term.
@colin-scott Hi! I tried to run the test case that you provided, but it appears that the code has changed: I get a lot of merge conflicts and I'm unable to run the test. I want to try fixing this issue, and probably others too, as I'm planning to use this library in my own project.
Hey @dmitraver, Thanks for pointing out the issue. I corrected the repro steps above; if you start the repro steps from scratch it should work. (The Thanks!
@dmitraver for what it's worth, I think the root cause of this bug is described in this issue:
@colin-scott Thx Colin! I will look into that.
Thanks guys! I still haven't been able to resume work on it; a lot of stuff on Akka itself needs shipping soon, so I've focused my efforts there :) One note:
Please don't if it's any real-life / production project. This implementation has been proven to not yet be correct, so it would be a bad idea to use it in a real project. Safety first!
@ktoso Akka is an awesome tool! I just want to contribute to this project :) I've seen that a lot of issues were opened previously, so I will try to fix as many as I can and make a PR. This is not intended to be used in production (maybe later) but more for my own Akka studies.
Awesome! For learning things on distributed systems and Akka + fixing things this project is ideal I think :-)
@ktoso Thanks for the support :)
Calling this a "test case" is a bit of a misnomer. Think of it as a I can show you how I ran the initial fuzz test if that would be easier. Or I can show you how to run the replayer in a mode that doesn't crash whenever it diverges.
I think I fixed that issue; now your test shows no errors and passes.
Best wishes,
Hello,
I have a test case that, as far as I can tell, causes akka-raft to violate Raft's "Election Safety" property (see Figure 3 from the paper), i.e. it appears that two leaders are elected for the same term.
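To make the property concrete, here is a minimal Scala sketch of an Election Safety check over a set of observed member states. The `Role`, `MemberState`, and `ElectionSafety` names are hypothetical and simplified for illustration; they are not akka-raft's actual types. The check only encodes the rule that at most one leader may exist in any given term.

```scala
// Hypothetical, simplified model for illustration; not akka-raft's types.
sealed trait Role
case object Follower extends Role
case object Candidate extends Role
case object Leader extends Role

// One member's observed role and term at the moment of a snapshot.
final case class MemberState(member: String, role: Role, term: Long)

object ElectionSafety {
  // Returns every term in which more than one member claims to be leader,
  // mapped to the set of members claiming leadership in that term.
  // An empty result means the observed states satisfy Election Safety.
  def violations(snapshot: Seq[MemberState]): Map[Long, Set[String]] =
    snapshot
      .filter(_.role == Leader)
      .groupBy(_.term)
      .map { case (term, states) => term -> states.map(_.member).toSet }
      .filter { case (_, leaders) => leaders.size > 1 }
}
```

Leaders in different terms are not flagged, since a stale leader from an earlier term does not violate the property by itself.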
The test case consists of the following external events:
ChangeConfiguration messages containing ActorRefs for all 9 RaftActors. raft-member-8 does not receive a ChangeConfiguration message.
Upon running the test, I see the following in the console output:
For what it's worth, rather than inspecting the console output to detect this bug, we took a distributed snapshot of all RaftActors' states and found that in the same snapshot raft-member-9 is in state LeaderMeta(Actor[akka://new-system-0/user/raft-member-9],Term(2)) while raft-member-7 is in state LeaderMeta(Actor[akka://new-system-0/user/raft-member-7],Term(2)).
In our failing execution, we have the akka runtime deliver 27 total messages, including the 8 ChangeConfiguration messages. The delivery order is as follows (format is sender,receiver,message):
Based on that delivery order, it appears that raft-member-9 receives votes from raft-member-{1,6,3,2,9}, and raft-member-7 receives votes from raft-member-{1,5,4,7}. A few things are strange about this: the votes received by raft-member-9 appear to be from different Terms, and raft-member-7 does not actually receive a quorum of votes (regardless of Term). I'm not exactly sure what the root cause is here.
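The vote arithmetic above can be checked with a small Scala sketch. It assumes the quorum is a strict majority of the 9 RaftActors and counts votes regardless of Term, as the description does; `VoteTally` and its methods are hypothetical helpers, not part of akka-raft.

```scala
// Hypothetical helper for illustration; not part of akka-raft.
object VoteTally {
  // Strict majority of the cluster (assumed quorum rule).
  def quorum(clusterSize: Int): Int = clusterSize / 2 + 1

  // True if the distinct voters form a quorum, ignoring which Term each vote was cast in.
  def hasQuorum(voters: Set[Int], clusterSize: Int): Boolean =
    voters.size >= quorum(clusterSize)

  // Members that appear in both vote sets.
  def sharedVoters(a: Set[Int], b: Set[Int]): Set[Int] = a intersect b

  def main(args: Array[String]): Unit = {
    val clusterSize = 9
    val votesFor9 = Set(1, 6, 3, 2, 9) // voters reported for raft-member-9
    val votesFor7 = Set(1, 5, 4, 7)    // voters reported for raft-member-7

    println(quorum(clusterSize))               // 5
    println(hasQuorum(votesFor9, clusterSize)) // true
    println(hasQuorum(votesFor7, clusterSize)) // false
    println(sharedVoters(votesFor9, votesFor7)) // Set(1)
  }
}
```

Under these assumptions raft-member-7 is one vote short of a quorum, and raft-member-1 appears in both vote sets. Since Raft allows a member to grant at most one vote per Term, a shared voter is only legitimate if its two votes were cast in different Terms.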
We made akka's message scheduler deterministic so that you can easily reproduce the bug for yourself. Steps to reproduce:
From there you should be able to add logging statements and continue replaying as many times as needed.
We made a few small changes to akka-raft to generate this test case:
use a seeded random number generator to make the execution deterministic.
distributed snapshots.
Follower.scala to always begin an election rather than check if electionDeadline.isOverdue(). This is again to make the execution deterministic (since electionDeadline.isOverdue() calls gettimeofday), but that shouldn't affect correctness as far as I can tell.
Let me know if you have any questions about how we ran this test.
Thanks!
-Colin