Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RaftMemberData.resolveNewLogEntries got requirement failed: A new entry conflicted with a committed entry #165

Closed
xirc opened this issue Jul 27, 2022 · 2 comments · Fixed by #166
Labels
bug Something isn't working
Milestone

Comments

@xirc
Copy link
Contributor

xirc commented Jul 27, 2022

The following error occurred in some fault injection tests:

04:48:20.417 xxx        ERROR   akka.actor.OneForOneStrategy    xxx akka://xxx/system/sharding/raft-shard-xxx-replica-group-3/30/30       -       requirement failed: The entry with index [583] should not conflict with the committed entry (commitIndex [584]) java.lang.IllegalArgumentException: requirement failed: The entry with index [583] should not conflict with the committed entry (commitIndex [584])
    at scala.Predef$.require(Predef.scala:337)
    at lerna.akka.entityreplication.raft.FollowerData.resolveNewLogEntries(RaftMemberData.scala:147)
    at lerna.akka.entityreplication.raft.FollowerData.resolveNewLogEntries$(RaftMemberData.scala:119)
    at lerna.akka.entityreplication.raft.RaftMemberDataImpl.resolveNewLogEntries(RaftMemberData.scala:616)
    at lerna.akka.entityreplication.raft.Follower.lerna$akka$entityreplication$raft$Follower$$receiveAppendEntries(Follower.scala:111)
    at lerna.akka.entityreplication.raft.Follower$$anonfun$followerBehavior$1.applyOrElse(Follower.scala:25)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at lerna.akka.entityreplication.raft.RaftActor.akka$persistence$Eventsourced$$super$aroundReceive(RaftActor.scala:141)
    at akka.persistence.Eventsourced$$anon$4.stateReceive(Eventsourced.scala:923)
    at akka.persistence.Eventsourced.aroundReceive$$original(Eventsourced.scala:251)
    at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:148)
    at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:250)
    at lerna.akka.entityreplication.raft.RaftActor.aroundReceive(RaftActor.scala:141)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke$$original(ActorCell.scala:548)
    at akka.actor.ActorCell.invoke(ActorCell.scala:61)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run$$original(Mailbox.scala:231)
    at akka.dispatch.Mailbox.run(Mailbox.scala:32)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
@xirc
Copy link
Contributor Author

xirc commented Jul 27, 2022

From diagnosing logs of another test that also got the requirement failed error, the following situation could be a cause of this error:

RaftActor of replica-group-1

  • 06:15:50.049: It appended log entry (index 788) into ReplicatedLog
  • 06:15:50.262: It updated the commit index to 788
  • 06:16:13.451: It voted other RaftActor in term 7
  • 06:16:13.681: It appended log entry (index 789) into ReplicatedLog
  • 06:16:14.140: It appended log entry (index 790) into ReplicatedLog
  • 06:16:14.140: It updated the commit index to 789
  • ...

RaftActor of replica-group-2

  • 06:06:04.726: It became the leader in term 2
  • 06:15:50.050 ~ 06:15:50.051: It updated the commit index to 788
    • 06:15:50.050: It updated indices for replica-group-1 (next index = 788 -> 789, match index = 787 -> 788) by applying an event SucceededAppendEntries
    • 06:15:50.051: It updated indices for replica-group-3 (next index = 788 -> 789, match index = 787 -> 788) by applying an event SucceededAppendEntries
  • 06:15:56.314: It started a replication of log entry (index 789)
  • 06:16:01.388: It(leader) became a follower by detecting a new term (term 3)
  • ... electing a new leader
  • 06:16:18.102: It restarted due to a persistence failure
  • ... electing a new leader
  • 06:16:33.837: It voted the RaftActor of replica-group-3 on term 11
  • 06:16:36.075: It updated the commit index (0 -> 789)
  • 06:16:36.076: It got the requirement failed error

RaftActor of replica-group-3

  • 06:15:50.049: It appended log entry (index 788) into ReplicatedLog
  • 06:15:50.262: It updated the commit index to 788
  • 06:16:13.448: It became the leader in term 7
    • The RaftActor of replica-group-1 voted this RaftActor (of replica-group-3)
  • 06:16:13.630: It succeeded in replication of NoOp (index 789) and then updated the commit index to 789
    • It updated indices for replica-group-1 (next index = None -> 790, match index = None -> 789) by applying an event SucceededAppendEntries
  • ... electing a new leader
  • 06:16:21.930: It updated the commit index to 832
  • 06:16:22.176: It became a follower by detecting a new term (term 8)
  • 06:16:24.727: It became the leader in term 11
  • 06:16:24.727: It updated indices for all replica groups (next index = None, match index = None)
  • 06:16:24.737 ~ 06:16:24.824:
    • It updated indices for replica-group-1 (next index = None -> 833 -> 835, match index = None 834)
    • It updated the commit index (832 -> 834)
  • 06:16:32.924 ~ 06:16:33.700:
    • 06:16:32.924: It updated the next index for replica-group-2 (None -> 833)
    • 06:16:32.924: It decremented the next index for replica-group-2 (833 -> 832)
    • ... decrementing the next index for replica-group-2
    • 06:16:33.700: It decremented the next index for replica-group-2 (736 -> 735)
  • 06:16:34.037:
    • It updated the next index for replica-group-2 (735 -> 790)
    • It updated the match index for replica-group-2 (None -> 789)

@xirc
Copy link
Contributor Author

xirc commented Jul 27, 2022

There might be at least two possible solutions:

  1. Improve a mechanism for decrementing the next index
  2. Improve a mechanism for advancing the commit index

There might be a reason behind the next index being lower than expected:

  • If a follower is slow temporary, the leader sends multiple AppendEntries messages at some heartbeat interval, and the follower replies with multiple AppendEntriesFaield messages, which decrease the next index than expected
  • If the leader sends multiple AppendEntries (on batch mode), and the follower replies with multiple AppendEntriesFaield, which decreases the next index than expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant