Leader continued replying with ReplicationFailed #170

Closed
xirc opened this issue Aug 23, 2022 · 0 comments · Fixed by #171
Labels
bug Something isn't working

xirc commented Aug 23, 2022

Situation

The following log message was emitted repeatedly (up to 1700 times) in some fault injection tests:

[Leader] failed to replicate the event (type=[lerna.akka.entityreplication.raft.model.NoOp$]) since the entity (entityId=[0000059981], instanceId=[34156], lastAppliedIndex=[814]) must apply [1] entries to itself. The leader will replicate a new event after the entity applies these [1] non-applied entries to itself.

Diagnosing the logs showed that the following sequence of events occurred:

  1. RaftActor (replica-group-1) was the leader.
  2. Entity (id=0000059981, replica-group-1) succeeded in replicating NoOp.
    • The entity's lastAppliedLogEntryIndex was 814.
    • The NoOp replication succeeded with index 821.
  3. RaftActor (replica-group-2, Follower) updated indices to 821 (commitIndex=821, lastApplied=821).
    • RaftActor (replica-group-2, Follower) didn't send Replica for index 821 to the entity since the associated event was NoOp.
    • Entity (id=0000059981, replica-group-2) didn't update its lastAppliedLogEntryIndex to 821.
  4. RaftActor (replica-group-2) became the leader for some reason.
  5. Entity (id=0000059981, replica-group-2) received ProcessCommand and then attempted to replicate an event:
    • Entity (id=0000059981, replica-group-2) sent Replicate(entityLastAppliedIndex=814, ...)
  6. RaftActor (replica-group-2, Leader) replied with ReplicationFailed.
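
The sequence above can be reproduced with a small, self-contained model. This is only a sketch: `Entry`, `Custom`, `StallModel`, and the leader-side check in `onReplicate` are hypothetical names and simplifications inferred from the log message, not the library's actual types or logic.

```scala
// Hypothetical, simplified model of the scenario; not the library's real types.
sealed trait Event
case object NoOp extends Event
final case class Custom(name: String) extends Event

final case class Entry(index: Int, entityId: Option[String], event: Event)

sealed trait Reply
case object ReplicationStarted extends Reply
final case class ReplicationFailed(nonAppliedCount: Int) extends Reply

object StallModel {
  // Follower-side forwarding as described in this issue: NoOp entries are
  // never forwarded, so the entity's index does not advance past them.
  def entityIndexAfterApply(log: Seq[Entry], entityId: String, current: Int): Int =
    log.foldLeft(current) {
      case (idx, Entry(_, _, NoOp))                                  => idx
      case (idx, Entry(i, Some(id), _)) if id == entityId && i > idx => i
      case (idx, _)                                                  => idx
    }

  // Assumed leader-side check: refuse to replicate while the log holds
  // entries of this entity beyond the index the entity reports as applied.
  def onReplicate(log: Seq[Entry], entityId: String, entityLastAppliedIndex: Int): Reply = {
    val pending =
      log.filter(e => e.entityId.contains(entityId) && e.index > entityLastAppliedIndex)
    if (pending.isEmpty) ReplicationStarted else ReplicationFailed(pending.size)
  }
}
```

With a log holding a regular event at index 814 and a NoOp at index 821 for the same entity, `entityIndexAfterApply` leaves the entity at 814, and `onReplicate(log, id, 814)` keeps returning `ReplicationFailed(1)` no matter how often the entity retries, which matches the repeated log message.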

Replica for EntityEvent(Some(entityId), NoOp) is not sent:

def applyToReplicationActor(logEntry: LogEntry): Unit =
  logEntry.event match {
    case EntityEvent(_, NoOp) => // NoOp is irrelevant to the replicationActor, so it is not forwarded
    case EntityEvent(Some(entityId), event) =>
      if (log.isDebugEnabled) log.debug("=== [{}] applying {} to ReplicationActor ===", currentState, event)
      replicationActor(entityId) ! Replica(logEntry)
    case EntityEvent(None, event) =>
      if (log.isWarningEnabled)
        log.warning("=== [{}] {} was not applied, because it is not assigned any entity ===", currentState, event)
  }

Possible solutions

  1. RaftActor sends Replica to an entity even if the EntityEvent contains NoOp.
  2. The leader starts replication if the non-applied entries contain only NoOp.
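
Both options can be sketched on a simplified model. All names below (`Entry`, `Custom`, `Solutions`, and both methods) are hypothetical and only illustrate the two ideas; which approach the fix in #171 takes is not stated here.

```scala
// Hypothetical, simplified types; not the library's real API.
sealed trait Event
case object NoOp extends Event
final case class Custom(name: String) extends Event
final case class Entry(index: Int, entityId: Option[String], event: Event)

object Solutions {
  // Option 1 (sketch): forward every entry that belongs to the entity,
  // NoOp included, so the entity's applied index also advances past NoOp.
  def entityIndexForwardingNoOp(log: Seq[Entry], entityId: String, current: Int): Int =
    log.foldLeft(current) {
      case (idx, Entry(i, Some(id), _)) if id == entityId && i > idx => i
      case (idx, _)                                                  => idx
    }

  // Option 2 (sketch): the leader starts replication when every entry of
  // this entity beyond its reported applied index is a NoOp.
  def canStartReplication(log: Seq[Entry], entityId: String, entityLastAppliedIndex: Int): Boolean =
    log.forall { e =>
      val isPending = e.entityId.contains(entityId) && e.index > entityLastAppliedIndex
      !isPending || e.event == NoOp
    }
}
```

For a log with a regular event at index 814 and a NoOp at index 821, option 1 moves the entity from 814 to 821, while option 2 lets the leader accept `Replicate(entityLastAppliedIndex=814, ...)` because the only pending entry is a NoOp. Option 2 still rejects the request when a non-NoOp entry is pending.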
xirc added this to the v2.1.1 milestone on Aug 23, 2022
xirc added the bug (Something isn't working) label on Oct 28, 2022