As in #165 (comment), there are at least two possible solutions:
1. Improve the mechanism for decrementing the next index
2. Improve the handling of received InstallSnapshot messages
With the second option, a RaftActor can skip snapshot synchronization and reply with InstallSnapshotSucceeded immediately, which avoids removing committed log entries that are not included in the snapshot.
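The second option could be sketched roughly as below. This is only an illustration with simplified, hypothetical types (the real InstallSnapshot message and snapshot-status bookkeeping in akka-entity-replication differ): a follower whose own snapshot already covers the offered one replies with success instead of synchronizing and truncating its log.

```scala
// Hypothetical simplified types, not akka-entity-replication's actual API.
final case class Term(value: Long)
final case class InstallSnapshot(
    term: Term,
    srcLatestSnapshotLastLogTerm: Term,
    srcLatestSnapshotLastLogIndex: Long
)
final case class SnapshotStatus(snapshotLastTerm: Term, snapshotLastIndex: Long)

sealed trait InstallSnapshotResult
case object ReplySucceededImmediately extends InstallSnapshotResult
case object StartSnapshotSynchronization extends InstallSnapshotResult

def handleInstallSnapshot(msg: InstallSnapshot, status: SnapshotStatus): InstallSnapshotResult =
  // If the follower's snapshot already covers the offered snapshot, syncing
  // again would only truncate the log; just report success to the leader.
  if (status.snapshotLastTerm.value >= msg.srcLatestSnapshotLastLogTerm.value &&
      status.snapshotLastIndex >= msg.srcLatestSnapshotLastLogIndex)
    ReplySucceededImmediately
  else
    StartSnapshotSynchronization
```

With the values from the logs below, a follower already at SnapshotStatus(Term(17), 3746) would acknowledge an InstallSnapshot for index 3746 immediately, while one still at (Term(16), 3730) would start synchronizing.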
This happened in some fault-injection tests.
An entity (referred to as entity X) on RaftActor (replica-group-2) ended up with inconsistent data:
First recovery at 08:05:24.836:
ApplySnapshot(entitySnapshot=[None])
FetchEntityEvents(..., from=[1], to=[4179], ...)
RecoveryState(snapshot=[None], events=([11] entries))
Second recovery at 08:16:10.985:
ApplySnapshot(entitySnapshot=[None])
FetchEntityEvents(..., from=[1], to=[4179], ...)
RecoveryState(snapshot=[None], events=([0] entries))
On the second recovery, entity X did not receive the events that the first recovery had contained, which means entity X ended up with inconsistent data.
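The inconsistency can be illustrated with a toy model of recovery (purely hypothetical; real entity recovery in akka-entity-replication is event-sourced and more involved): the recovered state is a fold of the fetched events over the snapshot, so a recovery that returns fewer events over the same index range yields an older state than one the entity has already observed.

```scala
// Toy model: state counts how many events were applied on top of the snapshot.
final case class RecoveryState(snapshot: Option[Int], events: Seq[String])

def recover(rs: RecoveryState): Int =
  rs.events.foldLeft(rs.snapshot.getOrElse(0))((applied, _) => applied + 1)

// Values from the logs above: 11 events on the first recovery, 0 on the second.
val firstRecovery  = recover(RecoveryState(snapshot = None, events = Seq.fill(11)("event")))
val secondRecovery = recover(RecoveryState(snapshot = None, events = Seq.empty))
// secondRecovery < firstRecovery: the entity's state went backwards between
// the two recoveries, which is exactly the observed data inconsistency.
```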
Meanwhile, RaftActor (replica-group-2) performed snapshot synchronization as follows:
08:05:03.392: started snapshot synchronization: [Follower] Applying event [SnapshotSyncStarted], state diff: [lastSnapshotStatus: SnapshotStatus(Term(16),3730,Term(16),3730) -> SnapshotStatus(Term(16),3730,Term(17),3746)]
08:16:10.517: completed snapshot synchronization: [Follower] Applying event [SnapshotSyncCompleted], state diff: [replicatedLog: ReplicatedLog(ancestorTerm=Term(14), ancestorIndex=3630, 549 entries with indices Some(3631)...Some(4179)) -> ReplicatedLog(ancestorTerm=Term(17), ancestorIndex=3746, 0 entries with indices None...None), lastSnapshotStatus: SnapshotStatus(Term(16),3730,Term(17),3746) -> SnapshotStatus(Term(17),3746,Term(17),3746)]
RaftActor (replica-group-2) had committed entries (indices 3746 ~ 3748) at 08:02:27.449.
The snapshot synchronization above therefore removed committed log entries that were not included in the snapshot.
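To make the truncation concrete, here is a small sketch using the values from the state diff and commit log above (the identifiers are illustrative, not the library's):

```scala
// Before the sync, the follower's log held entries 3631..4179; after the sync
// it holds none, with ancestorIndex 3746. Entries up to 3748 were committed.
val entriesBefore     = 3631L to 4179L // log entries held before the sync
val snapshotLastIndex = 3746L          // ancestorIndex after the sync (empty log)
val commitIndex       = 3748L          // already committed up to this index

// Committed entries above the snapshot index were dropped by the sync:
val removedCommitted = entriesBefore.filter(i => i > snapshotLastIndex && i <= commitIndex)
// removedCommitted contains indices 3747 and 3748 — committed, but gone.
```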
RaftActor (replica-group-1) was the leader and updated the indices for replica-group-2 as follows:
08:04:13.090: Applying event [SucceededAppendEntries]: next index = 3952 -> 3953, match index = 3951 -> 3952
08:04:14.210: Applying event [BecameLeader]: next index = 3953 -> None, match index = 3952 -> None
08:04:50.558: Applying event [DeniedAppendEntries]: next index = None -> 4058
08:04:50.558: Applying event [DeniedAppendEntries]: next index = 4058 -> 4057
...
08:05:05.632: Applying event [DeniedAppendEntries]: next index = 3213 -> 3212
The next index became lower than expected, like the situation described in #165 (comment).
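The backoff seen in the log can be modeled as follows. This is an illustrative sketch, not akka-entity-replication's actual code: after BecameLeader the next index for replica-group-2 was None, the first DeniedAppendEntries set it to 4058 (a value taken from the log), and every further denial decremented it by exactly one, walking straight past the follower's snapshot index (3746) down to 3212.

```scala
// Illustrative model of the one-entry-per-denial backoff observed above.
def afterDenied(nextIndex: Option[Long]): Option[Long] =
  nextIndex match {
    case None    => Some(4058L) // first probe after BecameLeader; 4058 observed in the log
    case Some(n) => Some(n - 1) // linear backoff: one entry per DeniedAppendEntries
  }

// Replaying 847 denials starting from the post-election None state reproduces
// the observed walk 4058, 4057, ..., 3212:
val walk = Iterator.iterate(Option.empty[Long])(afterDenied).drop(1).take(847).flatten.toVector
// The walk crosses 3746 (the follower's snapshot index) without stopping,
// which is why the next index ends up far lower than expected.
```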