Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReplicatedEntity can produce illegal snapshot if compaction and receiving new event occur same time #111

Closed
negokaz opened this issue Dec 9, 2021 · 1 comment · Fixed by #115
Labels
bug Something isn't working
Milestone

Comments

@negokaz
Copy link
Contributor

negokaz commented Dec 9, 2021

Overview

[JVM-4] === [Follower] append AppendEntries(NormalizedShardId(146),Term(1),replica-group-2,0,Term(0),Vector(LogEntry(1, EntityEvent(None,NoOp), Term(1)), LogEntry(2, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(3, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(4, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(5, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(6, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(7, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(8, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(9, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(10, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(11, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(12, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1))),12) ===
[JVM-4] [Follower] compaction started (logEntryIndex: 12, number of entities: 1)
[JVM-4] === [Follower] append AppendEntries(NormalizedShardId(146),Term(1),replica-group-2,12,Term(1),Vector(LogEntry(13, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(14, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1)), LogEntry(15, EntityEvent(Some(NormalizedEntityId(test)),Incremented(1)), Term(1))),14) ===
[JVM-4] === [Follower] applying Incremented(1) to ReplicationActor ===
[JVM-4] === [Follower] applying Incremented(1) to ReplicationActor ===
[JVM-4] === [Follower] append AppendEntries(NormalizedShardId(146),Term(1),replica-group-2,15,Term(1),Vector(),15) ===
[JVM-4] === [Follower] applying Incremented(1) to ReplicationActor ===
[JVM-4] state updated: State(1)
[JVM-4] state updated: State(2)
[JVM-4] state updated: State(3)
[JVM-4] state updated: State(4)
[JVM-4] state updated: State(5)
[JVM-4] state updated: State(6)
[JVM-4] state updated: State(7)
[JVM-4] state updated: State(8)
[JVM-4] state updated: State(9)
[JVM-4] state updated: State(10)
[JVM-4] state updated: State(11)
[JVM-4] state updated: State(12)
[JVM-4] state updated: State(13)
[JVM-4] state updated: State(14)
[JVM-4] save snapshot: EntitySnapshot(EntitySnapshotMetadata(NormalizedEntityId(test),12),EntityState(State(14)))

branch for reproduction: 6c10bb7...sandbox/consistency-leak-entity

image

Solution 1: ReplicatedEntity manages its own recovery

  • Pros
    • ReplicatedEntity can produce a snapshot to RaftActor for first TakeSnapshot
    • No breaking changes
  • Cons
    • Complex message protocol

image

Solution 2: ReplicatedEntity drop events of RecoveryState

  • Pros
    • A few message protocol changes
    • No breaking changes
  • Cons
    • It is difficult to understand without knowing the background

image

Solution 3: ReplicatedEntity ignores TakeSnapshot while recovering

  • Pros
    • No change in message protocol
    • No breaking changes
  • Cons
    • ReplicatedEntity cannot produce a snapshot to RaftActor for first TakeSnapshot

image

@negokaz negokaz added the bug Something isn't working label Dec 9, 2021
@negokaz
Copy link
Contributor Author

negokaz commented Dec 13, 2021

We have decided that we deal with this issue with Solution 1.

  • We concerned about the side effects of cons in solution 3
  • A message protocol that is easy to understand is better
    • message protocol of solution 2 is a bit clumsy and understanding the background is difficult

@xirc xirc added this to the v2.1.0 milestone Dec 17, 2021
negokaz added a commit that referenced this issue Dec 29, 2021
RaftActor can send TakeSnapshot message during entity recovery.
Index of the snapshot is determined by lastApplied at sending TakeSnapshot.
RaftActor can provide new events to the entity during entity recovery and this process increments lastApplied.
RecoveryState message to recovery the entity is created from lastApplied.

These will cause a mismatch of the index of the snapshot and recovered state of the entity.
For more details, see #111.

The Activate message allows RaftActor to restore the state at just RaftActor creates the entity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants