This repository has been archived by the owner on Nov 18, 2024. It is now read-only.

rfc: add leader-based journal #280

Closed (wants to merge 9 commits)
50 changes: 50 additions & 0 deletions docs/rfcs/20220110-leader-based-journal.md
# Leader Based Journal

- Status: accepted
- Discussion: https://github.com/engula/engula/discussions/260
- Pull Request: https://github.com/engula/engula/pull/280

## Summary

In this RFC, we present a trait `LeaderBasedJournal`, which divides the users of `Journal` into two roles: a leader, which can write, and followers, which only have read permission. The trait also provides a way to observe role transitions.
Review thread on this paragraph:

Contributor: According to the following API, the role binds to a stream of the journal. Do you intend to elect a leader per stream (i.e., at stream granularity)? Here, though, you write that the leader and followers bind to a journal.

Contributor Author: Thanks, I will fix it.


## Motivation

The luna engine requires a leader to execute mutations, such as writing to the journal and flushing memtables to persistent storage. It also requires a group of followers that subscribe to journal streams and replay those mutations to stay consistent with the leader.

To meet the luna engine's requirements, the journal needs a mechanism that cooperates with the engine's leader election. Because this election mechanism is general and could be used by other engine implementations too, we decided to abstract a type of journal that supports election: the leader-based journal.

The leader-based journal abstraction must ensure that there is only one leader at any given moment, and it must offer a way for followers to subscribe to journal streams. Finally, it must provide an interface through which the engine can detect role changes and react to them appropriately.

Furthermore, the new abstraction must be compatible with the existing journal abstraction so that users can substitute one for the other without further changes.

## Design

Here is the API design of leader-based journal.

```rust
use futures::Stream;

// `Journal` is the existing journal trait that this RFC extends.

/// The role of the current instance for one journal stream.
pub enum RoleState {
    Leader,
    Follower,
}

pub trait LeaderBasedJournal: Journal {
    type Role;
    type Peer;
    /// Yields a new `RoleState` whenever the observed stream's role changes.
    type StateStream: Stream<Item = RoleState>;

    /// Returns the current role for stream `name` and an optional peer
    /// (presumably the current leader, when known).
    fn state(&self, name: &str) -> (Self::Role, Option<Self::Peer>);

    /// Starts watching stream `name` and returns its role transitions.
    async fn observe_state(&self, name: &str) -> Self::StateStream;
}
```

Review thread on `type Role;`: w41ter marked this conversation as resolved.

Review thread on `type StateStream`:

Owner: I think we need some API like `wait_next` here so that the caller can wait for the next event?

Contributor: As long as it's a stream of states/events, an extension like `next()` can achieve this out of the box. @w41ter-l Is "state" just "event", or something different?

Contributor Author: @tisonkun I've added some details; they may answer your question.

Contributor Author: @huachaohuang It seems `observe_state` is enough to wait for the next event. Are there details that I haven't considered?

Review thread on `observe_state`:

Owner: Maybe `watch_state` is better?

Contributor Author: There doesn't seem to be a difference between the two. Is there something I haven't noticed?

`LeaderBasedJournal` doesn't affect the semantics of `Journal`, so `Journal::open_stream_writer` can be called even when the caller isn't the leader of the stream. However, the implementation must guarantee that calling `StreamWriter::append`, or any other modifying operation, returns `Error::NotLeader` if the caller isn't the stream leader.
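The rule that opening a writer always succeeds while writes are rejected can be sketched as follows. This is a simplified, synchronous stand-in: `StreamState` and this `StreamWriter` are hypothetical types, and the real `StreamWriter` is async with a signature that may differ. Only `Error::NotLeader` comes from the text above.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical error type; only `NotLeader` is taken from the RFC.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    NotLeader,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoleState {
    Leader,
    Follower,
}

/// Hypothetical per-stream state shared by the election logic and writers.
#[derive(Debug)]
pub struct StreamState {
    pub role: RoleState,
    pub events: Vec<Vec<u8>>,
}

/// Creating a writer never checks the role; each modifying call does.
pub struct StreamWriter {
    state: Arc<Mutex<StreamState>>,
}

impl StreamWriter {
    pub fn new(state: Arc<Mutex<StreamState>>) -> Self {
        StreamWriter { state }
    }

    pub fn append(&self, event: Vec<u8>) -> Result<(), Error> {
        let mut state = self.state.lock().unwrap();
        // Check the role and append under the same lock, so a local
        // demotion cannot happen between the check and the write.
        if state.role != RoleState::Leader {
            return Err(Error::NotLeader);
        }
        state.events.push(event);
        Ok(())
    }
}
```

A purely local check like this can still act on a stale role. That is why the RFC also requires write operations themselves to detect staleness.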

`LeaderBasedJournal` drives the election process automatically, so the engine doesn't need to be aware of it. However, the engine must start that process explicitly for each stream. A journal might contain many streams, and monitoring the election progress of all of them could exceed hardware limits. As a result, only the streams that the engine is interested in are watched.

When the engine calls `LeaderBasedJournal::observe_state`, the journal starts monitoring the named stream and subscribes to its election state transitions. It returns a `Stream` that yields an item whenever that stream's election state changes.
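The two paragraphs above describe two behaviours: streams are watched lazily, and subscribers receive role transitions. The sketch below models both, assuming a synchronous world in which `std::sync::mpsc::Receiver` stands in for `Self::StateStream`. `ElectionWatcher`, `on_election_result`, `is_watched`, and the initial `Follower` role are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoleState {
    Leader,
    Follower,
}

/// Hypothetical stand-in for the watching part of a leader-based journal.
/// Only streams passed to `observe_state` are tracked, so the monitoring
/// cost follows the engine's interest rather than the journal's size.
#[derive(Default)]
pub struct ElectionWatcher {
    // Current role and subscribers for each observed stream.
    watched: HashMap<String, (RoleState, Vec<Sender<RoleState>>)>,
}

impl ElectionWatcher {
    /// Starts monitoring `name` if it is not already watched, and returns a
    /// receiver for its subsequent role transitions. New streams are assumed
    /// to start as followers.
    pub fn observe_state(&mut self, name: &str) -> Receiver<RoleState> {
        let (tx, rx) = channel();
        self.watched
            .entry(name.to_string())
            .or_insert((RoleState::Follower, Vec::new()))
            .1
            .push(tx);
        rx
    }

    /// Hypothetical hook called by the (elided) election logic. Results for
    /// unwatched streams are ignored, and an unchanged role fires no event.
    pub fn on_election_result(&mut self, name: &str, role: RoleState) {
        if let Some((current, subscribers)) = self.watched.get_mut(name) {
            if *current != role {
                *current = role;
                // Notify subscribers, dropping those whose receiver is gone.
                subscribers.retain(|tx| tx.send(role).is_ok());
            }
        }
    }

    pub fn is_watched(&self, name: &str) -> bool {
        self.watched.contains_key(name)
    }
}
```

In the async API, `Self::StateStream` would replace the `Receiver`, and `futures::StreamExt::next().await` would replace a blocking `recv()`. This matches the point raised in the review thread on `type StateStream`.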

When a leader engine crashes, the `LeaderBasedJournal` instance on another machine is elected as the new leader. Its engine then recovers and eventually resumes providing service.
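The failover path can be sketched from the engine's side. The engine below consumes role transitions and recovers before it accepts writes whenever it becomes leader. This is a toy model under stated assumptions: `Engine`, `on_role_change`, and `run` are hypothetical, the journal is an in-memory slice, and "recovery" just counts the events that have not yet been applied.

```rust
use std::sync::mpsc::Receiver;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoleState {
    Leader,
    Follower,
}

/// Hypothetical engine-side view of a single stream.
#[derive(Debug, Default)]
pub struct Engine {
    /// Number of journal events applied to the in-memory state.
    pub applied: usize,
    /// Whether this instance currently accepts client mutations.
    pub serving: bool,
}

impl Engine {
    /// Handles one role transition reported for the stream.
    pub fn on_role_change(&mut self, role: RoleState, journal: &[Vec<u8>]) {
        match role {
            RoleState::Leader => {
                // Recover: apply whatever the previous leader wrote that this
                // instance has not applied yet, then start serving.
                let pending = &journal[self.applied..];
                self.applied += pending.len(); // a real engine applies each event
                self.serving = true;
            }
            RoleState::Follower => {
                // Stop accepting writes; a real follower would keep
                // replaying the stream here.
                self.serving = false;
            }
        }
    }

    /// Processes transitions until the sender side is dropped.
    pub fn run(&mut self, states: Receiver<RoleState>, journal: &[Vec<u8>]) {
        for role in states {
            self.on_role_change(role, journal);
        }
    }
}
```

Recovering before setting `serving` keeps the new leader from accepting mutations on top of a state that is missing the old leader's last writes.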

In a distributed system, we can't guarantee that the state returned by `observe_state` or `state` is always fresh. However, any write operation will detect a stale state. As a result, every decision based on that state should be confirmed by a write operation, which checks freshness, before the decision takes effect.
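The RFC does not say how a write detects staleness. One common technique is epoch (term) fencing, which is assumed in the sketch below. Each election bumps an epoch, and an append is accepted only if it carries the latest epoch. `FencedStream`, `elect`, and the epoch-carrying `append` are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    NotLeader,
}

/// Hypothetical journal-side record of the latest elected leader.
#[derive(Default)]
pub struct FencedStream {
    current_epoch: u64,
    events: Vec<(u64, Vec<u8>)>,
}

impl FencedStream {
    /// Called when a new leader is elected; returns the leader's epoch.
    pub fn elect(&mut self) -> u64 {
        self.current_epoch += 1;
        self.current_epoch
    }

    /// Appends only if `epoch` is still the latest one. A deposed leader
    /// that acted on a stale `state()` result finds out here.
    pub fn append(&mut self, epoch: u64, event: Vec<u8>) -> Result<usize, Error> {
        if epoch != self.current_epoch {
            return Err(Error::NotLeader);
        }
        self.events.push((epoch, event));
        // Return the new length of the stream.
        Ok(self.events.len())
    }
}
```

For example, suppose an engine believes it is the leader and decides to flush a memtable. It would record that decision with an append. If the append fails with `NotLeader`, the engine discards the decision instead of acting on stale state.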
72 changes: 0 additions & 72 deletions docs/rfcs/20220110-single-writer-journal.md

This file was deleted.