From f7f90a8a78d7ffbe5faf076ff5b294893fbec9b1 Mon Sep 17 00:00:00 2001 From: w41ter Date: Mon, 10 Jan 2022 20:53:29 +0800 Subject: [PATCH 1/9] rfc: add signle write journal --- .../single-write-journal-architecture.svg | 1 + docs/rfcs/20220110-single-writer-journal.md | 71 +++++++++++++++++++ 2 files changed, 72 insertions(+) create mode 100644 docs/images/single-write-journal-architecture.svg create mode 100644 docs/rfcs/20220110-single-writer-journal.md diff --git a/docs/images/single-write-journal-architecture.svg b/docs/images/single-write-journal-architecture.svg new file mode 100644 index 00000000..a6598b88 --- /dev/null +++ b/docs/images/single-write-journal-architecture.svg @@ -0,0 +1 @@ +MasterOrchestratorJournal ServerJournal ServerJournal ServerJournal Client (L)Journal Client (F)Journal Serverprovisionde-provisionPullWriteWriteWriteRead \ No newline at end of file diff --git a/docs/rfcs/20220110-single-writer-journal.md b/docs/rfcs/20220110-single-writer-journal.md new file mode 100644 index 00000000..1559ec7e --- /dev/null +++ b/docs/rfcs/20220110-single-writer-journal.md @@ -0,0 +1,71 @@ +# Single write journal + +- Status: draft +- Pull Request: + +## Abstraction + +The luna engine needs a single-writing, multi-reading journal system. + +## Design + +### API + +This has discussed in [#260](https://github.com/engula/engula/discussions/260). A trait named `SingleWriteJournal` is introduced to observe the role changes: + +```rust +pub enum RoleState { + Leader, + Follower, +} + +pub trait SingleWriteJournal : Journal { + type Role; + type Peer; + type StateStream: Stream; + + fn state(&self, name: &str) -> (Self::Role, Option); + + async fn observe_state(&self, name: &str) -> Self::StateStream; +} +``` + +The `SingleWriteJournal` doesn't effects the semantics of `Journal`, so `Journal::open_stream_writer` could be called whenever a stream isn't a leader. 
Of course, the implementation should guarantee that calls `StreamWriter::append` or others modifying operations will got a `Error::NotLeader`, if it isn't the stream leader. + +### Architecture + +![single write journal architecture](../images/single-write-journal-architecture.svg) + +A `SingleWriteJournal` consists of a master, journal orchestrator, a set of journal server and a set of journal client which is parts of engine. + +The journal server provides the durability of events of journal. All events are produced by journal client. At the same time, only one journal client could produce events which will be accepted by journal servers. That one is called leader, the others journal client are followers. + +The master is responsible for electing new leader and detecting the leader's live. The master is also responsible for providing routers and balancing loads among journal servers. To scale the set of journal servers on-demand, the master provisions or de-provisions journal servers from the orchestrator. + +#### Electing and Fault detecting + +The master collects status and stats from both journal client and server periodic via heartbeat RPC requests. If master haven't received heartbeats from current leader after a while, it would choose a new client as new leader, a set servers as replication group, and assign a monotonic epoch to the new leader. + +The order of events in an replication group is decided by leader. In order to ensure the consistency of orders of events, before committing any events, a leader should ensure there no any events which produced by former leader would be accepted by journal servers. Via a sealing RPC requests, the new master requires all journal servers don't accepts any events with small epoch. + +#### Replication Policy + +A events will be replicated to all journal servers of an replication group eventually. 
Once a event is replicated to enough journal servers which is specified by the replication policy, the event could be commit and apply to engine. + +#### Reconfiguration + +In general, an replication policy allows that a journal server downtime unexpectedly, but not effects the writing operations. In order to keep the availability in this situition, master would enforce leader to seal previous events and allocate new epoch so that it could change the configuration such as replication group to remove the faulted nodes. + +#### Follower read + +A leader will broadcast the committed sequence of events to all journal server, and those events is visible for reading. But here exists a gap between a event become committed in leader and a event is readable in a journal server. So a follower want to read events with consistency, it should ask the latest committed sequence from leader and wait until it receive those events. + +### Future works + +#### Chain replication + +A leader might be the bottleneck, since it is responsible for replicating events to all journal servers. We could employs the chain replication mechanism that allow journal servers replicate events to other servers. Specially, a leader could use chain replication to replicate events to all followers. + +#### Archive + +After a series of events are sealed, those events could be put into s3 to reduce usage of local disk. Specially, user could manually archive some events to a cheap stores. 
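Reviewer sketch for the sealing and replication-policy rules described in this patch, assuming a purely in-memory model. The names `JournalServer`, `seal`, `append`, `replicate`, and `is_committed` are hypothetical and do not appear in the RFC, and the replication policy is assumed to be a plain "at least `required` acknowledgements" rule.

```rust
/// Hypothetical in-memory journal server (illustration only, not from the RFC).
#[derive(Debug, Default)]
pub struct JournalServer {
    /// Events whose epoch is below this value are refused.
    sealed_epoch: u64,
    /// Accepted events, stored as (epoch, payload).
    events: Vec<(u64, String)>,
}

#[derive(Debug, PartialEq)]
pub enum AppendError {
    /// The writer used an epoch older than the one this server was sealed with.
    StaleEpoch { sealed: u64 },
}

impl JournalServer {
    /// Seal the server: from now on, events with an epoch below `epoch` are refused.
    /// Sealing never moves backwards.
    pub fn seal(&mut self, epoch: u64) {
        self.sealed_epoch = self.sealed_epoch.max(epoch);
    }

    /// Accept an event only if the writer's epoch is not older than the sealed epoch.
    pub fn append(&mut self, epoch: u64, payload: &str) -> Result<(), AppendError> {
        if epoch < self.sealed_epoch {
            return Err(AppendError::StaleEpoch { sealed: self.sealed_epoch });
        }
        self.events.push((epoch, payload.to_string()));
        Ok(())
    }

    /// Number of events this server has accepted.
    pub fn event_count(&self) -> usize {
        self.events.len()
    }
}

/// Send one event to every server and return how many of them accepted it.
pub fn replicate(servers: &mut [JournalServer], epoch: u64, payload: &str) -> usize {
    servers
        .iter_mut()
        .filter_map(|server| server.append(epoch, payload).ok())
        .count()
}

/// Assumed replication policy: an event is committed once `required` servers accepted it.
pub fn is_committed(acks: usize, required: usize) -> bool {
    acks >= required
}
```

In this model, once a new leader with epoch 2 has sealed a majority of servers, a former leader still writing with epoch 1 can no longer collect enough acknowledgements to commit, which is the property the sealing step is meant to provide.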
From 68c95a85da674d26d5041f953a57e5e352819006 Mon Sep 17 00:00:00 2001 From: w41ter Date: Mon, 10 Jan 2022 21:44:51 +0800 Subject: [PATCH 2/9] fix gramma wrongs --- .../single-write-journal-architecture.svg | 2 +- docs/rfcs/20220110-single-writer-journal.md | 20 +++++++++---------- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/images/single-write-journal-architecture.svg b/docs/images/single-write-journal-architecture.svg index a6598b88..b8f36e0a 100644 --- a/docs/images/single-write-journal-architecture.svg +++ b/docs/images/single-write-journal-architecture.svg @@ -1 +1 @@ -MasterOrchestratorJournal ServerJournal ServerJournal ServerJournal Client (L)Journal Client (F)Journal Serverprovisionde-provisionPullWriteWriteWriteRead \ No newline at end of file +MasterOrchestratorJournal ServerJournal ServerJournal ServerJournal Client (L)Journal Client (F)Journal Serverprovisionde-provisionPullWriteWriteWriteRead \ No newline at end of file diff --git a/docs/rfcs/20220110-single-writer-journal.md b/docs/rfcs/20220110-single-writer-journal.md index 1559ec7e..02352cf6 100644 --- a/docs/rfcs/20220110-single-writer-journal.md +++ b/docs/rfcs/20220110-single-writer-journal.md @@ -30,41 +30,41 @@ pub trait SingleWriteJournal : Journal { } ``` -The `SingleWriteJournal` doesn't effects the semantics of `Journal`, so `Journal::open_stream_writer` could be called whenever a stream isn't a leader. Of course, the implementation should guarantee that calls `StreamWriter::append` or others modifying operations will got a `Error::NotLeader`, if it isn't the stream leader. +The `SingleWriteJournal` doesn't affects the semantics of `Journal`, so `Journal::open_stream_writer` could be called whenever a stream isn't a leader. Of course, the implementation should guarantee that calls `StreamWriter::append` or other modifying operations will got a `Error::NotLeader`, if it isn't the stream leader. 
### Architecture ![single write journal architecture](../images/single-write-journal-architecture.svg) -A `SingleWriteJournal` consists of a master, journal orchestrator, a set of journal server and a set of journal client which is parts of engine. +A `SingleWriteJournal` consists of a master, journal orchestrator, a set of journal servers and a set of journal clients(which are parts of engine). -The journal server provides the durability of events of journal. All events are produced by journal client. At the same time, only one journal client could produce events which will be accepted by journal servers. That one is called leader, the others journal client are followers. +The journal server provides the durability of events of journal. All events are produced by journal client. At the same time, only one journal client could produce events which will be accepted by journal servers. That one is called leader, the other journal clients are followers. -The master is responsible for electing new leader and detecting the leader's live. The master is also responsible for providing routers and balancing loads among journal servers. To scale the set of journal servers on-demand, the master provisions or de-provisions journal servers from the orchestrator. +The master is responsible for electing a new leader and detecting the leader's live. The master is also responsible for providing routers and balancing loads among journal servers. To scale the set of journal servers on-demand, the master provisions or de-provisions journal servers from the orchestrator. #### Electing and Fault detecting -The master collects status and stats from both journal client and server periodic via heartbeat RPC requests. If master haven't received heartbeats from current leader after a while, it would choose a new client as new leader, a set servers as replication group, and assign a monotonic epoch to the new leader. 
+The master collects status and stats from both journal client and server periodic via heartbeat RPC requests. If the master haven't received heartbeats from the current leader after a while, it would choose a new client as the new leader, a set of servers as replication group, and assign a monotonic epoch to the new leader. -The order of events in an replication group is decided by leader. In order to ensure the consistency of orders of events, before committing any events, a leader should ensure there no any events which produced by former leader would be accepted by journal servers. Via a sealing RPC requests, the new master requires all journal servers don't accepts any events with small epoch. +The order of events in a replication group is decided by the leader. In order to ensure the consistency of orders of events, before committing any events, a leader should ensure there no any events which produced by former leader would be accepted by journal servers. Via sealing RPC requests, the new master requires all journal servers don't accepts any events with small epoch. #### Replication Policy -A events will be replicated to all journal servers of an replication group eventually. Once a event is replicated to enough journal servers which is specified by the replication policy, the event could be commit and apply to engine. +A event will be replicated to all journal servers of a replication group eventually. Once an event is replicated to enough journal servers, which is specified by the replication policy, the event could be committed and applied to engine. #### Reconfiguration -In general, an replication policy allows that a journal server downtime unexpectedly, but not effects the writing operations. In order to keep the availability in this situition, master would enforce leader to seal previous events and allocate new epoch so that it could change the configuration such as replication group to remove the faulted nodes. 
+In general, a replication policy allows a journal server downtime unexpectedly, but does not affects the writing operations. In order to keep the availability in this situation, master would enforce leader to seal previous events and allocate new epoch so that it could change the configuration such as replication group to remove the faulted nodes. #### Follower read -A leader will broadcast the committed sequence of events to all journal server, and those events is visible for reading. But here exists a gap between a event become committed in leader and a event is readable in a journal server. So a follower want to read events with consistency, it should ask the latest committed sequence from leader and wait until it receive those events. +A leader will broadcast the committed sequence of events to all journal server, and those events is visible for reading. But here exists a gap between an event become committed in leader and an event is readable in a journal server. So a follower want to read events with consistency, it should ask the latest committed sequence from leader and wait until it receive those events. ### Future works #### Chain replication -A leader might be the bottleneck, since it is responsible for replicating events to all journal servers. We could employs the chain replication mechanism that allow journal servers replicate events to other servers. Specially, a leader could use chain replication to replicate events to all followers. +A leader might be the bottleneck, since it is responsible for replicating events to all journal servers. We could employ the chain replication mechanism that allow journal servers replicating events to other servers. Specially, a leader could use chain replication to replicate events to all followers. 
#### Archive From 907ae124c050eb73bf24a1e543d591ea9f3d85ab Mon Sep 17 00:00:00 2001 From: w41ter Date: Thu, 13 Jan 2022 14:14:14 +0800 Subject: [PATCH 3/9] Update 20220110-single-writer-journal.md --- docs/rfcs/20220110-single-writer-journal.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/rfcs/20220110-single-writer-journal.md b/docs/rfcs/20220110-single-writer-journal.md index 02352cf6..4e217dd2 100644 --- a/docs/rfcs/20220110-single-writer-journal.md +++ b/docs/rfcs/20220110-single-writer-journal.md @@ -1,7 +1,8 @@ # Single write journal -- Status: draft -- Pull Request: +- Status: accepted +- Pull Request: https://github.com/engula/engula/pull/280 +- Tracking Issue: https://github.com/engula/engula/pull/284 ## Abstraction From a4117dde4da2f91f27b3a99d4c9dfc26eac7e1c0 Mon Sep 17 00:00:00 2001 From: w41ter Date: Fri, 14 Jan 2022 15:35:16 +0800 Subject: [PATCH 4/9] separate to leader based journal and shared journal --- docs/rfcs/20220110-leader-based-journal.md | 50 ++++++++++++++ docs/rfcs/20220110-single-writer-journal.md | 72 --------------------- 2 files changed, 50 insertions(+), 72 deletions(-) create mode 100644 docs/rfcs/20220110-leader-based-journal.md delete mode 100644 docs/rfcs/20220110-single-writer-journal.md diff --git a/docs/rfcs/20220110-leader-based-journal.md b/docs/rfcs/20220110-leader-based-journal.md new file mode 100644 index 00000000..b153432c --- /dev/null +++ b/docs/rfcs/20220110-leader-based-journal.md @@ -0,0 +1,50 @@ +# Leader Based Journal + +- Status: accepted +- Discussion: https://github.com/engula/engula/discussions/260 +- Pull Request: https://github.com/engula/engula/pull/280 + +## Summary + +In this RPC, we present a trait `LeaderBasedJournal`, which divides the users of `Journal` into two roles: a leader who could write, and followers, who only have read permission. In the same time, this trait provides a means of observing role transition. 
+ +## Motivation + +The luna engine requires a leader to execute mutations, such as journal writing, flushing memory tables into persisted storage, as well as a group of followers who subscribe journal streams and reply mutations, to remain consistent with the engine leader. + +To fulfil the luna engine's requirements, the journal needs a mechanism to collaborate with the luna engine's electing. But the electing method is general and could be used by other engine implementations, so we decided to abstract a type of journal that supports electing (leader-based journal). + +In the abstraction of leader-based journals, we need to ensure that there is only one leader at any given moment, and we also need to offer a way for followers to subscribe to journal streams. Finally, we must create an interface that lets the engine detect role changes and react to them appropriately. + +Furthermore, the new abstraction must be compatible with the existing journal abstraction in order for users to simply replace it. + +## Design + +Here is the API design of leader-based journal. + +```rust +pub enum RoleState { + Leader, + Follower, +} + +pub trait LeaderBasedJournal : Journal { + type Role; + type Peer; + type StateStream: Stream; + + fn state(&self, name: &str) -> (Self::Role, Option); + + async fn observe_state(&self, name: &str) -> Self::StateStream; +} +``` + +The `LeaderJournal` doesn't affects the semantics of `Journal`, so `Journal::open_stream_writer` could be called whenever a stream isn't a leader. Of course, the implementation should guarantee that calls `StreamWriter::append` or other modifying operations will got a `Error::NotLeader`, if it isn't the stream leader. + +The `LeaderBasedJournal` will forwards the electing progress automatically, which the engine won't have to recognize it. 
However, the engine must initiate that automatic progress manually, because a journal might contains multiple streams, which could exceeds the hardware limitation if we monitors all stream's electing progress. As a result, just streams that the engine is interested in will be watched. + +When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to the electing state transition. It will yield a `Stream` that will be fired whenever one of the electing states changes. + +When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. + +We can't ensure that the state returned by the `observe_state` or `state` methods is always fresh in a distributed system, but any write operations will identify this circumstance. As a result, every decision made before submitting should trigger any write operations to check for freshness. diff --git a/docs/rfcs/20220110-single-writer-journal.md b/docs/rfcs/20220110-single-writer-journal.md deleted file mode 100644 index 4e217dd2..00000000 --- a/docs/rfcs/20220110-single-writer-journal.md +++ /dev/null @@ -1,72 +0,0 @@ -# Single write journal - -- Status: accepted -- Pull Request: https://github.com/engula/engula/pull/280 -- Tracking Issue: https://github.com/engula/engula/pull/284 - -## Abstraction - -The luna engine needs a single-writing, multi-reading journal system. - -## Design - -### API - -This has discussed in [#260](https://github.com/engula/engula/discussions/260). 
A trait named `SingleWriteJournal` is introduced to observe the role changes: - -```rust -pub enum RoleState { - Leader, - Follower, -} - -pub trait SingleWriteJournal : Journal { - type Role; - type Peer; - type StateStream: Stream; - - fn state(&self, name: &str) -> (Self::Role, Option); - - async fn observe_state(&self, name: &str) -> Self::StateStream; -} -``` - -The `SingleWriteJournal` doesn't affects the semantics of `Journal`, so `Journal::open_stream_writer` could be called whenever a stream isn't a leader. Of course, the implementation should guarantee that calls `StreamWriter::append` or other modifying operations will got a `Error::NotLeader`, if it isn't the stream leader. - -### Architecture - -![single write journal architecture](../images/single-write-journal-architecture.svg) - -A `SingleWriteJournal` consists of a master, journal orchestrator, a set of journal servers and a set of journal clients(which are parts of engine). - -The journal server provides the durability of events of journal. All events are produced by journal client. At the same time, only one journal client could produce events which will be accepted by journal servers. That one is called leader, the other journal clients are followers. - -The master is responsible for electing a new leader and detecting the leader's live. The master is also responsible for providing routers and balancing loads among journal servers. To scale the set of journal servers on-demand, the master provisions or de-provisions journal servers from the orchestrator. - -#### Electing and Fault detecting - -The master collects status and stats from both journal client and server periodic via heartbeat RPC requests. If the master haven't received heartbeats from the current leader after a while, it would choose a new client as the new leader, a set of servers as replication group, and assign a monotonic epoch to the new leader. - -The order of events in a replication group is decided by the leader. 
In order to ensure the consistency of orders of events, before committing any events, a leader should ensure there no any events which produced by former leader would be accepted by journal servers. Via sealing RPC requests, the new master requires all journal servers don't accepts any events with small epoch. - -#### Replication Policy - -A event will be replicated to all journal servers of a replication group eventually. Once an event is replicated to enough journal servers, which is specified by the replication policy, the event could be committed and applied to engine. - -#### Reconfiguration - -In general, a replication policy allows a journal server downtime unexpectedly, but does not affects the writing operations. In order to keep the availability in this situation, master would enforce leader to seal previous events and allocate new epoch so that it could change the configuration such as replication group to remove the faulted nodes. - -#### Follower read - -A leader will broadcast the committed sequence of events to all journal server, and those events is visible for reading. But here exists a gap between an event become committed in leader and an event is readable in a journal server. So a follower want to read events with consistency, it should ask the latest committed sequence from leader and wait until it receive those events. - -### Future works - -#### Chain replication - -A leader might be the bottleneck, since it is responsible for replicating events to all journal servers. We could employ the chain replication mechanism that allow journal servers replicating events to other servers. Specially, a leader could use chain replication to replicate events to all followers. - -#### Archive - -After a series of events are sealed, those events could be put into s3 to reduce usage of local disk. Specially, user could manually archive some events to a cheap stores. 
From 3a564dfb2919dfccd3ff480fe00291f800559c40 Mon Sep 17 00:00:00 2001 From: w41ter Date: Sat, 15 Jan 2022 21:15:53 +0800 Subject: [PATCH 5/9] add description about epoch --- docs/rfcs/20220110-leader-based-journal.md | 25 +++++++++++++++------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/docs/rfcs/20220110-leader-based-journal.md b/docs/rfcs/20220110-leader-based-journal.md index b153432c..8024107b 100644 --- a/docs/rfcs/20220110-leader-based-journal.md +++ b/docs/rfcs/20220110-leader-based-journal.md @@ -8,7 +8,7 @@ In this RPC, we present a trait `LeaderBasedJournal`, which divides the users of `Journal` into two roles: a leader who could write, and followers, who only have read permission. In the same time, this trait provides a means of observing role transition. -## Motivation +## Motivation The luna engine requires a leader to execute mutations, such as journal writing, flushing memory tables into persisted storage, as well as a group of followers who subscribe journal streams and reply mutations, to remain consistent with the engine leader. @@ -23,19 +23,28 @@ Furthermore, the new abstraction must be compatible with the existing journal ab Here is the API design of leader-based journal. ```rust -pub enum RoleState { +pub enum Role { Leader, Follower, } +pub trait EpochState { + fn epoch(&self) -> u64; + + // The role of associated stream. + fn role(&self) -> Role; + + // The leader of the associated stream. 
+ fn leader(&self) -> Option; +} + pub trait LeaderBasedJournal : Journal { - type Role; - type Peer; - type StateStream: Stream; + type EpochState: EpochState; + type StateStream: Stream; - fn state(&self, name: &str) -> (Self::Role, Option); + fn state(&self, name: &str) -> Result; - async fn observe_state(&self, name: &str) -> Self::StateStream; + async fn observe_state(&self, name: &str) -> Result; } ``` @@ -43,7 +52,7 @@ The `LeaderJournal` doesn't affects the semantics of `Journal`, so `Journal::ope The `LeaderBasedJournal` will forwards the electing progress automatically, which the engine won't have to recognize it. However, the engine must initiate that automatic progress manually, because a journal might contains multiple streams, which could exceeds the hardware limitation if we monitors all stream's electing progress. As a result, just streams that the engine is interested in will be watched. -When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to the electing state transition. It will yield a `Stream` that will be fired whenever one of the electing states changes. +When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to the electing state transition. It will yield a `Stream` that will be fired whenever one of the electing states changes. We utilize the epoch to track state changes. Time is divided into epochs of arbitrary length, the `LeaderBasedJournal` must ensure that each epoch has only one leader. When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. 
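Reviewer sketch for the epoch rule introduced in this patch (each epoch has only one leader, and epochs only move forward). It assumes `leader()` returns `Option<String>`, since the type parameter is not visible in the diff above; `SimpleState`, `StateTracker`, `apply`, and `UpdateError` are hypothetical names used only for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    Leader,
    Follower,
}

pub trait EpochState {
    fn epoch(&self) -> u64;
    /// The role of the associated stream.
    fn role(&self) -> Role;
    /// The leader of the associated stream (assumed here to be a node name).
    fn leader(&self) -> Option<String>;
}

/// Hypothetical concrete state: the local node is leader when it is the elected one.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SimpleState {
    epoch: u64,
    leader: Option<String>,
    local: String,
}

impl EpochState for SimpleState {
    fn epoch(&self) -> u64 {
        self.epoch
    }

    fn role(&self) -> Role {
        if self.leader.as_deref() == Some(self.local.as_str()) {
            Role::Leader
        } else {
            Role::Follower
        }
    }

    fn leader(&self) -> Option<String> {
        self.leader.clone()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    /// The update carries an epoch older than the one already known.
    StaleEpoch,
    /// The update names a second leader for an epoch that already has one.
    ConflictingLeader,
}

/// Hypothetical tracker enforcing "one leader per epoch" and monotonic epochs.
pub struct StateTracker {
    current: SimpleState,
}

impl StateTracker {
    pub fn new(local: &str) -> Self {
        let current = SimpleState { epoch: 0, leader: None, local: local.to_string() };
        StateTracker { current }
    }

    /// Apply an election result for `epoch`.
    pub fn apply(&mut self, epoch: u64, leader: &str) -> Result<(), UpdateError> {
        if epoch < self.current.epoch {
            return Err(UpdateError::StaleEpoch);
        }
        if epoch == self.current.epoch {
            if let Some(existing) = &self.current.leader {
                // Repeating the known leader is harmless; naming another one is not.
                return if existing == leader {
                    Ok(())
                } else {
                    Err(UpdateError::ConflictingLeader)
                };
            }
        }
        self.current.epoch = epoch;
        self.current.leader = Some(leader.to_string());
        Ok(())
    }

    pub fn state(&self) -> &SimpleState {
        &self.current
    }
}
```

The tracker rejects anything that would violate the two invariants instead of trying to repair it, which keeps the role reported by `role()` a pure function of the latest accepted epoch.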
From fefe9845d3447bfa88aa96c8fbc332abd7e9523d Mon Sep 17 00:00:00 2001 From: w41ter Date: Sat, 15 Jan 2022 21:29:57 +0800 Subject: [PATCH 6/9] remove useless file --- docs/images/single-write-journal-architecture.svg | 1 - 1 file changed, 1 deletion(-) delete mode 100644 docs/images/single-write-journal-architecture.svg diff --git a/docs/images/single-write-journal-architecture.svg b/docs/images/single-write-journal-architecture.svg deleted file mode 100644 index b8f36e0a..00000000 --- a/docs/images/single-write-journal-architecture.svg +++ /dev/null @@ -1 +0,0 @@ -MasterOrchestratorJournal ServerJournal ServerJournal ServerJournal Client (L)Journal Client (F)Journal Serverprovisionde-provisionPullWriteWriteWriteRead \ No newline at end of file From b7257e25a0fe995c453b232f8c428483d6167a86 Mon Sep 17 00:00:00 2001 From: w41ter Date: Sat, 15 Jan 2022 21:54:14 +0800 Subject: [PATCH 7/9] address comments --- docs/rfcs/20220110-leader-based-journal.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/rfcs/20220110-leader-based-journal.md b/docs/rfcs/20220110-leader-based-journal.md index 8024107b..b08361b8 100644 --- a/docs/rfcs/20220110-leader-based-journal.md +++ b/docs/rfcs/20220110-leader-based-journal.md @@ -39,10 +39,9 @@ pub trait EpochState { } pub trait LeaderBasedJournal : Journal { - type EpochState: EpochState; - type StateStream: Stream; + type StateStream: Stream>; - fn state(&self, name: &str) -> Result; + fn state(&self, name: &str) -> Result>; async fn observe_state(&self, name: &str) -> Result; } From 0170fde071749a3a8920b892996fad3843143a37 Mon Sep 17 00:00:00 2001 From: w41ter Date: Mon, 17 Jan 2022 15:29:17 +0800 Subject: [PATCH 8/9] address comments --- docs/rfcs/20220110-leader-based-journal.md | 34 +++++++++++++++------- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/docs/rfcs/20220110-leader-based-journal.md b/docs/rfcs/20220110-leader-based-journal.md index b08361b8..2453a98c 100644 --- 
a/docs/rfcs/20220110-leader-based-journal.md +++ b/docs/rfcs/20220110-leader-based-journal.md @@ -6,7 +6,9 @@ ## Summary -In this RPC, we present a trait `LeaderBasedJournal`, which divides the users of `Journal` into two roles: a leader who could write, and followers, who only have read permission. In the same time, this trait provides a means of observing role transition. +In this RFC, we present a trait `LeaderBasedJournal`, which divides the users (e.g., the luna engine) of a `Journal` stream into two roles: a leader, who can write, and followers, who only have read permission. At the same time, this trait provides a means of observing role transitions. + +This RFC focuses only on the API abstraction of `LeaderBasedJournal`, its semantics, and its constraints. The implementation details, such as how electing, consistency, and durability work, will be the subject of a follow-up RFC. ## Motivation @@ -31,28 +33,40 @@ pub enum Role { pub trait EpochState { fn epoch(&self) -> u64; - // The role of associated stream. + /// The role of the associated stream. fn role(&self) -> Role; - // The leader of the associated stream. + /// The leader of the associated stream. fn leader(&self) -> Option; } pub trait LeaderBasedJournal : Journal { type StateStream: Stream>; - fn state(&self, name: &str) -> Result>; + /// Get the current state of the stream with `stream_name`. + fn state(&self, stream_name: &str) -> Result>; - async fn observe_state(&self, name: &str) -> Result; + /// Subscribe to state updates of the stream with `stream_name`. + fn observe_state(&self, stream_name: &str) -> Result; } ``` -The `LeaderJournal` doesn't affects the semantics of `Journal`, so `Journal::open_stream_writer` could be called whenever a stream isn't a leader. Of course, the implementation should guarantee that calls `StreamWriter::append` or other modifying operations will got a `Error::NotLeader`, if it isn't the stream leader. 
+### Compatible + +The `LeaderBasedJournal` doesn't affect the semantics of `Journal`, so `Journal::open_stream_writer` can still be called even when the caller isn't the leader of the stream. Of course, the implementation should guarantee that calls to `StreamWriter::append` or other modifying operations return `Error::NotLeader` if the caller isn't the stream leader. + +### Concurrent + +The `LeaderBasedJournal` allows the user to open multiple `StreamWriter`, while each `StreamWriter` is valid. This feature can be used to implement concurrent writes. If the user needs to call `StreamWriter::append` on the same stream from multiple threads, `Journal::open_stream_writer` can be invoked to get a `StreamWriter` for each thread. + +### Electing & States + +The `LeaderBasedJournal` will forwards the electing progress automatically, which the engine won't have to recognize it. Once we start the cluster, an instance of `LeaderBasedJournal` will be selected as the leader of a stream, and provides service. When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. -The `LeaderBasedJournal` will forwards the electing progress automatically, which the engine won't have to recognize it. However, the engine must initiate that automatic progress manually, because a journal might contains multiple streams, which could exceeds the hardware limitation if we monitors all stream's electing progress. As a result, just streams that the engine is interested in will be watched. +We referred to the leadership and other elected-related facts as `state`. When a new leader is elected or other facts is updated, the `state` changes(in some implementation, it could indicate configuration or copy-set is changed). To track `state` changes, we use the term `epoch`, which is a monotonically growing number. 
Time is divided into `epoch`s of arbitrary length, and the `LeaderBasedJournal` must ensure that each `epoch` has only one leader. The trait `EpochState` is used to provide both `state` and `epoch`. The current `state` and `epoch` could be obtained by invoking `LeaderBasedJournal::state`. -When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to the electing state transition. It will yield a `Stream` that will be fired whenever one of the electing states changes. We utilize the epoch to track state changes. Time is divided into epochs of arbitrary length, the `LeaderBasedJournal` must ensure that each epoch has only one leader. +Although the electing progress is automatic and the engine isn't aware of it, the engine must start that automatic progress manually, because a journal might contain multiple streams, and monitoring the electing progress of every stream could exceed the hardware limits. As a result, only the streams that the engine is interested in will be watched. Watching is triggered by invoking `LeaderBasedJournal::observe_state` with the specified stream name. When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to the electing state transitions of that stream. It yields a `Stream` that fires whenever one of the electing states changes. -When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. +### Freshness -We can't ensure that the state returned by the `observe_state` or `state` methods is always fresh in a distributed system, but any write operations will identify this circumstance. As a result, every decision made before submitting should trigger any write operations to check for freshness. 
+We can't ensure that the state returned by the `observe_state` or `state` methods is always fresh in a distributed system, but any write operation will detect stale leadership. As a result, every decision should be confirmed by triggering a write operation, which checks for freshness, before it is submitted. From 433ac8034944128aee6fdd172271a848eafeb64d Mon Sep 17 00:00:00 2001 From: w41ter Date: Mon, 17 Jan 2022 15:43:26 +0800 Subject: [PATCH 9/9] address comments --- docs/rfcs/20220110-leader-based-journal.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/rfcs/20220110-leader-based-journal.md b/docs/rfcs/20220110-leader-based-journal.md index 2453a98c..54fc5e7f 100644 --- a/docs/rfcs/20220110-leader-based-journal.md +++ b/docs/rfcs/20220110-leader-based-journal.md @@ -61,7 +61,7 @@ The `LeaderBasedJournal` allows the user to open multiple `StreamWriter`, while ### Electing & States -The `LeaderBasedJournal` will forwards the electing progress automatically, which the engine won't have to recognize it. Once we start the cluster, an instance of `LeaderBasedJournal` will be selected as the leader of a stream, and provides service. When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. +The `LeaderBasedJournal` drives the electing progress automatically, so the engine doesn't have to be aware of it. Once the cluster starts, an instance of `LeaderBasedJournal` is selected as the leader of a stream and provides service. When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. The steps required for electing and recovering are defined by the implementation. We referred to the leadership and other elected-related facts as `state`. 
When a new leader is elected or other facts is updated, the `state` changes(in some implementation, it could indicate configuration or copy-set is changed). To track `state` changes, we use the term `epoch`, which is a monotonically growing number. Time is divided into `epoch`s of arbitrary length, and the `LeaderBasedJournal` must ensure that each `epoch` has only one leader. The trait `EpochState` is used to provide both `state` and `epoch`. The current `state` and `epoch` could be obtained by invoking `LeaderBasedJournal::state`.
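Reviewer sketch of how an engine might combine `observe_state` with the write-time freshness check, assuming a synchronous, single-stream model. A `std::sync::mpsc` channel stands in for the `Stream` returned by `observe_state`, and `MockJournal`, `State`, `elect`, and the `observed_epoch` parameter are hypothetical names that are not part of the RFC.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical snapshot of the electing state of one stream.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct State {
    pub epoch: u64,
    pub leader: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    NotLeader,
}

/// Hypothetical single-stream journal kept entirely in memory.
pub struct MockJournal {
    state: State,
    observers: Vec<Sender<State>>,
    log: Vec<String>,
}

impl MockJournal {
    pub fn new(leader: &str) -> Self {
        MockJournal {
            state: State { epoch: 1, leader: leader.to_string() },
            observers: Vec::new(),
            log: Vec::new(),
        }
    }

    /// Current state; it may already be stale by the time the caller acts on it.
    pub fn state(&self) -> State {
        self.state.clone()
    }

    /// Start watching the stream. The receiver first yields the current state,
    /// then one item per later state change.
    pub fn observe_state(&mut self) -> Receiver<State> {
        let (tx, rx) = channel();
        let _ = tx.send(self.state.clone());
        self.observers.push(tx);
        rx
    }

    /// Simulate an election: bump the epoch and notify observers that are still alive.
    pub fn elect(&mut self, leader: &str) {
        self.state = State { epoch: self.state.epoch + 1, leader: leader.to_string() };
        let snapshot = self.state.clone();
        self.observers.retain(|tx| tx.send(snapshot.clone()).is_ok());
    }

    /// Writes are checked against the journal's current state, so a writer acting
    /// on a stale observation gets `NotLeader`. Returns the new log length.
    pub fn append(&mut self, writer: &str, observed_epoch: u64, event: &str) -> Result<usize, Error> {
        if writer != self.state.leader.as_str() || observed_epoch != self.state.epoch {
            return Err(Error::NotLeader);
        }
        self.log.push(event.to_string());
        Ok(self.log.len())
    }
}
```

A writer that observed epoch 1 and keeps appending after `elect` has moved the stream to epoch 2 receives `Error::NotLeader`; reading the next item from the receiver is how it would learn about the new epoch and leader.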