One more question to confirm (storage seems to have the same question). Now we have 2 engine nodes (L1 and F2), and we add 2 engine nodes (F3, F4) to the cluster, how do Or where is a suitable place to maintain such metadata? The Orchestrator could give a full list of L1, F2, F3, F4, but it doesn't know that F3 does not need to replicate... |
@zojw For now we only need to implement a single-writer journal, which means that only one engine is the writer and the others are readers. But once partitions are added to the engine, we should add a new node to manage the distribution of partitions. |
@w41ter-l @huachaohuang I noticed that we're implementing the API designed here. If the RFC is accepted, please update the status and pull request links, start a tracking issue for the work (added in the RFC) and merge this PR. Otherwise, the RFC process is fake. |
- Status: draft | ||
- Pull Request: |
- Status: draft | |
- Pull Request: | |
- Status: accepted | |
- Pull Request: https://github.com/engula/engula/pull/280 | |
- Tracking Issue: <please create one of this> |
|
||
#### Follower read | ||
|
||
A leader broadcasts the committed sequence of events to all journal servers, and those events become visible for reading. However, there is a gap between the moment an event becomes committed on the leader and the moment it becomes readable on a journal server. So if a follower wants to read events consistently, it should ask the leader for the latest committed sequence and wait until it has received those events. |
Who is expected to perform follower read?
If it happens only during recovery, duplicate followers seem like overkill.
An engine follower must read the stream for the duration of its life to keep track of the leader's state. No follower read is performed during recovery, because recovery may execute on the leader,
Thanks for your explanation :)
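To make the follower read described in the RFC text concrete, here is a minimal sketch of the waiting step: the follower asks the leader for the latest committed sequence and blocks until it has received events up to that sequence. `FollowerView`, `on_events_received`, and `wait_until_readable` are hypothetical names for illustration, not part of the proposed API, and the sketch uses blocking std primitives rather than async code.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical follower-side view of one stream. It tracks the highest
/// sequence this follower has received so far (0 means nothing received).
pub struct FollowerView {
    received: Mutex<u64>,
    advanced: Condvar,
}

impl FollowerView {
    pub fn new() -> Self {
        FollowerView {
            received: Mutex::new(0),
            advanced: Condvar::new(),
        }
    }

    /// Called when events up to `sequence` arrive from a journal server.
    /// Older or duplicate notifications are ignored.
    pub fn on_events_received(&self, sequence: u64) {
        let mut received = self.received.lock().unwrap();
        if sequence > *received {
            *received = sequence;
            self.advanced.notify_all();
        }
    }

    /// Blocks until every event up to `leader_committed` (the sequence the
    /// follower obtained from the leader) has been received locally, then
    /// returns the highest received sequence.
    pub fn wait_until_readable(&self, leader_committed: u64) -> u64 {
        let guard = self.received.lock().unwrap();
        let guard = self
            .advanced
            .wait_while(guard, |received| *received < leader_committed)
            .unwrap();
        *guard
    }
}
```

A reader would call `wait_until_readable` with the sequence returned by the leader before serving a consistent read; how that sequence is fetched is left out of the sketch.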
I think this RFC proposes two things: a new |
Follower, | ||
} | ||
|
||
pub trait SingleWriteJournal : Journal { |
Here we have two focuses:
- For persisting data into the journal server cluster, the proposal uses a quorum-based algorithm as its replication policy.
- For supporting high availability via standby engines (journal clients), the proposal elects a leader among all engines.
If so, I suggest that we separate these two focuses and resolve them one after the other.
- `QuorumBasedJournal`: a client-side implementation that deals with persisting data into the journal server cluster.
- ...
- `LunaEngine` (or its internal `JournalWriter`, concretely) has a leader election mechanism for determining which instance is the exclusive writer of the current shard, with a monotonically increasing token. When a new leader is elected, it writes its token to the journal server cluster (quorum).
- `FencedJournal`: a server-side journal implementation that accepts a monotonically increasing token. It only serves requests configured with the current token and only accepts token updates that are monotonically increasing.
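As a rough illustration of the `FencedJournal` idea above, the following sketch keeps a single current token on the server side, rejects appends that carry any other token, and only accepts token updates that are strictly larger than the current one. The method names, the `FenceError` type, and the choice of "strictly larger" are assumptions made for this sketch; the comment does not specify them.

```rust
/// Hypothetical error returned when a request carries the wrong token.
#[derive(Debug, PartialEq)]
pub enum FenceError {
    TokenRejected { current: Option<u64> },
}

/// Sketch of a server-side journal fenced by a writer token.
pub struct FencedJournal {
    current_token: Option<u64>,
    events: Vec<Vec<u8>>,
}

impl FencedJournal {
    pub fn new() -> Self {
        FencedJournal {
            current_token: None,
            events: Vec::new(),
        }
    }

    /// Installs the token of a newly elected leader. Assumption: only a
    /// strictly larger token is accepted, so an old leader cannot re-fence.
    pub fn update_token(&mut self, token: u64) -> Result<(), FenceError> {
        match self.current_token {
            Some(current) if token <= current => Err(FenceError::TokenRejected {
                current: self.current_token,
            }),
            _ => {
                self.current_token = Some(token);
                Ok(())
            }
        }
    }

    /// Appends an event if the request carries the current token and
    /// returns the number of events stored so far.
    pub fn append(&mut self, token: u64, event: Vec<u8>) -> Result<u64, FenceError> {
        if self.current_token != Some(token) {
            return Err(FenceError::TokenRejected {
                current: self.current_token,
            });
        }
        self.events.push(event);
        Ok(self.events.len() as u64)
    }
}
```

In this sketch, a deposed leader that still holds token 1 gets `TokenRejected` as soon as a new leader has installed token 2, which is the fencing behavior the comment describes.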
This seems a bit over-designed, at least for use cases similar to the luna engine. The semantics of `LeaderBasedJournal` are easier to use, and the previous design in this RFC naturally maps to the API. It is OK to decouple the internal implementation into some reusable components like `QuorumBasedJournal` + `FencedJournal`, but that depends on the implementation.
I have divided this RFC into two parts. This part mainly focuses on the design of the leader based journal; the design of the shared journal will be submitted in a new RFC later. |
@@ -0,0 +1 @@ | |||
<svg id="SvgjsSvg1006" width="649" height="381" xmlns="http://www.w3.org/2000/svg" version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:svgjs="http://svgjs.com/svgjs"><defs id="SvgjsDefs1007"><marker id="SvgjsMarker1026" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1027" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1066" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1067" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1074" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1075" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1082" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1083" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1086" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1087" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1090" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1091" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1094" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" 
orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1095" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1098" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1099" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1106" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1107" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1114" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1115" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1122" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1123" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1130" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1131" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker><marker id="SvgjsMarker1134" markerWidth="12" markerHeight="8" refX="9" refY="4" viewBox="0 0 12 8" orient="auto" markerUnits="userSpaceOnUse" stroke-dasharray="0,0"><path id="SvgjsPath1135" d="M0,0 L12,4 L0,8 L0,0" fill="#323232" stroke="#323232" stroke-width="1"></path></marker></defs><g id="SvgjsG1008" transform="translate(24.999984741210938,25.003982543945312)"><path id="SvgjsPath1009" d="M 0 0L 599 0L 599 
331.2999954223633L 0 331.2999954223633Z" stroke="rgba(255,255,255,1)" stroke-width="2" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1010"><text id="SvgjsText1011" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="579px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="156.02499771118164" transform="rotate(0)"></text></g></g><g id="SvgjsG1012" transform="translate(238.99998474121094,38.50398254394531)"><path id="SvgjsPath1013" d="M 0 0L 96 0L 96 37L 0 37Z" stroke-dasharray="10,6" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1014"><text id="SvgjsText1015" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="76px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.875" transform="rotate(0)"><tspan id="SvgjsTspan1016" dy="16" x="48"><tspan id="SvgjsTspan1017" style="text-decoration:;">Master</tspan></tspan></text></g></g><g id="SvgjsG1018" transform="translate(490.99998474121094,38.50398254394531)"><path id="SvgjsPath1019" d="M 0 0L 96 0L 96 37L 0 37Z" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1020"><text id="SvgjsText1021" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="76px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.875" transform="rotate(0)"><tspan id="SvgjsTspan1022" dy="16" x="48"><tspan id="SvgjsTspan1023" style="text-decoration:;">Orchestrator</tspan></tspan></text></g></g><g id="SvgjsG1024"><path id="SvgjsPath1025" d="M335.49998474121094 57.00398254394531L412.99998474121094 57.00398254394531L412.99998474121094 57.00398254394531L489.1999847412109 57.00398254394531" stroke-dasharray="3,3" stroke="#323232" 
stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1026)"></path></g><g id="SvgjsG1028" transform="translate(327.99998474121094,128.5039825439453)"><path id="SvgjsPath1029" d="M 0 0L 96 0L 96 37L 0 37Z" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1030"><text id="SvgjsText1031" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="76px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.875" transform="rotate(0)"><tspan id="SvgjsTspan1032" dy="16" x="48"><tspan id="SvgjsTspan1033" style="text-decoration:;">Journal Server</tspan></tspan></text></g></g><g id="SvgjsG1034" transform="translate(327.99998474121094,205.5039825439453)"><path id="SvgjsPath1035" d="M 0 0L 96 0L 96 37L 0 37Z" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1036"><text id="SvgjsText1037" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="76px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.875" transform="rotate(0)"><tspan id="SvgjsTspan1038" dy="16" x="48"><tspan id="SvgjsTspan1039" style="text-decoration:;">Journal Server</tspan></tspan></text></g></g><g id="SvgjsG1040" transform="translate(327.99998474121094,282.5039825439453)"><path id="SvgjsPath1041" d="M 0 0L 96 0L 96 37L 0 37Z" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1042"><text id="SvgjsText1043" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="76px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.875" transform="rotate(0)"><tspan id="SvgjsTspan1044" dy="16" x="48"><tspan id="SvgjsTspan1045" style="text-decoration:;">Journal 
Server</tspan></tspan></text></g></g><g id="SvgjsG1046" transform="translate(52.99998474121094,160.13598251342773)"><path id="SvgjsPath1047" d="M 0 0L 120 0L 120 36.36800003051758L 0 36.36800003051758Z" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1048"><text id="SvgjsText1049" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="100px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.559000015258789" transform="rotate(0)"><tspan id="SvgjsTspan1050" dy="16" x="60"><tspan id="SvgjsTspan1051" style="text-decoration:;">Journal Client (L)</tspan></tspan></text></g></g><g id="SvgjsG1052" transform="translate(52.99998474121094,240.13598251342773)"><path id="SvgjsPath1053" d="M 0 0L 120 0L 120 39.36800003051758L 0 39.36800003051758Z" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1054"><text id="SvgjsText1055" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="100px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="10.059000015258789" transform="rotate(0)"><tspan id="SvgjsTspan1056" dy="16" x="60"><tspan id="SvgjsTspan1057" style="text-decoration:;">Journal Client (F)</tspan></tspan></text></g></g><g id="SvgjsG1058" transform="translate(490.99998474121094,148.5039825439453)"><path id="SvgjsPath1059" d="M 0 0L 96 0L 96 37L 0 37Z" stroke-dasharray="10,6" stroke="rgba(50,50,50,1)" stroke-width="1" fill-opacity="1" fill="#ffffff"></path><g id="SvgjsG1060"><text id="SvgjsText1061" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="76px" fill="#323232" font-weight="400" align="middle" lineHeight="125%" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="8.875" transform="rotate(0)"><tspan id="SvgjsTspan1062" 
dy="16" x="48"><tspan id="SvgjsTspan1063" style="text-decoration:;">Journal Server</tspan></tspan></text></g></g><g id="SvgjsG1064"><path id="SvgjsPath1065" d="M504.99998474121094 75.25198364257812L504.99998474121094 111.75198364257812L504.99998474121094 111.75198364257812L504.99998474121094 146.9519836425781" stroke-dasharray="3,3" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1066)"></path><rect id="SvgjsRect1068" width="49" height="16" x="480.49998474121094" y="103.10198364257812" fill="#ffffff"></rect><text id="SvgjsText1069" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="49px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="101.47698364257812" transform="rotate(0)"><tspan id="SvgjsTspan1070" dy="16" x="504.99998474121094"><tspan id="SvgjsTspan1071" style="text-decoration:;">provision</tspan></tspan></text></g><g id="SvgjsG1072"><path id="SvgjsPath1073" d="M575.9999847412109 148.25198364257812L575.9999847412109 111.75198364257812L575.9999847412109 111.75198364257812L575.9999847412109 76.55198364257812" stroke-dasharray="3,3" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1074)"></path><rect id="SvgjsRect1076" width="66" height="16" x="542.9999847412109" y="104.40198364257813" fill="#ffffff"></rect><text id="SvgjsText1077" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="66px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="102.77698364257813" transform="rotate(0)"><tspan id="SvgjsTspan1078" dy="16" x="575.9999847412109"><tspan id="SvgjsTspan1079" style="text-decoration:;">de-provision</tspan></tspan></text></g><g id="SvgjsG1080"><path id="SvgjsPath1081" d="M424.49998474121094 224.0039825439453L453.99998474121094 224.0039825439453L453.99998474121094 
105.50398254394531L286.99998474121094 105.50398254394531L286.99998474121094 77.30398254394531" stroke-dasharray="8,5" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1082)"></path></g><g id="SvgjsG1084"><path id="SvgjsPath1085" d="M424.49998474121094 301.0039825439453L453.99998474121094 301.0039825439453L453.99998474121094 105.50398254394531L286.99998474121094 105.50398254394531L286.99998474121094 77.30398254394531" stroke-dasharray="8,5" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1086)"></path></g><g id="SvgjsG1088"><path id="SvgjsPath1089" d="M173.49998474121094 178.31998252868652L216.99998474121094 178.31998252868652L216.99998474121094 57.00398254394531L237.19998474121093 57.00398254394531" stroke-dasharray="8,5" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1090)"></path></g><g id="SvgjsG1092"><path id="SvgjsPath1093" d="M173.49998474121094 259.8199825286865L218.99998474121094 259.8199825286865L218.99998474121094 57.00398254394531L237.19998474121093 57.00398254394531" stroke-dasharray="8,5" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1094)"></path></g><g id="SvgjsG1096"><path id="SvgjsPath1097" d="M424.49998474121094 147.0039825439453L453.99998474121094 147.0039825439453L453.99998474121094 106.00398254394531L286.99998474121094 106.00398254394531L286.99998474121094 77.30398254394531" stroke-dasharray="8,5" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1098)"></path><rect id="SvgjsRect1100" width="21" height="16" x="380.899984741211" y="98.00398254394531" fill="#ffffff"></rect><text id="SvgjsText1101" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="21px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="96.37898254394531" transform="rotate(0)"><tspan id="SvgjsTspan1102" dy="16" x="391.399984741211"><tspan 
id="SvgjsTspan1103" style="text-decoration:;">Pull</tspan></tspan></text></g><g id="SvgjsG1104"><path id="SvgjsPath1105" d="M173.4996402464354 178.30142515551884C 236.25273722524983 178.31998252868652 264.7472322571721 147.0039825439453 326.20122492240284 147.07078908734897" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1106)"></path><rect id="SvgjsRect1108" width="29" height="16" x="235.837596702013" y="154.6680136825954" fill="#ffffff"></rect><text id="SvgjsText1109" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="29px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="153.0430136825954" transform="rotate(0)"><tspan id="SvgjsTspan1110" dy="16" x="250.337596702013"><tspan id="SvgjsTspan1111" style="text-decoration:;">Write</tspan></tspan></text></g><g id="SvgjsG1112"><path id="SvgjsPath1113" d="M173.4993074767335 178.34599802003825C 237.63685082538203 178.31998252868652 263.3631186570399 224.0039825439453 326.20242289332975 223.9103267750791" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1114)"></path><rect id="SvgjsRect1116" width="29" height="16" x="235.8377048521661" y="193.1535275016266" fill="#ffffff"></rect><text id="SvgjsText1117" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="29px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="191.5285275016266" transform="rotate(0)"><tspan id="SvgjsTspan1118" dy="16" x="250.3377048521661"><tspan id="SvgjsTspan1119" style="text-decoration:;">Write</tspan></tspan></text></g><g id="SvgjsG1120"><path id="SvgjsPath1121" d="M173.49768695360865 178.36786266835227C 252.07095424050976 178.31998252868652 248.9290152419121 301.0039825439453 326.20825677657916 300.8316140411486" stroke="#323232" stroke-width="1" fill="none" 
marker-end="url(#SvgjsMarker1122)"></path><rect id="SvgjsRect1124" width="29" height="16" x="235.83823152218167" y="231.64642149092455" fill="#ffffff"></rect><text id="SvgjsText1125" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="29px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="230.02142149092455" transform="rotate(0)"><tspan id="SvgjsTspan1126" dy="16" x="250.33823152218167"><tspan id="SvgjsTspan1127" style="text-decoration:;">Write</tspan></tspan></text></g><g id="SvgjsG1128"><path id="SvgjsPath1129" d="M375.99998474121094 243.0039825439453L375.99998474121094 259.8199825286865L174.79998474121095 259.8199825286865" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1130)"></path></g><g id="SvgjsG1132"><path id="SvgjsPath1133" d="M375.99998474121094 282.0039825439453L375.99998474121094 259.8199825286865L174.79998474121095 259.8199825286865" stroke="#323232" stroke-width="1" fill="none" marker-end="url(#SvgjsMarker1134)"></path><rect id="SvgjsRect1136" width="27" height="16" x="272.9919847488403" y="251.81998252868652" fill="#ffffff"></rect><text id="SvgjsText1137" font-family="微软雅黑" text-anchor="middle" font-size="13px" width="27px" fill="#323232" font-weight="400" align="top" lineHeight="16px" anchor="middle" family="微软雅黑" size="13px" weight="400" font-style="" opacity="1" y="250.19498252868652" transform="rotate(0)"><tspan id="SvgjsTspan1138" dy="16" x="286.4919847488403"><tspan id="SvgjsTspan1139" style="text-decoration:;">Read</tspan></tspan></text></g></svg> |
This file is unused in this document.
pub trait LeaderBasedJournal : Journal { | ||
type Role; | ||
type Peer; | ||
type StateStream: Stream<Item = RoleState>; |
I think we need some API like `wait_next` here so that the caller can wait for the next event?
As long as it's a stream of state/event, an extension like `next()` can achieve this out-of-the-box.
@w41ter-l Is "state" just "event" or something different?
@tisonkun I've added some details, maybe able to answer your question.
@huachaohuang It seems `observe_state` is enough to wait for the next event. Are there details that I haven't considered?
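To illustrate the point made in this thread, that a stream of states already gives the caller a way to wait for the next event, here is a blocking analogue built on `std::sync::mpsc`. It is only a stand-in: the RFC returns an async `Stream<Item = RoleState>`, where the equivalent call would be a `next()` extension method. The fields of `RoleState` and the `Leader` variant of `Role` are assumptions for this sketch.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical role enum; only `Follower` is visible in the diff.
#[derive(Clone, Debug, PartialEq)]
pub enum Role {
    Leader,
    Follower,
}

/// Hypothetical shape of the item yielded by the state stream.
#[derive(Clone, Debug, PartialEq)]
pub struct RoleState {
    pub epoch: u64,
    pub role: Role,
}

/// Blocking stand-in for `observe_state`: the sender side plays the
/// journal, the receiver side plays the returned stream.
pub fn observe_state_blocking() -> (Sender<RoleState>, Receiver<RoleState>) {
    channel()
}

/// Waits for the next state transition. Returns `None` once the journal
/// side has been dropped and no buffered states remain.
pub fn wait_next(states: &Receiver<RoleState>) -> Option<RoleState> {
    states.recv().ok()
}
```

With this shape, a separate `wait_next` method on the journal trait is not needed; the caller simply pulls the next item from whatever `observe_state` returned.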
|
||
fn state(&self, name: &str) -> (Self::Role, Option<Self::Peer>); | ||
|
||
async fn observe_state(&self, name: &str) -> Self::StateStream; |
Maybe `watch_state` is better?
There doesn't seem to be a difference between the two, is there something I haven't noticed?
LGTM. Nits inline.
The `LeaderJournal` doesn't affect the semantics of `Journal`, so `Journal::open_stream_writer` can be called even when the caller isn't the leader of the stream. Of course, the implementation should guarantee that calls to `StreamWriter::append` or other modifying operations return `Error::NotLeader` if the caller isn't the stream leader. | ||
|
||
The `LeaderBasedJournal` advances the election automatically, so the engine doesn't need to be aware of it. However, the engine must start that automatic process explicitly, because a journal might contain multiple streams, and monitoring the election progress of every stream could exceed hardware limits. As a result, only the streams that the engine is interested in are watched. | ||
|
||
When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to election state transitions. It yields a `Stream` that fires whenever the election state changes. We use epochs to track state changes: time is divided into epochs of arbitrary length, and the `LeaderBasedJournal` must ensure that each epoch has only one leader. | ||
|
||
When a leader engine crashes, another machine's `LeaderBasedJournal` instance is elected as the new leader and begins to recover, eventually providing service. | ||
|
||
In a distributed system, we can't ensure that the state returned by the `observe_state` or `state` methods is always fresh, but any write operation will detect a stale state. As a result, every decision based on the observed state should be confirmed by a write operation, which checks for freshness, before it is submitted. |
You can discuss the semantics of the APIs in sections instead of paragraphs, which separates the focuses better.
Also, for significant procedures, such as reading/appending events, leader state changes, and recovery, you'd better include pseudo code of the procedure or a diagram. For example, in #287 I can see a new concept, `Phase`, that participates in recovery and leader state changes. I guess it's a significant part of this RFC?
Thanks for your advice, I have separated the paragraphs into sections.
I think the new concept is an implementation detail, so I will introduce it in a follow-up RFC.
|
||
## Summary | ||
|
||
In this RFC, we present a trait `LeaderBasedJournal`, which divides the users of `Journal` into two roles: a leader, who can write, and followers, who only have read permission. At the same time, this trait provides a means of observing role transitions. |
According to the following API, a role binds to a stream of the journal. Do you intend to elect a leader per stream (a.k.a., at stream granularity)?
Here you write that the leader and followers bind to a journal, though.
Thanks, I will fix it.
|
||
fn state(&self, name: &str) -> Result<Box<dyn EpochState>>; | ||
|
||
async fn observe_state(&self, name: &str) -> Result<Self::StateStream>; |
How do we handle reconnection after a stream is returned?
Is it the caller's duty to retry, or is the retry logic done inside the returned stream?
The retry logic should be done inside the returned stream.
|
||
The `LeaderBasedJournal` advances the election automatically, so the engine doesn't need to be aware of it. However, the engine must start that automatic process explicitly, because a journal might contain multiple streams, and monitoring the election progress of every stream could exceed hardware limits. As a result, only the streams that the engine is interested in are watched. | ||
|
||
When the engine calls `LeaderBasedJournal::observe_state`, the `LeaderBasedJournal` starts monitoring and subscribing to election state transitions. It yields a `Stream` that fires whenever the election state changes. We use epochs to track state changes: time is divided into epochs of arbitrary length, and the `LeaderBasedJournal` must ensure that each epoch has only one leader. |
So, does an engine that called `observe_state("stream1")` have a chance to become a follower or the leader of stream1? And can an engine that is not interested in stream1 avoid being elected as a follower?
Yes, only an engine that called `observe_state("stream")` can become a follower or the leader of the stream.
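The opt-in behavior confirmed in this thread can be sketched as a small registry: a stream only takes part in the election on this engine after `observe_state` has been called for it. `ObservedStreams`, `role`, and `try_elect` are hypothetical names, the real method is async and returns a stream of states, and the election itself is reduced to a single call here.

```rust
use std::collections::HashMap;

/// Hypothetical role enum; only `Follower` is visible in the diff.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Role {
    Leader,
    Follower,
}

/// Sketch of per-engine bookkeeping for the streams it observes.
pub struct ObservedStreams {
    roles: HashMap<String, Role>,
}

impl ObservedStreams {
    pub fn new() -> Self {
        ObservedStreams {
            roles: HashMap::new(),
        }
    }

    /// Synchronous stand-in for `observe_state`: the engine joins the
    /// election for `name`, starting as a follower, and gets its role back.
    pub fn observe_state(&mut self, name: &str) -> Role {
        *self.roles.entry(name.to_string()).or_insert(Role::Follower)
    }

    /// Returns `None` for streams this engine never observed, meaning it
    /// is neither a follower nor a leader of them.
    pub fn role(&self, name: &str) -> Option<Role> {
        self.roles.get(name).copied()
    }

    /// Stand-in for an election result: only an observing engine can be
    /// promoted to leader. Returns whether the promotion happened.
    pub fn try_elect(&mut self, name: &str) -> bool {
        match self.roles.get_mut(name) {
            Some(role) => {
                *role = Role::Leader;
                true
            }
            None => false,
        }
    }
}
```

An engine that never calls `observe_state("stream1")` stays out of the map, so it is never counted as a follower of stream1 and cannot be elected for it.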
fn role(&self) -> Role; | ||
|
||
// The leader of the associated stream. | ||
fn leader(&self) -> Option<String>; |
What's the content of the `String`? Does it need to be a logical server ID or something else?
I have no idea which is better to fill in, so I just left it as a `String`. It can be changed once we find and define a proper struct.
The rendered version: leader based journal