Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` #6968
…it should be an async function
Adjust `dispute-coordinator` initialization to use `RuntimeInfo`
…an be made Remove some fixmes
Rework new session handling code
Looks good overall. We can maximize robustness (left comments) by using different relay_parents in pre-filling and actual use where possible. This way if at least one of them works, we are good, which means maximum robustness against errors. (Pruning, migration problems, ..)
Looks good to me. Getting rid of the rolling session window demystifies session logic a lot! Much appreciated.
This is not true. The runtime-api cache also caches session info only by session index. So you have a valid point that making the LRU size 6 is not strictly necessary. I think it is a good idea regardless, as we have better control and, more importantly, a guarantee*) that the last 6 sessions are indeed cached and not already pruned. *) In the absence of errors.
I haven't noticed that
I agree, I've modified it already.
If you look at the runtime-api cache for session info, it actually ignores the relay parent: polkadot/node/core/runtime-api/src/lib.rs Line 137 in ecad912
That's true. I think @eskimor mentioned that somewhere but I had forgotten to edit the description.
Co-authored-by: ordian <[email protected]>
match session_idx {
    Ok(session_idx)
        if self.last_consecutive_cached_session.is_none() ||
            session_idx >
                self.last_consecutive_cached_session.expect(
                    "The first clause explicitly handles `None` case. qed.",
                ) =>
Suggested change:
- match session_idx {
-     Ok(session_idx)
-         if self.last_consecutive_cached_session.is_none() ||
-             session_idx >
-                 self.last_consecutive_cached_session.expect(
-                     "The first clause explicitly handles `None` case. qed.",
-                 ) =>
+ let should_cache_session = |session_idx: &SessionIndex| {
+     self.last_consecutive_cached_session.is_none() ||
+         session_idx >
+             &self
+                 .last_consecutive_cached_session
+                 .expect("The first clause explicitly handles `None` case. qed.")
+ };
+ match session_idx {
+     Ok(session_idx) if should_cache_session(&session_idx) => {
Is it more readable to extract the check as a closure?
IMO, no :)
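For reference, the guard under discussion can be isolated as a pure function. The sketch below is illustrative only: `should_cache_explicit` mirrors the condition in the diff, while `should_cache_map_or` is an alternative spelling that was not proposed in this thread. Both function names are hypothetical, and `SessionIndex` is assumed to be a `u32` alias.

```rust
type SessionIndex = u32;

/// The guard as written in the diff: `is_none()` short-circuits,
/// so `expect` is only evaluated when the value is `Some`.
fn should_cache_explicit(last: Option<SessionIndex>, session_idx: SessionIndex) -> bool {
    last.is_none() ||
        session_idx > last.expect("The first clause explicitly handles `None` case. qed.")
}

/// The same predicate without `expect`: `None` maps to `true`,
/// `Some(last)` is compared against the new index.
fn should_cache_map_or(last: Option<SessionIndex>, session_idx: SessionIndex) -> bool {
    last.map_or(true, |last| session_idx > last)
}
```

Both variants return `true` when nothing has been cached yet or when the new index is strictly greater than the stored one.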
    gap_in_cache = true;
}

if !gap_in_cache {
What does `last_consecutive_cached_session` stand for exactly? For example, if we have this situation: [fail, ok, ok, ok, ok]?
We want the last `DISPUTE_WINDOW` sessions cached. If we have to cache sessions 1..5 and for some reason 3 fails, on the next `ActiveLeaves` update we want to retry fetching it and start caching from 3. So in this case `last_consecutive_cached_session` will be set to 2.
Regarding your question about [fail, ok, ok, ok, ok]: if the failed session has index X, then `last_consecutive_cached_session` will be set to X-1.
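The behaviour described in this reply can be sketched as a small standalone model. This is not the code from the PR: `SessionTracker`, `cache_sessions` and the `fetch` callback are hypothetical names, and the runtime query is replaced by a closure that simply succeeds or fails.

```rust
use std::collections::BTreeSet;

type SessionIndex = u32;

/// Hypothetical stand-in for the coordinator's session bookkeeping.
struct SessionTracker {
    /// Highest index up to which sessions were cached without a gap.
    last_consecutive_cached_session: Option<SessionIndex>,
    /// Sessions whose info was fetched successfully.
    cached: BTreeSet<SessionIndex>,
}

impl SessionTracker {
    fn new() -> Self {
        Self { last_consecutive_cached_session: None, cached: BTreeSet::new() }
    }

    /// Try to cache the window `first..=last`. `fetch` is an assumed callback
    /// standing in for the runtime query; it returns `Err` on failure.
    fn cache_sessions<F>(&mut self, first: SessionIndex, last: SessionIndex, mut fetch: F)
    where
        F: FnMut(SessionIndex) -> Result<(), ()>,
    {
        // Resume right after the last gap-free session, so a failed one is retried.
        let start = match self.last_consecutive_cached_session {
            Some(done) => first.max(done.saturating_add(1)),
            None => first,
        };
        let mut gap_in_cache = false;
        for idx in start..=last {
            match fetch(idx) {
                Ok(()) => {
                    self.cached.insert(idx);
                },
                Err(()) => gap_in_cache = true,
            }
            // Only advance the marker while no fetch in this pass has failed.
            if !gap_in_cache {
                self.last_consecutive_cached_session = Some(idx);
            }
        }
    }
}
```

With the window 1..=5 and a failure at session 3, the marker stops at 2 even though sessions 4 and 5 are fetched successfully. The next call resumes from 3 and, if every fetch succeeds, moves the marker to 5.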
I'll do one final burn-in next week and merge.
@tdimitrov can we merge? I will likely have to touch some dispute-coordinator code soon, would be good to have this merged first.
I wanted to do one more burn-in but the new changes are small - I'd say it's safe to merge.
bot merge
Part of #6812
The PR contains two notable changes:

`RollingSessionInfo` usage from `dispute-coordinator`

To use `RuntimeInfo` instead of `RollingSessionWindow`, two problems have to be solved:

1. A `RuntimeInfo` call needs a `sender` and a `parent_hash` for querying the runtime. The first one seems to be always available. The situation with the second one is more complicated.
2. `RollingSessionWindow` provides two methods, `earliest_session` and `latest_session`. These are the first and the last session cached. `RuntimeInfo` has no such concept, so it needs to be emulated somehow (or `dispute-coordinator` needs to be reworked).

Problem 1 is mainly caused by the way `dispute-coordinator` is initialized. It gets the first active leaf as an `Option` and two `Vec`s with scraped on-chain votes and participations. On further investigation I realised that these parameters are used only on startup and are empty afterwards: the `Vec`s are `drain`-ed and the `Option` is `take`-n. So if the subsystem crashes and gets restarted, it starts up with empty data.

For this reason I introduce `struct InitialData` containing all three of them.

It is passed to the `dispute-coordinator` via `Option`. So if there is some initialization data (which requires runtime calls), we have the leaf used to fetch it and we can use it. If there is no initialization data, we are good.

Problem 2 is more straightforward. `dispute-coordinator` will keep track of the sessions by itself. I made one simplification here which I'm not sure is correct (see notable change 2).

A few words about caching, which I initially misunderstood. `RuntimeInfo` has its own cache for `SessionInfo` (and doesn't rely solely on the `runtime-api` subsystem cache for runtime calls, as I wrongly assumed). This has a nice side effect. For example, in this call:

polkadot/node/subsystem-util/src/runtime/mod.rs Line 153 in 27ddd27
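A minimal sketch of the side effect mentioned above, assuming a simplified cache keyed only by session index. `SessionCache`, `session_info`, the placeholder `SessionInfo` and the `query_runtime` callback are hypothetical names for illustration, not the `RuntimeInfo` API, and a plain `HashMap` stands in for the bounded LRU discussed in the review.

```rust
use std::collections::HashMap;

type SessionIndex = u32;
type Hash = [u8; 32];

/// Placeholder for the real session info type.
#[derive(Clone, Debug, PartialEq)]
struct SessionInfo {
    validator_count: u32,
}

/// Hypothetical cache keyed by session index only.
struct SessionCache {
    by_index: HashMap<SessionIndex, SessionInfo>,
    /// Counts how often the (assumed) runtime query was actually made.
    runtime_calls: usize,
}

impl SessionCache {
    fn new() -> Self {
        Self { by_index: HashMap::new(), runtime_calls: 0 }
    }

    /// Return the session info, querying the runtime only on a cache miss.
    /// `relay_parent` is used solely for that query, never as part of the key.
    fn session_info<F>(
        &mut self,
        relay_parent: Hash,
        session_index: SessionIndex,
        query_runtime: F,
    ) -> Option<SessionInfo>
    where
        F: FnOnce(Hash, SessionIndex) -> Option<SessionInfo>,
    {
        if let Some(info) = self.by_index.get(&session_index) {
            // Cache hit: no runtime call, so the relay parent is irrelevant.
            return Some(info.clone());
        }
        self.runtime_calls += 1;
        let info = query_runtime(relay_parent, session_index)?;
        self.by_index.insert(session_index, info.clone());
        Some(info)
    }
}
```

In this model, a second lookup for the same session index with a different relay parent returns the cached value without invoking `query_runtime` again.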
If the session with index `session_index` is in the cache, the `relay_parent` parameter doesn't matter, as no runtime call is made. This is an improvement over the `runtime-api` cache, where the results for different relay parents are treated as separate keys.

TODOs: