Nodes may get stuck due to parent availability issue with transition block #2732
Comments
I don't see why this is true? A CL node would not incorporate this block into its non-optimistic block tree and would build upon an available TTD block as long as this isn't resolved.
This is true if
IMO, the right behaviour of a node in this edge case is similar to what it would be if the payload execution took forever. Missing data is a different kind of thing and we might want to avoid mixing it into this case. I am not saying that this is entirely wrong though -- definitely worth discussion.
Right, SYNCING from EL on chains with an unknown PoW source at the transition cannot block CL from making decisions. Doing so would allow for trivially stopping the transition process.
For simplicity's sake (not changing APIs or anything), I think that we should not go into a place where EL can hang forever (just attempting to fill in the unavailable PoW parent), nor should we specify new return values. CL can discern between SYNCING on chains where some PoS ancestor is validated and SYNCING on chains that are rooted in an unknown PoW parent. Thus it has enough information to act accordingly (not halt block and attestation production). I think the correct thing is to just note that a SYNCING return value on transition chains MUST NOT halt block and attestation production. We can discuss other "halt" conditions in optimistic sync discussions/specs elsewhere.
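The distinction described above could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not spec text: `PayloadStatus`, `ChainContext`, and `must_not_halt_on_syncing` are hypothetical names, and the sketch covers only the transition-chain rule, leaving every other "halt" condition to the optimistic sync discussion.

```python
from dataclasses import dataclass
from enum import Enum


class PayloadStatus(Enum):
    """Hypothetical subset of EL responses relevant to this sketch."""
    VALID = "VALID"
    INVALID = "INVALID"
    SYNCING = "SYNCING"


@dataclass
class ChainContext:
    """What CL knows about the chain the payload belongs to (assumed shape)."""
    # True if EL has already validated some PoS ancestor on this chain.
    # False means the chain is rooted in a PoW parent that EL does not know.
    has_validated_pos_ancestor: bool


def must_not_halt_on_syncing(status: PayloadStatus, ctx: ChainContext) -> bool:
    """Return True when a SYNCING response MUST NOT halt block and
    attestation production, i.e. when it comes from a transition chain
    rooted in an unknown PoW parent.

    A False result says nothing about halting; other conditions are
    decided elsewhere.
    """
    return status is PayloadStatus.SYNCING and not ctx.has_validated_pos_ancestor
```

In this sketch the caller keeps producing blocks and attestations whenever the function returns True, so a payload built on a random `parent_hash` cannot stall the node.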
This distinction by CL at the transition time should resolve this and get the right/available/popular PoW block into the chain by allowing the local PoW block to be used to build the new merge block. Once the new terminal PoW block is out there in the network, validators should be able to vote on it.
I tend to agree. We need a proper place for this statement. I guess it should be in the optimistic sync document.
I've created a separate issue #2735
I think we can elevate that as a note into the CL specs
What's the limit on that though? For example, if I'm following a chain and get a transition block that I can't yet verify, it makes sense to not have that block me. But if I can optimistically follow the chain past that transition block for another 100 epochs then it seems wrong to produce a block that would create a fork.
An easy limit is if an unavailable chain (wrt PoW source) finalizes. This clearly looks like a networking/EL failure and should probably be bubbled up to the user.
But we'd expect that to happen if we're initially syncing the chain so it wouldn't be an error we'd report to the user. Not performing duties if our finalized checkpoint is only optimistically synced probably does make sense - though I'd be tempted to base it on the justified checkpoint instead, since attesting with an invalid justified checkpoint can be very problematic and it's still a very strong signal that a lot of validators think that's the real chain and the execution block should turn up eventually.
+1 for justified, as it will signal early on that the user needs to follow up/escalate for the social consensus to kick in, hopefully before the chain finalizes.
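The justified-checkpoint limit suggested in the last two comments might look something like the sketch below. It is only an illustration of the idea, not an agreed rule: `Checkpoint` is simplified, and `optimistic_roots` (the set of block roots imported without full execution validation) and `may_perform_duties` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Checkpoint:
    """Simplified checkpoint: epoch plus the root of the checkpoint block."""
    epoch: int
    root: bytes


def may_perform_duties(justified: Checkpoint, optimistic_roots: Set[bytes]) -> bool:
    """Return False while the justified checkpoint block has only been
    optimistically imported, i.e. its execution payload is not yet verified.

    Assumption: optimistic_roots holds the roots of all blocks that were
    imported optimistically and have not been fully validated since.
    """
    return justified.root not in optimistic_roots
```

Basing the check on the justified rather than the finalized checkpoint means the node stops attesting earlier, which matches the concern about attesting with an invalid justified checkpoint.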
Addressed in #2770:
A node will not optimistically import a merge transition block until it's safe enough to do so. This prevents the system from getting stuck due to all nodes being kicked out by a transition block atop an unavailable terminal block. If a terminal block is available but hasn't been disseminated in time, it will eventually be disseminated and picked up by one of the next proposers to build another transition block atop it, and this block will likely be accepted by the network. If a block is indeed unavailable an honest chain built during
Problem
Suppose the merge fork has happened and the beacon chain network produces blocks with empty payloads while waiting for TTD to be reached. A malicious proposer builds a block with a non-empty payload whose `parent_hash` field is set to a random value, i.e. producing a payload atop a block that is unavailable. According to the current spec, EL must turn into `SYNCING` upon receiving such a payload, which would turn all nodes in the network into syncing mode and prevent them from attesting to and proposing new blocks.
This edge case was attempted to be solved in the Interop spec (see no. 5 in the Engine API spec) and has been brought up in discord by @g11tech (thanks a lot!) as the Kintsugi spec seems to miss the handling of this case.
Potential Solution
The Interop spec proposes that EL turn into `SYNCING` only after the parent block is pulled from the network and is proved to be a PoS block. If the parent is not yet pulled or it turns out to be a PoW block, EL should keep silent and try to pull the missing blocks from the wire and execute them. If the parent block is indeed unavailable, then EL would try to resolve the dependency forever and never respond to CL, and CL wouldn't treat the beacon block containing this payload as fully validated; thus, it would orphan this block and move on. Additionally, it requires EL to properly handle the case when it's forced to sync with an unavailable chain -- this is assumed to be resolved already as it may happen on Mainnet (by receiving `NewBlock` with an unavailable parent) -- but there could be implications in the new context of CL/EL communications.
Note, when a node syncs from scratch and EL starts syncing before hearing from CL (regular sync in the PoW network), it will respond with `SYNCING` to any `executePayload` call. Suppose CL sends `executePayload` with an unavailable parent block before EL starts its sync process; EL following the Interop spec would wait until it pulls and executes the parent block and all its ancestors -- this would keep CL in limbo for a few hours in the case of Mainnet. While in this state, CL can't attest to or propose new blocks.
A solution that seems to work, but not always:
- Add `isMergeBlock: bool` to `executePayload` to clearly distinguish the transition block from the others (EL could hijack `forkchoiceUpdated` -- if there were no `forkchoiceUpdated` calls before an `executePayload` call then this must be a transition block).
- Add an `UNKNOWN_PARENT` response status to `executePayload`. EL returns `UNKNOWN_PARENT` when `isMergeBlock: True`, the parent is unknown and EL isn't already `SYNCING`. Additionally, EL initiates the sync process in an attempt to sync up to (and including) the parent block.
- CL treats `UNKNOWN_PARENT` as `SYNCING` during the optimistic sync, and as if there was a missing slot in the case when no sync process is happening -- this allows CL to attest to the previous block and propose yet another block on top of the previous one.
- [...] `UNKNOWN_PARENT` result if the sync process on the EL side hasn't resolved the dependency yet.
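To make the proposed `UNKNOWN_PARENT` flow easier to follow, here is a rough Python sketch of the decision on both sides. It is not an Engine API implementation: `el_early_status`, `cl_on_unknown_parent`, and the `ClAction` values are hypothetical names, and the sketch assumes that a non-transition payload with an unknown parent still yields `SYNCING`, as the Problem section describes for the current spec.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SYNCING = "SYNCING"
    UNKNOWN_PARENT = "UNKNOWN_PARENT"  # proposed new status


class ClAction(Enum):
    TREAT_AS_SYNCING = "treat_as_syncing"
    TREAT_AS_MISSED_SLOT = "treat_as_missed_slot"


def el_early_status(is_merge_block: bool, parent_known: bool,
                    el_syncing: bool) -> Optional[Status]:
    """EL side: status returned before any execution is attempted.

    Returns None when the payload can go down the normal execution path.
    """
    if el_syncing:
        # An EL that is already syncing answers SYNCING to any call.
        return Status.SYNCING
    if parent_known:
        return None
    if is_merge_block:
        # Transition block with an unknown parent: report it explicitly.
        # A real EL would also start syncing towards the parent here.
        return Status.UNKNOWN_PARENT
    # Assumption: unknown parent on a non-transition payload stays SYNCING.
    return Status.SYNCING


def cl_on_unknown_parent(optimistic_sync_active: bool) -> ClAction:
    """CL side: how an UNKNOWN_PARENT response is interpreted."""
    if optimistic_sync_active:
        return ClAction.TREAT_AS_SYNCING
    # No sync in progress: behave as if the slot was missed, so CL can
    # attest to the previous block and propose on top of it.
    return ClAction.TREAT_AS_MISSED_SLOT
```

For example, `el_early_status(True, False, False)` yields `Status.UNKNOWN_PARENT`, and a CL that is not syncing would then map it to `ClAction.TREAT_AS_MISSED_SLOT`.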