optimistic sync #3793
Conversation
Force-pushed from cda098b to d0972b6
Force-pushed from 57d0ad2 to 68a31e3
Still a few open TODO comments and when true blocks that need to be fixed or logged as issues to avoid losing them.
Overall quite impressive how compact this change is implementation-wise.
warn "markPayloadInvalid: attempt to invalidate valid block", | ||
blck = shortLog(blck.get) |
During initialization, only finalizedHead and head are populated into dag.optimisticRoots. Could it happen that this function is called on one of the intermediate blocks? Is this a condition that is expected in edge cases, or is this a programming error? If the latter, a doAssert verifyFinalization notin dag.updateFlags could be useful to detect this being hit.
https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#transitioning-from-valid---invalidated-or-invalidated---valid says, on this warning's topic:
These operations are purposefully omitted. It is outside of the scope of the specification since it's only possible with a faulty EE.
Such a scenario requires manual intervention.
It's possible that it'd be better to be even more emphatic than a warning, because this is a failure on par with doppelganger detection: something went very wrong. That said, while it's not necessarily a Nimbus programming error, it is a programming error creating potential consensus failure, and verifyFinalization mode making this more obvious would be useful.
3c864f0 adds doAssert verifyFinalization notin dag.updateFlags
Force-pushed from 8100623 to e56c9d1
There's only one left; this PR now removes significantly more. The rest are pre-existing and/or out of scope/deliberately not part of this PR.
Tested successfully with syncs from scratch on Ropsten with each of Erigon, Besu, Geth, and Nethermind.
…block root filled in
@@ -80,24 +80,124 @@ proc expectBlock*(self: var ConsensusManager, expectedSlot: Slot): Future[bool]

  return fut

from eth/async_utils import awaitWithTimeout
I recently had a discussion with a D programmer who promoted this style of function-local imports. I guess your rationale is the same: to make it easy to copy/paste functions to a different module by keeping their list of imports attached to the function itself.
One potential risk of this approach is that it also makes it much easier to accidentally introduce an import cycle while moving the function, which will result in the typical hard-to-decipher error messages brought on by cyclic imports.
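For illustration, a minimal sketch of the style under discussion, assuming chronos and nim-eth's eth/async_utils (the procs and the timeout are made up):

import chronos

proc slowOp(): Future[int] {.async.} =
  await sleepAsync(10.seconds)
  return 42

# The function-local style: the import sits immediately above its only
# consumer instead of in the module-wide import list at the top of the file.
from eth/async_utils import awaitWithTimeout

proc fetchWithDeadline(): Future[int] {.async.} =
  result = awaitWithTimeout(slowOp(), 2.seconds):
    raise newException(CatchableError, "timed out waiting for slowOp")

Moving fetchWithDeadline to another module means moving the from line along with it, which is exactly where an accidental cycle can sneak in.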
Sure, it helps for that purpose.
Other advantages:
- Nim 1.2 unused import detection is essentially broken, and Nim 1.6 is only somewhat better. This results in import lists simply accumulating, because there's a 1,000-line module with a bunch of imports at the top, with no hint of what they're importing or why. Enough modules have some side effect during import to disable the unused import warning, too. Local imports make it clearer, for modules imported not for their side effects (such as tests), that if the imported symbols don't match, the module import isn't necessary. It's not 100% automated, but it's better than just letting imports accumulate, and in theory it provides Nim with more information for detecting unused imports. The more local and specific the imports, both in the symbols they name and where they're used, the better for this purpose.
It's related to one of the reasons I tend to unexport symbols when I notice them: it's not some OOP thing about data hiding, it's about allowing Nim to detect dead code and, when reading or refactoring code, being confident that (unless there's an export statement with said symbol) changes in that code will only directly affect that one module, so one doesn't have to scan the entire project. The deeper in the libraries one is working, the more this applies.
- Various surprising overloads can bubble up from imported modules and cause surprising errors of their own. E.g., where
withState(someState):
  template f(): untyped = state.foo
  f()
worked right up until some irrelevant-to-purpose symbol from deep inside Chronos's HTTP server was transitively imported, which produced a confusing error message and required template f(): untyped = state().foo instead. Easy fix, but it took a while to find the exact culprit.
If this interferes with Nim's cyclic import detection, IMO that's a Nim issue.
Cyclic imports are legal in Nim; they just produce hard-to-diagnose error messages at times.
While I have no strong objections against this style, my comment was meant as a warning that one has to be careful with the notion that individual functions are now easy to move between modules if their imports are structured this way.
Cyclic imports are legal in Nim, they just produce hard to diagnose error messages at times.
they also produce invalid code with overly broad arguments, where the exact function that actually matches ends up depending on frivolous things like import order - there was no end to ssz import issues until all cyclic imports were removed - this is one more nim feature that sounds good on paper but is practically worthless because it's broken and untested.
I was not advocating that they should be used. I was explaining that there is no "cyclic import detection", as the message above implied (it would be relatively easy to add to Nim, though).
Otherwise, I have advocated many times that cyclic imports should be supported better in the language, because their current state leads to the proliferation of the "types module", which makes encapsulation difficult to achieve in Nim.
    signedBlock: ForkySignedBeaconBlock, payloadValid: bool,
    queueTick: Moment = Moment.now(),
    validationDur = Duration()):
    Future[Result[BlockRef, BlockError]] {.async.} =
this design in which storeBlock becomes async is a bit unfortunate - i.e., the processes that call it are not necessarily well prepared to lose their current in-memory state for long periods of time, especially not while waiting for a (potentially slow) EL response (since they're part of validator duties)
It could do something like traceAsyncErrors, or set up yet another queue, but there's fundamentally an asynchronous aspect to this, and it can't help but change the storeBlock contract in some way -- either it can return without any particular guarantee of "full" success (even of detecting blatantly invalid blocks), or callers need to wait. One minimal change is to always have
await self.consensusManager.updateHeadWithExecution(wallTime.slotOrZero)
there be either a queue or (implicitly a queue) traceAsyncErrors, which would detach the two and allow storeBlock itself to be synchronous again.
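For concreteness, a minimal sketch of a traceAsyncErrors-style detach helper, assuming chronos and chronicles (the real helper may differ in details):

import chronos, chronicles

proc traceAsyncErrors(fut: FutureBase) =
  # fire-and-forget: detach the future, but log any eventual failure
  # instead of silently dropping it
  fut.addCallback do (arg: pointer):
    if fut.failed:
      error "Detached future failed", err = fut.error.msg

With such a helper, the call site becomes traceAsyncErrors self.consensusManager.updateHeadWithExecution(wallTime.slotOrZero), and storeBlock no longer needs to be async itself -- at the cost of losing the completion guarantee discussed above.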
Another point I played with was within the updateHead functions -- the serial dependence here is forkchoiceUpdated (intrinsically a long-lasting call) having to complete before one can, in general, tell the DAG about blocks being optimistic or verified. It's possible to just have it fire off the fcU, then immediately call dag.updateHead, but that's generally not great. One could arrange a callback or similar, but that's the kind of pattern async/await more-or-less linearizes control-flow-wise. So probably within updateHeadWithExecution, the async is warranted.
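A toy sketch of that serial dependency, with hypothetical stand-ins for the engine-API and DAG calls:

import chronos

type PayloadStatus = enum psValid, psInvalid, psSyncing

proc forkchoiceUpdated(headRoot: string): Future[PayloadStatus] {.async.} =
  # stands in for the potentially slow engine-API round-trip to the EL
  await sleepAsync(50.milliseconds)
  return psValid

proc markBlockVerified(root: string) = discard    # stand-ins for DAG updates
proc markBlockOptimistic(root: string) = discard

proc updateHeadWithExecution(headRoot: string) {.async.} =
  # only once the fcU response arrives can the DAG be told whether the
  # new head is fully verified or still optimistic -- hence the await
  let status = await forkchoiceUpdated(headRoot)
  if status == psValid:
    markBlockVerified(headRoot)
  else:
    markBlockOptimistic(headRoot)

Firing off forkchoiceUpdated and calling dag.updateHead immediately would remove the await, but then the optimistic/verified status would have to be patched up later via a callback -- the pattern async/await linearizes here.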
A general theme here is how much of the serial dependency is real vs. how much is an artifact of a specific way of writing these functions (with async, with callbacks, with queues, with anything): in principle, to do the tasks they need to do to participate in the chain, how intrinsic vs. how incidental are these delays?
One approach is to care even less about validity than the code does now, and just ... not wait. Treat it as fire-and-semi-forget: just import the block if it's a candidate for optimistic sync, move on, etc. There might be some weird display quirks (e.g., the status bar showing wall slot != wall head briefly each slot even when everything goes well, or repeatedly showing opt sync briefly if it tries to be truly accurate for fractions of a second), but more important is that the intrinsic dependencies exist: the aggregated attestations depend on the unaggregated attestations, which depend on the correct head being known.
The challenge here is that https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#validator-assignments notes:
Validator assignments
An optimistic node is not a full node. It is unable to produce blocks, since an execution engine cannot produce a payload upon an unknown parent. It cannot faithfully attest to the head block of the chain, since it has not fully verified that block.

Block Production
An optimistic validator MUST NOT produce a block (i.e., sign across the DOMAIN_BEACON_PROPOSER domain).

Attesting
An optimistic validator MUST NOT participate in attestation (i.e., sign across the DOMAIN_BEACON_ATTESTER, DOMAIN_SELECTION_PROOF or DOMAIN_AGGREGATE_AND_PROOF domains).

Participating in Sync Committees
An optimistic validator MUST NOT participate in sync committees (i.e., sign across the DOMAIN_SYNC_COMMITTEE, DOMAIN_SYNC_COMMITTEE_SELECTION_PROOF or DOMAIN_CONTRIBUTION_AND_PROOF domains).
To comply with the spec, the serialization enforced by await is required: if one wants to attest, and attest as soon as possible modulo the make-sure-everyone-gets-the-block-first delay, the block must be actually verified, not only optimistic.
So for a non-validating node, the literal stakes are lower, and for a validating node, there's ultimately no choice but to wait. At least this way, if the node has other things to do, it can do them in the meantime.
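A sketch of the resulting split (hasLocalValidators is a hypothetical flag, and traceAsyncErrors is a detach helper along the lines sketched earlier): a validating node awaits full verification before signing anything, while a non-validating node could detach and move on.

if node.hasLocalValidators:
  # must wait: attesting requires a fully verified, not merely
  # optimistic, head
  discard await node.blockProcessor.storeBlock(
    MsgSource.gossip, wallTime, blck, payloadValid = false)
else:
  # fire-and-semi-forget: import optimistically, move on
  traceAsyncErrors node.blockProcessor.storeBlock(
    MsgSource.gossip, wallTime, blck, payloadValid = false)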
#3842 might address this. It needs testing, but should be a viable approach: it keeps enough of these invariants that it should remain valid, while being less invasive.
var
  optForkchoiceHeadSlot = GENESIS_SLOT  # safe default
  optForkchoiceHeadRoot: Eth2Digest

from ../consensus_object_pools/blockchain_dag import
funny, because in refactoring I've been going in the opposite direction, of exporting stuff much more consistently to avoid long import lists - this helps keep symbols that should be visible, visible - the template problem with the wrong symbol being matched is annoying, but can often be solved by turning the template into a proc or generally giving things more descriptive names.. it's a mess, but with exports it's actually easier to stay away from redundant imports (because there are simply fewer of them that can go stale)
It also creates environments that have vastly more symbols imported than they need, with no automatic way for Nim to distinguish whether those imports are necessary -- they are indeed needed, it's just that only a small part of them was. When isolating reproducible test cases for Nim issue filing, I run across these repeatedly -- some module needs three symbols from some relatively small modules, but for some slight convenience it does
import large_umbrella_module
which in turn is something like
import a, b, c, d, e, f
export a, b, c, d, e, f
and then some of a, b, etc. continue this fractally. It works, yes, but it also inhibits analysis of which imports are necessary. If only one module imported by large_umbrella_module turns out to be necessary after some refactoring, that will not be automatically detected in a visible way.
Another aspect is that compile times are already problematic, and these single monolithic imports hinder attempts at improving that. Some of this is that there are individual modules which take a long time to compile, and no amount of dependency management fixes that, but in other cases the times of lots of small-to-medium modules being imported literally add up. Even if/when Nim doesn't keep recompiling modules within a single "project" (nim c foo), that compilation gets repeated across the nimbus_beacon_chain, nimbus_validator_client, et cetera builds.
One first step towards addressing this for any given module is clarifying what, exactly, it needs; in various cases (e.g., the sqlite C code being compiled) that's meant splitting out testdbutils from testutils, explicitly avoiding this pile of import/export modules.
Knowing the compiler internals, it's a bit hard to guess the effects of this technique on compilation times. You are right that each module in Nim is visited only once during compilation, so effectively a large project is likely to end up compiling the same set of files regardless of the import strategy.
The nim check command can benefit though, because it may need to see a smaller subset of the project when you are close to the leaf modules in the import tree. There might also be some slight speedup brought by reducing the scope lookup tables and the number of candidates considered during overload resolution in some situations.
For me, the main downsides of this technique are that:
- It requires quite a lot of work (unless your editor does it for you automatically)
- It makes it even harder to spot inappropriate import cycles.
Both can be addressed with better tooling.
vastly more symbols than they need imported
this is true, but it is also often the case that a missing symbol is a lot more dangerous than an extra one - the former means a more generic overload will be chosen, leading to subtle mis-serialization and similar issues due to nim-ser's open matching policy; the latter, a (presumably fixable) compile-time error.
single monolithic imports hinder attempts at improving that.
we have two categories of monoliths - giants like chronos that import and export everything under their project, and "small" modules that have hard dependencies of their own - the former should of course be split up, but with the latter, the following happens very often: module A imports B, and C imports A - to use A from C, B must also be imported, so C now imports A and B - then A changes and no longer needs B, but it remains in C as an explicit import, and nim's (broken) import analysis indeed rarely identifies it as unused.
For the latter case, I've found that it's easier to export hard dependencies - modules that use A then have a shorter import list of their own, which helps keep the list tight overall - there is no compile-time downside in this example because A has a hard dep on B, so B must be compiled no matter what. A sketch of this pattern follows below.
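A minimal sketch of that export pattern, with hypothetical modules b.nim (the hard dependency), a.nim, and c.nim:

# b.nim
type Hash256* = object
  data*: array[32, byte]

# a.nim
import b
export b  # A's public API mentions Hash256, so re-export its home module

proc latestBlockHash*(): Hash256 =
  Hash256()

# c.nim
import a  # one import; Hash256 arrives via A's export

let h: Hash256 = latestBlockHash()

If A later drops its use of B, removing the export is a compile-visible change for C, instead of a silently stale import b line lingering in every consumer.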
We have two more issues:
- we reuse types from implementation libraries - for example, we reuse eth1 block hashes in eth2, which brings in a lot of junk from eth1 instead of using just a minimal eth2-specific definition (ditto libp2p) - presumably this is done because copying is bad, but it also creates rigid ties between the projects that are not always correct (just because an eth1 block hash is serialized to hex one way in eth1 doesn't mean eth2 does it the same way - these kinds of false carry-overs are quite problematic)
- the p2p macro defines both client and server at the same time - this requires clients (like the VC) to import practically all of the server as well (the VC doesn't need libp2p, but still imports it) - similar problems happen with other "categories" of imports, and the result is that "auxiliary" compiles like tests and simulations take longer even though they don't use a lot of the stuff that transitively ends up being processed - bearssl was a good example as well
the former means a more generic overload will be chosen, leading to subtle mis-serialization and similar issues due to nim-ser's open matching policy
That seems like an issue with nim-ser's open matching policy.
That seems like an issue with nim-ser's open matching policy.
yes, that's the biggest and most obvious culprit, and @zah has promised progress on this front, but it happens with other generic things as well (the hash function, which messes up Table, $, etc.)
-    let newBlockRef = node.blockProcessor[].storeBlock(
-      MsgSource.api, wallTime, blck)
+    let newBlockRef = await node.blockProcessor.storeBlock(
+      MsgSource.api, wallTime, blck, payloadValid = true)
how come the payload is considered valid when the block is coming from REST? i.e., REST is more or less gossip, from a trust point of view
https://github.com/ethereum/consensus-specs/blob/v1.2.0-rc.1/sync/optimistic.md