Decouple blobs #29
Conversation
I'm confused how a double execution payload verification is avoided? I think the use of push-based processing on every blob is what creates a risk of duplicate work, whereas polling a future that completes on blob verification success/failure makes it pretty easy to avoid. It's probably not clear what I was thinking on this branch because it's a bit incomplete so far, so here's some more detail on the direction: generally, the idea is that any block we get that passes gossip verification should be an `AvailabilityPendingBlock`.

The first part of this future can be a … This is an example of what the cache could look like:

Note that a cache here is probably what we want, not a buffer, because we also need to draw from it both when serving RPC requests for blocks/blobs and in order to know which blobs we are lacking so we can trigger the right RPC requests for blocks/blobs. We'll want to trigger a blob-by-root parent lookup and queue for reprocessing somewhere, but I don't know where the best place for that is at this point.
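(The cache example referred to above was not preserved in this thread. Below is a rough reconstruction, with assumed placeholder names and simplified types, of what such a cache could look like: keyed by block root, holding whatever components have arrived so far, which also makes it possible to serve RPC requests and to work out which blobs are still missing.)

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Placeholder types standing in for the real consensus types.
type Hash256 = [u8; 32];
struct SignedBeaconBlock;
struct BlobSidecar {
    index: u64,
}

/// Whatever has arrived so far for a given block root.
#[derive(Default)]
struct PendingComponents {
    block: Option<Arc<SignedBeaconBlock>>,
    blobs: HashMap<u64, Arc<BlobSidecar>>, // keyed by blob index
}

/// Cache of partially available blocks, keyed by block root.
#[derive(Default)]
struct AvailabilityCache {
    items: HashMap<Hash256, PendingComponents>,
}

impl AvailabilityCache {
    /// Record a gossip-verified blob under its block root.
    fn put_blob(&mut self, block_root: Hash256, blob: Arc<BlobSidecar>) {
        self.items
            .entry(block_root)
            .or_default()
            .blobs
            .insert(blob.index, blob);
    }

    /// Record a gossip-verified block under its root.
    fn put_block(&mut self, block_root: Hash256, block: Arc<SignedBeaconBlock>) {
        self.items.entry(block_root).or_default().block = Some(block);
    }
}
```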
Yeah, sure, we can do that. I was thinking of something like this: when the block-blobs group has already been down the rabbit hole once, we …

I see two problems with your solution: …
I don't think this necessarily needs to happen before payload verification; I think we can look at them as independent. For this, I think we'll need to add a …

I don't think the kzg commitment verification needs to happen after payload verification; I think the two can happen in parallel. We'll have to pay close attention to how these invalid-blob scenarios are described in gossip, because only blobs passing all gossip conditions should be put in this cache. The described scenarios would require the proposer to sign over invalid data, so it's very likely such a block gets orphaned, but this is the type of thing we'll need to think very carefully about with regard to fork choice. Within a slot, attesters should not be voting on an unavailable block/blobs, but we should have the ability to re-org to blocks that were not immediately available but became available to us, and that the rest of the network had available (for example, if we later complete a by-roots request for blobs).

Generally, the halting criterion is the criterion for the completion of the future in the availability-pending block, and this will have to be constrained by a timeout, because at some point in block processing we will need to consider the data "unavailable". But that doesn't necessarily mean we should get rid of the block + blobs, for the above reasons. Some types of failure may mean we have to queue for reprocessing (e.g. when waiting on the last blob).
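A minimal sketch, assuming a hypothetical `availability_handle` future and made-up names (this is not the branch's code), of how the completion of the availability-pending future could be constrained by a timeout, after which the data is treated as unavailable for now without necessarily discarding the block:

```rust
use std::time::Duration;

/// Hypothetical outcome of waiting on a block's blobs.
enum Availability<Blobs> {
    /// All blobs arrived and were verified before the deadline.
    Available(Blobs),
    /// The deadline passed; the block is treated as unavailable for now, but may
    /// be kept and re-queued for reprocessing if its blobs turn up later (e.g.
    /// via a by-roots RPC request).
    PendingTimeout,
}

/// Poll the availability future, but give up after `timeout`.
async fn await_availability<F, Blobs>(availability_handle: F, timeout: Duration) -> Availability<Blobs>
where
    F: std::future::Future<Output = Blobs>,
{
    match tokio::time::timeout(timeout, availability_handle).await {
        Ok(blobs) => Availability::Available(blobs),
        Err(_elapsed) => Availability::PendingTimeout,
    }
}
```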
So, @realbigsean, I reset that first commit and used the future in a block cache item to know when all blobs have arrived, as you mentioned. I didn't see it necessary for this …
If we want to parallelise the verification work, we can add an `Arc` of the block to a blob cache item at the start of …
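For illustration only, a small sketch with placeholder types of why an `Arc` of the block in a blob cache item is enough to parallelise the work: each blob verification task takes a cheap `Arc` clone rather than a copy of the block.

```rust
use std::sync::Arc;

// Placeholder types standing in for the real ones.
struct SignedBeaconBlock;
struct BlobSidecar;

/// Hypothetical check of a blob against the block's kzg commitments.
fn verify_blob_against_block(_block: &SignedBeaconBlock, _blob: &BlobSidecar) -> bool {
    true // stubbed out in this sketch
}

/// Verify all cached blobs in parallel, each task sharing the same block via `Arc`.
fn spawn_parallel_verification(block: Arc<SignedBeaconBlock>, blobs: Vec<Arc<BlobSidecar>>) {
    for blob in blobs {
        let block = Arc::clone(&block);
        std::thread::spawn(move || {
            let _valid = verify_blob_against_block(&block, &blob);
        });
    }
}
```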
## Proposed Changes

Allowing compiling without MDBX by running:

```bash
CARGO_INSTALL_EXTRA_FLAGS="--no-default-features" make
```

The reasons to do this are several:

- Save compilation time if the slasher won't be used
- Work around compilation errors in slasher backend dependencies (our pinned version of MDBX is currently not compiling on FreeBSD with certain compiler versions).

## Additional Info

When I opened this PR we were using resolver v1 which [doesn't disable default features in dependencies](https://doc.rust-lang.org/cargo/reference/features.html#resolver-version-2-command-line-flags), and `mdbx` is default for the `slasher` crate. Even after the resolver got changed to v2 in sigp#3697, compiling with `--no-default-features` _still_ wasn't turning off the slasher crate's default features, so I added `default-features = false` in all the places we depend on it.

Co-authored-by: Michael Sproul <[email protected]>
My branch is starting to get to a good state now. As you know, I scrapped the shared-memory approach above of a separate blob cache and block cache, because the most sensible point at which to send blobs to their block turned out to be as they come over the network, so it makes sense to have blobs and block in the same place. That, however, concerned me from the start: many worker threads, block workers and blob workers, waiting on a write lock for the same cache. So now I have built a pure* message-passing model around your

```rust
type DataAvailabilityHandle<E> = JoinHandle<Result<Option<Arc<BlobsSidecar<E>>>, BlobError>>;
```

or, more exactly, with the new sidecar type,

```rust
type DataAvailabilityHandle<E> = JoinHandle<Result<Option<Arc<SignedBlobSidecar<E>>>, BlobError>>;
```

sending blobs to the block directly instead (or rather to a receiver that the block will pick up when it arrives).

Basically, it works around three types and an enum implementing one trait, `TryIntoAvailableBlock`: … At many points in the code, however, we know the exact type of the three. For example, all blocks that come over the network and are sent over the network are always of the type …

There is one constraint with this message-passing model for decoupling that we have to work around (besides spelling "availability" correctly many times in a row): an …

```rust
pub struct AvailabilityPendingBlock<E: EthSpec> {
    block: Arc<SignedBeaconBlock<E>>,
    data_availability_handle: DataAvailabilityHandle<E>,
}
```

Apart from that, a block in processing has one path that it travels, branching into yes/no branches, so there is no need to clone it. Say, for example, a block's parent is unknown, a check which LH makes at many different places in processing, some before and some after we …

A very important aspect of decoupling using the message-passing model becomes error handling. There are two new errors that are important to handle and not propagate: …

The values I set for the constants … There is a background thread I added for polling an …
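As a rough illustration of the idea (simplified, non-generic placeholder types rather than the branch's actual code), blob workers can push verified blobs into a channel whose receiving end is polled by a spawned task, and that task's `JoinHandle` is what the block carries as its `DataAvailabilityHandle`:

```rust
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio::task::JoinHandle;

// Placeholder types standing in for the real consensus types.
struct SignedBeaconBlock;
struct SignedBlobSidecar;
#[derive(Debug)]
struct BlobError;

/// Resolves once all blobs for the block have been received (or fails if the
/// senders are dropped before then).
type DataAvailabilityHandle =
    JoinHandle<Result<Option<Vec<Arc<SignedBlobSidecar>>>, BlobError>>;

struct AvailabilityPendingBlock {
    block: Arc<SignedBeaconBlock>,
    data_availability_handle: DataAvailabilityHandle,
}

/// Wrap a block into an availability-pending block, returning the sender that
/// blob workers use to forward gossip-verified blobs for this block.
fn into_pending(
    block: Arc<SignedBeaconBlock>,
    expected_blobs: usize,
) -> (AvailabilityPendingBlock, mpsc::UnboundedSender<Arc<SignedBlobSidecar>>) {
    let (tx, mut rx) = mpsc::unbounded_channel();
    let data_availability_handle = tokio::spawn(async move {
        let mut blobs = Vec::with_capacity(expected_blobs);
        while blobs.len() < expected_blobs {
            match rx.recv().await {
                Some(blob) => blobs.push(blob),
                // All senders dropped before every expected blob arrived.
                None => return Err(BlobError),
            }
        }
        Ok(Some(blobs))
    });
    (
        AvailabilityPendingBlock {
            block,
            data_availability_handle,
        },
        tx,
    )
}
```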
## Issue Addressed

sigp#3991

## Proposed Changes

A block with kzg commitments is not available until its blobs have arrived. Save the verification of kzg commitments against blobs until after execution payload verification, along with the `AvailabilityPendingBlock`.

## Additional Info

A block's execution payload has to be verified at some point before importing it to fork choice. If a block sets off to get this done after it gets gossip verified and propagated, then in some cases we may be lucky: while waiting on input from the execution layer, all of our blobs have arrived. If this is not the case, we cache the `ExecutedBlock`, which holds a `DataAvailabilityHandle`, and put the sender for the `DataAvailabilityHandle` in the blobs cache (see the sketch at the end of this description).

Building on from this solution:
`lighthouse/beacon_node/beacon_chain/src/blob_verification.rs`, line 275 in dd31621
I'm keeping in mind that the probability that the block always comes last, after all of its blobs, is small, so I'm opting for an efficient solution for the most common scenarios at the cost of a less efficient one for the scenario where the block comes last, here:
https://github.com/emhane/lighthouse/blob/97a1847e599fccd9c8ac6cc7ba5c301e4f554b1f/beacon_node/network/src/beacon_processor/worker/gossip_methods.rs#L665-L692
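To make the Additional Info flow above concrete, here is a simplified sketch (placeholder names and types, not the PR's actual code) of the two caches it describes: executed blocks waiting on their blobs, and the per-block senders that blob workers use to forward blobs so the corresponding `DataAvailabilityHandle` can complete.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::mpsc::UnboundedSender;

// Placeholder types standing in for the real ones.
type Hash256 = [u8; 32];
struct SignedBlobSidecar;
/// A block whose execution payload has already been verified but whose blobs
/// may still be outstanding.
struct ExecutedBlock;

#[derive(Default)]
struct Caches {
    /// Executed blocks still waiting on their blobs, keyed by block root.
    executed_blocks: HashMap<Hash256, ExecutedBlock>,
    /// Sender into each block's data-availability task, keyed by block root.
    blob_senders: HashMap<Hash256, UnboundedSender<Arc<SignedBlobSidecar>>>,
}

impl Caches {
    /// Called by a blob worker when a blob passes gossip verification: forward it
    /// to the block's availability task, if we know about the block already.
    fn on_blob(&mut self, block_root: Hash256, blob: Arc<SignedBlobSidecar>) {
        if let Some(sender) = self.blob_senders.get(&block_root) {
            // Once all expected blobs have been received on the other end, the
            // block's DataAvailabilityHandle resolves and processing can resume.
            let _ = sender.send(blob);
        }
    }
}
```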