add processing and processed caching to the DA checker #4732
Conversation
Review isn't done yet, but I have some suggestions.
@@ -149,31 +145,12 @@ impl<T: BeaconChainTypes> DataAvailabilityChecker<T> {
pub fn get_missing_blob_ids(
This function should be merged with get_missing_blob_ids_checking_cache, since that's the only function that calls it. It would be clearer to have the behavior be:
None -> not sure which blobs we require
Some([]) -> No blobs are required
Some([a,b,c]) -> we know we need blobs [a,b,c]
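Something in this shape, as a sketch (Hash256 and BlobIdentifier are stubbed out so it stands alone; these are not the actual Lighthouse signatures):

// Hypothetical merged API, expressing all three cases in the return type.
type Hash256 = [u8; 32];

pub struct BlobIdentifier {
    pub block_root: Hash256,
    pub index: u64,
}

pub trait MissingBlobIds {
    // None      -> not sure which blobs we require
    // Some([])  -> no blobs are required
    // Some(ids) -> we know we need exactly the blobs in `ids`
    fn get_missing_blob_ids(&self, block_root: Hash256) -> Option<Vec<BlobIdentifier>>;
}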
Although perhaps you still want a way to say "we don't know, but we don't have blobs [a, c]" if we have blobs [b, d]. In that case, you might try returning something like this:
enum MissingBlobResponse {
    KnownMissing(Vec<BlobIdentifier>),    // We know for certain these blobs are missing.
    PossibleMissing(Vec<BlobIdentifier>), // We think these blobs might be missing.
    BlobsNotRequired,                     // Blobs are not required.
}

impl From<MissingBlobResponse> for Vec<BlobIdentifier> {
    fn from(response: MissingBlobResponse) -> Vec<BlobIdentifier> {
        match response {
            MissingBlobResponse::KnownMissing(v) => v,
            MissingBlobResponse::PossibleMissing(v) => v,
            MissingBlobResponse::BlobsNotRequired => vec![],
        }
    }
}
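For illustration, a caller in lookup code might branch on it like this (the function and its behaviors are a hypothetical sketch, not code from this PR):

// Assumes the MissingBlobResponse enum above is in scope.
fn plan_blob_requests(response: MissingBlobResponse) {
    match response {
        // We know exactly which blobs to request; a peer serving this block
        // should have all of them.
        MissingBlobResponse::KnownMissing(ids) => {
            println!("requesting {} known-missing blobs", ids.len());
        }
        // We haven't seen the block yet: request our best guess, but don't
        // penalize peers that can't serve every id.
        MissingBlobResponse::PossibleMissing(ids) => {
            println!("optimistically requesting {} blobs", ids.len());
        }
        // Pre-Deneb, or the block has no commitments: nothing to fetch.
        MissingBlobResponse::BlobsNotRequired => {}
    }
}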
This enum works well, I ended up using it in some of the single blob lookup code too. Right now we have simple logic:
- if we're prior to Deneb, blobs aren't required
- if we see a block, we know everything
- if we don't see a block, we know nothing
but we could update this to:
- if we see a blob at index 4, we know all blobs with index < 4 must exist and blobs with index > 4 may exist
This enum doesn't give us both of these pieces at once, but if we end up wanting that info, I think it could be pretty easily extended.
"if we see a blob at index 4, we know all blobs with index < 4 must exist and blobs with index > 4 may exist"
Isn't this not necessarily true, given that the proposer could give us a blob that's not in the block to fake us out?
Yes, that's a good point, so we'd have to make sure that info was updated whenever we do get the block. It might not be worth implementing; I'm not sure it gives us much benefit over just querying for everything.
This looks solid from my read-through. I wasn't familiar with a lot of this code before, especially the Deneb networking stuff, so I feel like I haven't given this an incredibly thorough examination. Still, the core idea seems solid, and I appreciated the docs and tests for the new components like AvailabilityView.
This also reinforces my desire to increase the testing of our Deneb code through simulation & network testing. I feel like these caches and channels could easily hide subtle race conditions. When we have some time, let's hit them with Hydra.
This is looking quite good. The abstractions are really neat! Thanks for the detailed description of the AvailabilityView and the rationale; it really helped in understanding the code.
I haven't completely finished the review, but here are some initial comments.
Great work on this. LGTM.
Issue Addressed
Resolves: #4667
Proposed Changes
===== See Below for Description Additions on 9/21 =====
This PR aims to fix raciness we're observing in single block/blob lookups by:
1. moving state related to delayed lookups out of the networking layer (no longer caching messages in the delay service) and tracking it only in the beacon chain's data availability checker (see the sketch below)
I experimented with moving the entire delay service to the beacon chain. This didn't end up working well. The timing logic needs access to the beacon processor, and the beacon processor can’t be added to the beacon chain without creating a circular dependency. More generally "pushing" to the networking layer from the beacon chain isn’t done anywhere else from what I can tell… so it’s at odds with how we've architected things generally. So in this PR I'm separating the timing logic from the state related to delayed lookups. The network now periodically polls the beacon chain for any block roots it needs to make a lookup request for.
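The pull-based shape looks roughly like this sketch (assuming a tokio runtime; the names and one-second cadence are illustrative, not the actual code):

use std::time::Duration;

// Sketch: the network side owns the timer and pulls block roots from the
// beacon chain, rather than the chain pushing events into the network.
async fn poll_delayed_lookups<R>(
    mut roots_needing_lookup: impl FnMut() -> Vec<R>,
    mut trigger_lookup: impl FnMut(R),
) {
    let mut interval = tokio::time::interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        // Ask the chain (via the closure) which block roots have been waiting
        // long enough with missing components, then fire lookups for them.
        for root in roots_needing_lookup() {
            trigger_lookup(root);
        }
    }
}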
2. expanding what is stored in the data availability checker to include a notion of "processing" and "processed" blocks and blobs.
I was initially thinking we could use the gossip seen caches for this, but that doesn't end up working too well for a couple of reasons:
- The gossip seen caches are keyed on (index, block_root). Because there's no blob_root in the key (no notion of blob uniqueness), the cache would cause us to filter valid blobs if we receive an invalid blob first for the same (index, block_root).
- There's also no blob slashing condition. This is an open problem in gossip caches (see Blob spam protection mechanism ethereum/consensus-specs#3261). Because this is an open problem in gossip, we need to be resilient to this issue in single blob lookups, so we have to be able to request multiple (index, block_root) combinations via RPC.
If we implement the mechanic from the linked issue, however, it could work. I talked about this with @ethDreamer, and it seems like not propagating multiple blobs per (index, block_root) in gossip, while being able to handle them in RPC, is probably the way to go: it minimizes the chance of a malicious proposer's block being made canonical, while still letting us recover if it is made canonical.
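To make the first problem concrete, here's a toy sketch of the filtering failure (illustrative only, not Lighthouse's actual seen-cache code):

use std::collections::HashSet;

// Toy gossip seen-cache keyed on (index, block_root). There is no blob_root
// in the key, so the first blob we see for a given (index, block_root) wins,
// whether or not it's valid.
fn main() {
    let block_root = [0u8; 32];
    let mut seen: HashSet<(u64, [u8; 32])> = HashSet::new();

    // An invalid blob arrives first for index 0...
    assert!(seen.insert((0, block_root)));
    // ...and the valid blob for the same (index, block_root) is now filtered,
    // even though it's a different blob. Hence the DA checker gets its own
    // processing/processed caches instead of reusing the gossip caches.
    assert!(!seen.insert((0, block_root)));
}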
=============== Description Additions on 9/21 ==========
Why AvailabilityView?
We end up needing similar logic in a few places where we store parts of a block that we might not be sure are valid while waiting for the block's other parts. AvailabilityView exists so we don't have to re-implement that logic, and so it's easier to update across the multiple caches that use it. The intent of the logic as currently written is to 1. give precedence to earlier messages, and 2. make sure we can recover from messages we eventually discover weren't valid.
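As a sketch, those two rules amount to something like this (simplified stand-in types, not the actual AvailabilityView code):

// Simplified sketch of the merge rules: earlier arrivals win, but a
// component we later learn was invalid can be evicted so a valid
// replacement can take its place.
struct Components<B, Blob> {
    block: Option<B>,
    blobs: Vec<Option<Blob>>, // one slot per blob index
}

impl<B, Blob> Components<B, Blob> {
    // Rule 1: give precedence to earlier messages; never overwrite.
    fn merge_block(&mut self, block: B) {
        if self.block.is_none() {
            self.block = Some(block);
        }
    }

    fn merge_blob(&mut self, index: usize, blob: Blob) {
        if let Some(slot) = self.blobs.get_mut(index) {
            if slot.is_none() {
                *slot = Some(blob);
            }
        }
    }

    // Rule 2: recover from components we eventually discover weren't valid.
    fn evict_blob(&mut self, index: usize) {
        if let Some(slot) = self.blobs.get_mut(index) {
            *slot = None;
        }
    }
}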
The places we use this type of logic:
- When a block or blob results in an UnknownParent error, we don't want to keep processing it because it might be garbage. We need a place to put the block/blob; currently in stable lighthouse we just send these blocks to the network layer. In the deneb branch we continue this tradition, but it gets more complicated because we are only sending one piece at a time. Hence the introduction of ChildComponents.
Why get rid of block/blob "consistency checks"?
I've been thinking about this a decent bit, and I think we were being too strict with our RPC validation. The consensus spec intentionally makes data availability checking very generic:
https://github.com/ethereum/consensus-specs/blob/dev/specs/deneb/fork-choice.md#is_data_available
Pretty much so long as you can KZG-verify blobs that match the block, the data is available. Other fields in the BlobSidecar like parent_root, slot, and proposer_index are necessary in gossip, but in RPC we really only care whether the data is correct. If, for example, a block is accepted as canonical but we missed it on gossip, then later requesting it via RPC and failing because an inconsequential field was wrong, even though we have the correct data, seems unnecessarily strict.
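In code terms, the relaxed RPC-level check amounts to roughly this sketch (the types and the proof-verification stub are stand-ins, not Lighthouse's actual API):

// Sketch: for RPC blobs we only check that the blob's commitment is the one
// the block committed to at that index, and that the KZG proof verifies.
// parent_root / slot / proposer_index matter for gossip, not for RPC.
struct RpcBlob {
    index: usize,
    kzg_commitment: [u8; 48],
    // blob bytes + kzg proof elided
}

fn rpc_blob_matches_block(block_commitments: &[[u8; 48]], blob: &RpcBlob) -> bool {
    let commitment_matches =
        block_commitments.get(blob.index) == Some(&blob.kzg_commitment);
    // Stand-in for real KZG verification (e.g. via the c-kzg library).
    commitment_matches && verify_kzg_proof_stub(blob)
}

fn verify_kzg_proof_stub(_blob: &RpcBlob) -> bool {
    true // placeholder: real code verifies the blob against commitment + proof
}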
Bonus
Linking this for visibility, but I think this piece of delay logic was bugged: #4732 (comment)