Keep PendingComponents in da_checker during import_block #5845
…me_in_da_checker
/// Removes and returns the pending_components corresponding to
/// the `block_root` or `None` if it does not exist
/// Removes and returns the pending_components corresponding to
/// the `block_root` or `None` if it does not exist
/// Removes the `pending_components` corresponding to the `block_root`.
@ethDreamer plz need help with the tests, quite lost on why they are broken.
write_lock.put_pending_components(
    block_root,
    pending_components.clone(),
    &self.overflow_store,
)?;
We're cloning pending components for a second time here and I'm not sure if it's necessary.
If the pending component is newly created in this function, it won't immediately become available. So this means the pending components would always be in the `overflow_store` if it becomes available here, right?
Oh never mind, we do need this as we're mutating a clone!
You don't need to change `pop_pending_components` to `get_pending_components`. Instead you should keep it as is. Because the write lock is held the whole time, there is no difference between:
lock = get_write_lock()
item = cache.remove()
item.mutate()
cache.insert(item)
drop(lock)
and
lock = get_write_lock()
item = cache.get()
item.mutate()
cache.update(item)
drop(lock)
You can just re-insert the item even when the components are complete (just like you did above) and then remove them with the same `pop_pending_components` method.
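To make the equivalence of the two pseudocode patterns concrete, here is a minimal sketch using a plain `HashMap` behind a `std::sync::RwLock`. The function names, the `u64` key, and the `Vec<u8>` value are placeholders chosen for illustration; they are not the types used in the PR.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type Cache = RwLock<HashMap<u64, Vec<u8>>>;

/// Pattern 1: remove the entry, mutate it, and re-insert it,
/// all while holding a single write lock.
fn remove_mutate_insert(cache: &Cache, key: u64, component: u8) {
    let mut guard = cache.write().unwrap();
    if let Some(mut item) = guard.remove(&key) {
        item.push(component);
        guard.insert(key, item);
    }
    // The write lock is released when `guard` goes out of scope.
}

/// Pattern 2: read a clone of the entry, mutate the clone, and write it
/// back, again under a single write lock.
fn get_mutate_update(cache: &Cache, key: u64, component: u8) {
    let mut guard = cache.write().unwrap();
    let existing = guard.get(&key).cloned();
    if let Some(mut item) = existing {
        item.push(component);
        guard.insert(key, item);
    }
}

/// Applies each pattern to an identical starting cache and returns both
/// final states so they can be compared.
fn final_states() -> (HashMap<u64, Vec<u8>>, HashMap<u64, Vec<u8>>) {
    let a: Cache = RwLock::new(HashMap::from([(1u64, vec![0u8])]));
    let b: Cache = RwLock::new(HashMap::from([(1u64, vec![0u8])]));
    remove_mutate_insert(&a, 1, 7);
    get_mutate_update(&b, 1, 7);
    (a.into_inner().unwrap(), b.into_inner().unwrap())
}
```

Because no other thread can observe the map between the first and last operation, both functions leave the cache in the same state; the only practical difference in this sketch is the extra clone in the second pattern.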
    &mut self,
    block_root: Hash256,
    store: &OverflowStore<T>,
) -> Result<Option<PendingComponents<T::EthSpec>>, AvailabilityCheckError> {
    match self.in_memory.pop_entry(&block_root) {
        Some((_, pending_components)) => Ok(Some(pending_components)),
    match self.in_memory.get(&block_root) {
I think there might be a rare edge case here where:
- we load the `PendingComponents` from memory
- we update it and try to put it back, however `in_memory` is at capacity and it goes to disk
- next time we want to update it, it loads from memory, which would give us the outdated component
In this case I think it would just delay availability as we would do another lookup, right? I think we might be able to avoid this altogether if we check whether the components are already in memory before we attempt to use the overflow store.
Another one that's probably less of an issue:
- We load the `PendingComponents`, not found in memory, so it is returned from `store`
- We update it and put it back in memory, so now we have two versions, one in memory and one on disk
- Now if the in-memory one somehow gets pruned first, then we end up loading an old component from disk.
I think this is unlikely under normal network conditions as the LRU cache's default capacity is `1024`.
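The first edge case can be reproduced in a toy model. The sketch below is not the Lighthouse overflow cache: `TwoTier`, `put_naive`, and `put_checked` are hypothetical names, both tiers are plain `HashMap`s, and "capacity" is a simple length check rather than LRU eviction. It only illustrates why checking memory residency before spilling avoids the stale read.

```rust
use std::collections::HashMap;

/// Toy two-tier cache: a bounded in-memory map plus an unbounded "disk" map.
struct TwoTier {
    capacity: usize,
    in_memory: HashMap<u64, u32>,
    disk: HashMap<u64, u32>,
}

impl TwoTier {
    fn new(capacity: usize) -> Self {
        TwoTier { capacity, in_memory: HashMap::new(), disk: HashMap::new() }
    }

    /// Non-removing read: memory first, then disk.
    fn get(&self, key: u64) -> Option<u32> {
        self.in_memory.get(&key).or_else(|| self.disk.get(&key)).copied()
    }

    /// Naive write: spills to disk whenever memory is full, even if the key
    /// is already resident in memory, leaving the old copy behind.
    fn put_naive(&mut self, key: u64, value: u32) {
        if self.in_memory.len() >= self.capacity {
            self.disk.insert(key, value);
        } else {
            self.in_memory.insert(key, value);
        }
    }

    /// Checked write: updates in place when the key is already in memory,
    /// and only spills to disk for keys that are not resident.
    fn put_checked(&mut self, key: u64, value: u32) {
        if self.in_memory.contains_key(&key) || self.in_memory.len() < self.capacity {
            self.in_memory.insert(key, value);
            self.disk.remove(&key);
        } else {
            self.disk.insert(key, value);
        }
    }
}

/// Writes version 10 and then version 11 of the same key into a cache of
/// capacity 1, and returns what each write strategy reads back afterwards.
fn stale_read_demo() -> (Option<u32>, Option<u32>) {
    let mut naive = TwoTier::new(1);
    naive.put_naive(1, 10); // lands in memory
    naive.put_naive(1, 11); // memory is "full", so the update goes to disk
    let from_naive = naive.get(1);

    let mut checked = TwoTier::new(1);
    checked.put_checked(1, 10);
    checked.put_checked(1, 11); // key is resident, so it is updated in place
    let from_checked = checked.get(1);

    (from_naive, from_checked)
}
```

In this model the naive strategy reads back the outdated value from memory, while the checked strategy reads back the latest one.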
I think this may be due to the randomness in the mock execution layer: lighthouse/beacon_node/execution_layer/src/test_utils/execution_block_generator.rs, lines 672 to 674 in 000a4fd.
Not sure if there's any reason to use a random number of blobs here - perhaps we can make this deterministic while covering all blob counts per block?
Squashed commit of the following:
commit d75321b
Merge: 7c125b8 7f8b600
Author: realbigsean <[email protected]>
Date: Mon May 27 19:52:41 2024 -0400
Merge branch 'unstable' of https://github.com/sigp/lighthouse into time_in_da_checker
commit 7c125b8
Author: dapplion <[email protected]>
Date: Sat May 25 01:32:02 2024 +0200
Keep PendingComponents in da_checker during import_block
commit 7136412
Author: dapplion <[email protected]>
Date: Sat May 25 01:01:23 2024 +0200
Simplify BlockProcessStatus
commit dbcd7d1
Author: dapplion <[email protected]>
Date: Fri May 24 20:00:05 2024 +0200
Ensure lookup sync checks caches correctly
Working on a PR to update the tests for the new logic. Will release shortly.
okay got the tests fixed:
Squashed commit of the following:
commit c051218
Author: ethDreamer <[email protected]>
Date: Fri May 31 13:38:19 2024 +0200
Fix tests with DA checker new eviction policy (#34)
commit d75321b
Merge: 7c125b8 7f8b600
Author: realbigsean <[email protected]>
Date: Mon May 27 19:52:41 2024 -0400
Merge branch 'unstable' of https://github.com/sigp/lighthouse into time_in_da_checker
commit 7c125b8
Author: dapplion <[email protected]>
Date: Sat May 25 01:32:02 2024 +0200
Keep PendingComponents in da_checker during import_block
commit 7136412
Author: dapplion <[email protected]>
Date: Sat May 25 01:01:23 2024 +0200
Simplify BlockProcessStatus
commit dbcd7d1
Author: dapplion <[email protected]>
Date: Fri May 24 20:00:05 2024 +0200
Ensure lookup sync checks caches correctly
@mergify queue
✅ The pull request has been merged automatically at cb32807
Issue Addressed
Extends
Sync lookups rely on having an accurate picture of block and blob processing status to perform well (not re-download, not get stuck).
Sync lookup makes the following assumption:
These assumptions are incorrect. During block import, a block is no longer in the da_checker but is still in the processing_cache, because we remove the block from the da_checker when the da_checker imports the last component of the block.
The current sequence of events is
We should delay removal from the da_checker until after completing import_block
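
As a rough illustration of what delaying removal means, the sketch below models the da_checker as a plain map and removes the entry only once the import step has finished. `DaChecker`, `put_component`, and this `import_block` are simplified stand-ins invented for the example, not the actual Lighthouse types or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the da_checker: block root -> received components.
#[derive(Default)]
struct DaChecker {
    pending: HashMap<u64, Vec<u8>>,
}

impl DaChecker {
    /// Records a component and reports whether the block is now complete.
    /// The entry is kept in the map either way.
    fn put_component(&mut self, root: u64, component: u8, required: usize) -> bool {
        let entry = self.pending.entry(root).or_default();
        entry.push(component);
        entry.len() >= required
    }

    fn contains(&self, root: u64) -> bool {
        self.pending.contains_key(&root)
    }

    fn remove(&mut self, root: u64) {
        self.pending.remove(&root);
    }
}

/// Stand-in for the import step. Returns whether the block was still visible
/// in the checker while the import ran, and whether it is visible afterwards.
fn import_block(checker: &mut DaChecker, root: u64) -> (bool, bool) {
    let visible_during_import = checker.contains(root);
    // ... the real import work would happen here ...
    checker.remove(root); // removal is delayed until the import has completed
    (visible_during_import, checker.contains(root))
}

/// Feeds two components for one block, then imports it.
fn demo() -> (bool, bool, bool) {
    let mut checker = DaChecker::default();
    let _ = checker.put_component(1, 0, 2);
    let complete = checker.put_component(1, 1, 2);
    let (during, after) = import_block(&mut checker, 1);
    (complete, during, after)
}
```

With this ordering, a lookup that consults the checker while the import is running still sees the block, and the entry disappears only after the import is done.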
Proposed Changes
Delay removal from the da_checker until after completing import_block. New sequence:
Ideally we should remove from da_checker after removing from reqresp_pre_import_cache. However, import_block is called from many places (see call graph below), so it would make the logic more convoluted.