Light forward sync mechanism #6515

Merged
merged 82 commits into from
Oct 30, 2024

cheatfate
Contributor

@cheatfate cheatfate commented Aug 28, 2024

Light forward sync uses the light client protocol to accelerate forward syncing done after checkpoint sync, when restarting the client or indeed when syncing up after a longer period of being offline.

It works by establishing a "safe" head to sync to using the light client protocol, tracing this history back to the latest known state in the database and then using a less computationally intensive method of verifying blocks based on the knowledge that the blocks lead to a valid head.
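The three steps above can be sketched roughly as follows. This is a minimal Python illustration, not the Nim code from this PR: `Block`, `block_root` and `light_forward_sync` are hypothetical names, SHA-256 stands in for the real block root computation, and the actual implementation checks considerably more than a parent-root chain.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    slot: int
    parent_root: bytes
    body: bytes


def block_root(block: Block) -> bytes:
    # Stand-in for the real block root (assumption: any collision-resistant hash).
    data = block.slot.to_bytes(8, "little") + block.parent_root + block.body
    return hashlib.sha256(data).digest()


def light_forward_sync(trusted_head_root, known_root, fetch_block):
    """Return the blocks after `known_root` up to the trusted head, oldest first.

    `trusted_head_root` is the "safe" head obtained out of band (in the PR,
    via the light client protocol). `fetch_block(root)` returns whatever block
    an untrusted peer supplies for that root, or None.
    """
    chain = []  # collected newest-first while walking backwards
    expected = trusted_head_root
    while expected != known_root:
        block = fetch_block(expected)
        if block is None:
            raise LookupError("no block available for the expected root")
        # Cheap check: the block must hash to the root we were led to.
        if block_root(block) != expected:
            raise ValueError("block does not match the expected root")
        # Slots must strictly decrease while walking towards the known state.
        if chain and block.slot >= chain[-1].slot:
            raise ValueError("slot does not decrease along the parent chain")
        chain.append(block)
        expected = block.parent_root
    chain.reverse()
    return chain
```

Because every accepted block is pinned by the hash of its descendant, and the newest one by the trusted head, a peer cannot substitute a different history without breaking the chain of roots.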

It can currently be enabled by specifying --debug-long-range-sync=light, though it should be understood that --debug- options are not compatible across client versions and should not be used in production setups.

github-actions bot commented Aug 28, 2024

Unit Test Results

       12 files  ±  0    1 814 suites  +4   53m 34s ⏱️ -50s
  5 232 tests +  5    4 884 ✔️ +  5  348 💤 ±0  0 ±0 
29 077 runs  +20  28 625 ✔️ +20  452 💤 ±0  0 ±0 

Results for commit fe6452e. ± Comparison against base commit 250a80e.

♻️ This comment has been updated with latest results.

@cheatfate cheatfate force-pushed the hybrid-sync branch 2 times, most recently from 5be1b71 to 575d711 Compare September 5, 2024 07:34
@cheatfate cheatfate marked this pull request as ready for review September 9, 2024 15:24
buffer = Chunk.init(kind, uint64(slot), uint32(plainSize), data)
wrote = writeFile(chandle.handle, buffer).valueOr:
  discard truncate(chandle.handle, origOffset)
  discard fsync(chandle.handle)
Contributor

Why is it fine to discard the return value of some fsync calls but not of others?

Contributor Author

This is the cleanup path after an error in writeFile, which is the initial error we are going to report to the user. If fewer bytes were written than expected, the file has become inconsistent, so we try to truncate it to the last known good size and fsync it. The results of those cleanup calls are discarded because we still want to report the original error message to the user.
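A rough sketch of this pattern, in Python rather than the PR's Nim and using only standard `os` calls: `append_chunk` is a hypothetical helper, not a function from the PR. On the success path an fsync failure is reported, while on the cleanup path failures are swallowed so that the original write error reaches the caller.

```python
import os


def append_chunk(fd: int, data: bytes) -> int:
    """Append `data` at the end of the file; roll back on a failed or short write."""
    orig_offset = os.lseek(fd, 0, os.SEEK_END)  # last known good size
    try:
        written = os.write(fd, data)
        if written != len(data):
            raise OSError(f"short write: {written} of {len(data)} bytes")
        os.fsync(fd)  # success path: an fsync failure here is a real error
        return written
    except OSError:
        # Best-effort cleanup: restore the known good size and flush it.
        # Errors from these calls are ignored on purpose, because the
        # error that matters to the caller is the original one.
        try:
            os.ftruncate(fd, orig_offset)
            os.fsync(fd)
        except OSError:
            pass
        raise
```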

Contributor

@etan-status etan-status left a comment

This is a great improvement for the security of default sync compared to the existing genesis sync.

As this is time-consuming to test properly, it would be great to have at least a happy-case test with a few epochs' worth of minimal blocks to avoid accidental regressions.

If data size is a concern, one may consider syncing 1 (or N) sync committee period (~27 hrs / 8192 slots) at a time by treating each intermediate light client sync step as its own separate sync target, applying each of them separately before syncing to the next period. The hook to obtain intermediate results is LightClientUpdateObserver. It would add quite a bit of complexity, though, so maybe waiting for demand is better for now.
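For reference, the period arithmetic and the idea of per-period targets can be sketched like this. The constants are the mainnet preset values; `period_targets` is a hypothetical helper written in Python for illustration and does not correspond to anything in the PR or to the LightClientUpdateObserver hook.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256
SLOTS_PER_PERIOD = SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD  # 8192
HOURS_PER_PERIOD = SLOTS_PER_PERIOD * SECONDS_PER_SLOT / 3600  # about 27.3


def period_targets(start_slot: int, head_slot: int) -> list:
    """Sync targets: each period boundary after start_slot, then the head itself."""
    if head_slot <= start_slot:
        return []
    targets = []
    boundary = (start_slot // SLOTS_PER_PERIOD + 1) * SLOTS_PER_PERIOD
    while boundary < head_slot:
        targets.append(boundary)
        boundary += SLOTS_PER_PERIOD
    targets.append(head_slot)
    return targets
```

Syncing to each target in turn bounds the amount of block data that has to be held before it can be applied to at most one period's worth.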

beacon_chain/beacon_chain_file.nim (outdated, resolved)
beacon_chain/beacon_node_light_client.nim (outdated, resolved)
beacon_chain/gossip_processing/block_processor.nim (outdated, resolved)
beacon_chain/spec/signatures_batch.nim (outdated, resolved)
beacon_chain/sync/sync_overseer.nim (outdated, resolved)
beacon_chain/sync/sync_overseer.nim (resolved)
beacon_chain/beacon_node_light_client.nim (outdated, resolved)
@@ -92,6 +89,8 @@ proc initLightClient*(
.shouldSyncOptimistically(node.currentSlot):
return
Contributor

The new sync mechanism works for all lcDataFork > LightClientDataFork.None, but here we have an additional >= Capella restriction, because that's required for the EL blockHash and setOptimisticHead.

All networks that we actively support have advanced past Capella, so letting this remain in the >= Capella section is also fine.
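The two conditions can be contrasted in a small sketch. This is illustrative Python, not the Nim code under review: the enum's numeric values and the two function names are assumptions, and only the ordering of the forks matters.

```python
from enum import IntEnum


class LightClientDataFork(IntEnum):
    # Numeric values are illustrative; only the ordering is relied upon.
    NONE = 0
    ALTAIR = 1
    CAPELLA = 2
    DENEB = 3


def sync_mechanism_applicable(fork: LightClientDataFork) -> bool:
    # The new sync mechanism itself only needs light client data to exist.
    return fork > LightClientDataFork.NONE


def optimistic_head_applicable(fork: LightClientDataFork) -> bool:
    # The EL blockHash is only available from Capella onwards.
    return fork >= LightClientDataFork.CAPELLA
```

Altair is the only fork for which the two predicates differ, which is why keeping the code in the >= Capella section has no practical effect on networks that are already past Capella.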

beacon_chain/beacon_chain_file.nim (resolved)
@cheatfate cheatfate force-pushed the hybrid-sync branch 4 times, most recently from d9ea11c to 7584886 Compare September 25, 2024 00:14
@tersec tersec merged commit 18409a6 into unstable Oct 30, 2024
13 checks passed
@tersec tersec deleted the hybrid-sync branch October 30, 2024 05:38