Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Validator Client fallback mechanism #4393

Merged
merged 69 commits into from
Oct 3, 2024

Conversation

macladson
Copy link
Member

@macladson macladson commented Jun 13, 2023

Issue Addressed

#3613

Proposed Changes

  • Reworks large sections of the validator fallback mechanism.
  • Uses health tiers to sort beacon nodes by healthiest first.
  • Considers beacon node sync distance, optimistic status and execution status.

Each connected BN is queried every slot by the VC and is then sorted in a priority list. When attempting to perform duties, the VC will use the healthiest BN first (within a configurable tolerance).

Additional Info

  • There is currently no lockout period. This means if a BN fails a request, it will still be tried next epoch if it is seemingly healthy.
  • Removes the RequireSynced enum. This was scarcely used but may have implications when removed.
  • Removes the OfflineOnFailure enum. This was also scarcely used. In the new system, beacon nodes are never marked as offline so in essence have OfflineOnFailure::No by default.

Remaining Work

  • CLI configurable distance tier modifiers.
  • Actually use the status field on CandidateBeaconNode or remove it entirely.
  • Fix logging since a lot of it was removed and not replaced.
  • Make a /lighthouse/ui endpoint which exposes the fallback health stats for siren.
  • Create working simulator to test fallback mechanism.

@macladson macladson added work-in-progress PR is a work-in-progress val-client Relates to the validator client binary labels Jun 13, 2023
@macladson macladson force-pushed the vc-fallback branch 2 times, most recently from 2226ed9 to 7e96756 Compare June 16, 2023 04:30
@macladson macladson marked this pull request as ready for review August 1, 2023 04:37
@macladson macladson added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Aug 3, 2023
@paulhauner paulhauner self-assigned this Aug 4, 2023
@jimmygchen jimmygchen added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Sep 30, 2024
@macladson macladson added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Oct 2, 2024
Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming to approve Jimmy's last commit on this.

I've not reviewed this PR

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice one, finally 😭 🎉
Will raise a separate issue to revisit the locks. I think #6413 (lockbud) will also help.

@jimmygchen
Copy link
Member

@mergify queue

Copy link

mergify bot commented Oct 3, 2024

queue

🛑 Branch protection settings are not validated anymore

Branch protection is enabled and is preventing Mergify to merge the pull request. Mergify will merge when branch protection settings validate the pull request once again. (detail: 1 review requesting changes and 2 approving reviews by reviewers with write access.)

@jimmygchen
Copy link
Member

@mergify requeue

Copy link

mergify bot commented Oct 3, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Oct 3, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at f870b66

@mergify mergify bot merged commit f870b66 into sigp:unstable Oct 3, 2024
29 checks passed
chong-he pushed a commit to chong-he/lighthouse that referenced this pull request Nov 26, 2024
* Rework Validator Client fallback mechanism

* Add CI workflow for fallback simulator

* Tie-break with sync distance for non-synced nodes

* Fix simulator

* Cleanup unused code

* More improvements

* Add IsOptimistic enum for readability

* Use configurable sync distance tiers

* Fix tests

* Combine status and health and improve logging

* Fix nodes not being marked as available

* Fix simulator

* Fix tests again

* Increase fallback simulator tolerance

* Add http api endpoint

* Fix todos and tests

* Update simulator

* Merge branch 'unstable' into vc-fallback

* Add suggestions

* Add id to ui endpoint

* Remove unnecessary clones

* Formatting

* Merge branch 'unstable' into vc-fallback

* Merge branch 'unstable' into vc-fallback

* Fix flag tests

* Merge branch 'unstable' into vc-fallback

* Merge branch 'unstable' into vc-fallback

* Fix conflicts

* Merge branch 'unstable' into vc-fallback

* Remove unnecessary pubs

* Simplify `compute_distance_tier` and reduce notifier awaits

* Use the more descriptive `user_index` instead of `id`

* Combine sync distance tolerance flags into one

* Merge branch 'unstable' into vc-fallback

* Merge branch 'unstable' into vc-fallback

* wip

* Use new simulator from unstable

* Fix cli text

* Remove leftover files

* Remove old commented code

* Merge branch 'unstable' into vc-fallback

* Update cli text

* Silence candidate errors when pre-genesis

* Merge branch 'unstable' into vc-fallback

* Merge branch 'unstable' into vc-fallback

* Retry on failure

* Merge branch 'unstable' into vc-fallback

* Merge branch 'unstable' into vc-fallback

* Remove disable_run_on_all

* Remove unused error variant

* Fix out of date comment

* Merge branch 'unstable' into vc-fallback

* Remove unnecessary as_u64

* Remove more out of date comments

* Use tokio RwLock and remove parking_lot

* Merge branch 'unstable' into vc-fallback

* Formatting

* Ensure nodes are still added to total when not available

* Allow VC to detect when BN comes online

* Fix ui endpoint

* Don't have block_service as an Option

* Merge branch 'unstable' into vc-fallback

* Clean up lifetimes and futures

* Revert "Don't have block_service as an Option"

This reverts commit b5445a0.

* Merge branch 'unstable' into vc-fallback

* Merge branch 'unstable' into vc-fallback

* Improve rwlock sanitation using clones

* Merge branch 'unstable' into vc-fallback

* Drop read lock immediately by cloning the vec.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-review The code is ready for review v6.0.0 New major release for hierarchical state diffs val-client Relates to the validator client binary
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants