Improve failover robustness #8405

Merged: 7 commits, Jun 27, 2024

@zilm13 zilm13 commented Jun 25, 2024

PR Description

Covered following cases:

  • switchToFailoverEventStreamIfAvailable() is not fired only when the API currently in use fails. It can also be fired by, for example, the connection error fallback, and it then switches the endpoint when no switch is needed (and without considering the primary, which could be the best choice).
  • BeaconNodeReadinessManager can switch to the primary as ready, send the event once, and never repeat it, while for some reason the event is missed by EventSourceBeaconChainEventAdapter (say, the switchToFailoverEventStreamIfAvailable() callback fired just after it). Repeating the event guarantees that we switch to the primary sooner or later in this case, and it costs nothing.
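
The two cases above can be sketched roughly as follows. This is a minimal, self-contained model and not Teku's implementation: `FailoverSketch`, `setReady`, `onFailoverRequested` and `onPrimaryReady` are hypothetical names, and preferring the primary before any failover is an assumption drawn from the first bullet. It only illustrates that a switch request is ignored while the node in use is ready, and that a repeated primary-ready event is harmless.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the event adapter; names do not match Teku's classes.
class FailoverSketch {
  private final String primary;
  private final List<String> failovers;
  // Readiness per node, as a readiness manager might report it (assumed).
  private final Map<String, Boolean> ready = new HashMap<>();
  private String current;
  private int switchCount = 0;

  FailoverSketch(String primary, List<String> failovers) {
    this.primary = primary;
    this.failovers = new ArrayList<>(failovers);
    this.current = primary;
  }

  void setReady(String node, boolean isReady) {
    ready.put(node, isReady);
  }

  private boolean isReady(String node) {
    return ready.getOrDefault(node, false);
  }

  // Case 1: the callback may fire for reasons other than a failure of the
  // node in use, so re-check that node before moving anywhere.
  void onFailoverRequested() {
    if (isReady(current)) {
      return; // the current node is fine, nothing to do
    }
    if (isReady(primary)) {
      switchTo(primary); // assumption: the primary wins when it is usable
      return;
    }
    for (String candidate : failovers) {
      if (isReady(candidate)) {
        switchTo(candidate);
        return;
      }
    }
  }

  // Case 2: handling is idempotent, so the sender can repeat the event and
  // a missed or overtaken delivery is corrected by the next one.
  void onPrimaryReady() {
    switchTo(primary);
  }

  private void switchTo(String node) {
    if (!node.equals(current)) {
      current = node;
      switchCount++;
    }
  }

  String current() {
    return current;
  }

  int switchCount() {
    return switchCount;
  }
}
```

Because `onPrimaryReady` does nothing when the primary is already in use, re-sending it on every readiness check adds no extra switches.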

I wanted to add some tests, but this doesn't look easy to test. Could you suggest an approach?

Fixed Issue(s)

Fixes #8180 (not 100% on this)

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

mehdi-aouadi commented Jun 27, 2024

Regarding the tests, it would be great to add one that covers EventSourceBeaconChainEventAdapter not switching to another BN when the current one is READY. We can mock the beaconNodeReadinessManager.getReadinessStatus(currentBeaconNodeUsedForEventStreaming) call so that it returns READY, and then check that findReadyFailoverAndSwitch has not been called (for example, by checking that currentBeaconNodeUsedForEventStreaming is still the same).
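
The shape of such a test could look roughly like the sketch below. It is self-contained and uses a plain lambda in place of a mocking library, so everything here is a stand-in: `NodeStatus`, `AdapterUnderTest` and `DoNotSwitchCheck` are hypothetical, and the method names are only borrowed from the comment above, with bodies that are assumptions rather than the real adapter code.

```java
import java.util.List;
import java.util.function.Function;

// Stand-ins for illustration only; these are not Teku's types.
enum NodeStatus { READY, NOT_READY }

class AdapterUnderTest {
  private final Function<String, NodeStatus> readiness; // plays the mocked manager
  private final List<String> failovers;
  private String currentNode;
  int switchAttempts = 0; // counts entries into the switching path

  AdapterUnderTest(
      Function<String, NodeStatus> readiness, String primary, List<String> failovers) {
    this.readiness = readiness;
    this.failovers = failovers;
    this.currentNode = primary;
  }

  void switchToFailoverEventStreamIfAvailable() {
    if (readiness.apply(currentNode) == NodeStatus.READY) {
      return; // the node in use is READY, so do not look for another one
    }
    findReadyFailoverAndSwitch();
  }

  private void findReadyFailoverAndSwitch() {
    switchAttempts++;
    for (String node : failovers) {
      if (readiness.apply(node) == NodeStatus.READY) {
        currentNode = node;
        return;
      }
    }
  }

  String currentNode() {
    return currentNode;
  }
}

class DoNotSwitchCheck {
  // Returns true when the adapter stayed on its READY node.
  static boolean doNotSwitchToFailoverWhenCurrentBeaconNodeIsReady() {
    // Every node reports READY, as the mocked getReadinessStatus call would.
    AdapterUnderTest adapter =
        new AdapterUnderTest(node -> NodeStatus.READY, "primary", List.of("failover-1"));
    adapter.switchToFailoverEventStreamIfAvailable();
    return adapter.currentNode().equals("primary") && adapter.switchAttempts == 0;
  }
}
```

In the real test suite the lambda would presumably be replaced by a mock of the readiness manager, and the counter by a verification that the switching method was never invoked.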

@@ -97,6 +100,45 @@ public void performsPrimaryReadinessCheckWhenFailoverNotReadyAndNoOtherFailovers
verify(beaconNodeReadinessManager).performPrimaryReadinessCheck();
}

@Test
public void dontJumpBetweenFailoversWhenFailoverIsReady() {
Contributor

Maybe rename it to doNotSwitchToFailoverWhenCurrentBeaconNodeIsReady

Contributor Author

Nice name, updated

@mehdi-aouadi mehdi-aouadi left a comment

LGTM
Just a small nit

@zilm13 zilm13 merged commit 51a0488 into Consensys:master Jun 27, 2024
16 checks passed
@zilm13 zilm13 deleted the bn-fallback-refactor branch June 27, 2024 22:45
Successfully merging this pull request may close these issues.

Degraded attestation performance when Teku VC has a secondary BN defined