-
-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add circuit breaker for builder #4488
Conversation
7771b13
to
a82e277
Compare
Performance Report✔️ no performance regression detected Full benchmark results
|
// an api call to the builder to check its status, so we will not await for | ||
// its completion here as we want to trigger local execution here without | ||
// delay | ||
this.chain.updateBuilderStatus(clockSlot).catch((e) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a more in-depth review this looks problematic. Since this is async there's a racing condition on producing the block and updating the status. Could be that the builder is disabled too late and the block is already produced?
Why not check at the end of every epoch (or some point, say at 2nd slot) the status of the previous epoch and update the status of the builder with it?
- That allows to have constant metrics of status
- You can subscribe to block event instead of
getSlotsPresent()
and have real-time cumulative status of blocks per epoch, which can also go into a metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for counting the slots via event and the corresponding metric
However with regard to calling updateBuilderStatus
here: for disabling
, this is synchronous because the builder's check status api is only being called for enabling
, which if it doesn't respond within a second should anyway be ok to have builder disabled for the next slot's block.
I think its a good idea to check for builder status here before proposing next slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However with regard to calling updateBuilderStatus here: for disabling, this is synchronous because the builder's check status api is only being called for enabling
Got it, then can you find a way to make it more obvious on the code that disabling is sync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
1f8d9de
to
4afca90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
Discussed offline to change the ALLOWED_FAULTS value to not follow the spec, as allowing only 1 fault is too sensitive.
Motivation
In case there is a malevolent or buggy builder, it can take down the entire network by not propagating the blocks
This PR add a circuit breaker for builder in bad network conditions
Description
Closes #4483
Manual Test chronology on
ropsten-1