Asynchronous signature verification on validator registration #59
Comments
My main concern with this approach is that it changes guarantees of when a registration would be available for a block builder, something the validator wants ASAP presumably. But the spec currently doesn't have a notion for that:
Following such a change, consensus clients would need to change their behavior so that they wait something like
I'm not a fan of these proposed changes. First, I don't subscribe to the initial problem statement.
More significantly, any interaction between the beacon node and builder network should be governed by this repository (builder-specs) only, and be part of the Furthermore, the data API is an optional, read-only API that just provides information, which I think is how it should stay.
Hello @metachris :) This proposal addresses a general performance problem that comes from the uneven load on relays, caused by all validator registrations being sent within a very small time window. I totally understand that mitigating this would be problematic or even impossible. Right now we have 480k validators, and we can expect this number to grow over time as adoption of the Ethereum ecosystem increases. So, getting some numbers using a benchmark similar to this (https://go.dev/play/p/qIkX32P72xU)
This comes from the fact that there is a finite number of cores, which simply get flooded by the number of calculations you need to do. So purely from a performance standpoint, that's a pretty heavy bottleneck. Answering your last concern first: the description of the getValidatorRegistration endpoint in the only documentation (https://flashbots.github.io/relay-specs/#/Data/getValidatorRegistration) quite clearly states that this endpoint is there to "Check that a validator is registered with the relay" and that it is "Useful to check whether your own registration was successful." So no one is suggesting any changes to the API other than:
If you desire, this does not require literally any changes in your implementation other than adding a static Writing the document above, I did my best to create a change that would require almost zero engagement from relay authors and that would really solve, not just mitigate, the CPU bottleneck. I understand that there are some ways of mitigating these performance problems, but it should be in the hands of relay authors to decide whether to use a cache or solve it another way. As for the cache itself: since the validator endpoint is currently publicly open, if I'm not mistaken, any person can easily create an attack script that submits random payloads, bypassing the cache "guards". By doing this constantly, you can make a relay unresponsive with a fairly small number of payloads using a cheap instance. If verification is not done during the submission process, as suggested above, it's impossible to make the relay unresponsive because of this bottleneck.
I think that there may be a case for asynchronous verification of validator registrations, but before this the synchronous path should be optimized to see if it is really necessary as it does add a level of complexity for validators beyond the current "if I receive a successful registration message then I know the registration was good". There is a lot that can be done with validator registrations beyond what exists in either dreamboat or the flashbots MEV relay today. Some notes on this:
And of course concurrency should be used as much as possible to parallelize operations where large numbers of registrations are received at the same time. It is also worth exploring the issue where there is a large spike of registrations around the epoch. There is, as far as I am aware, no technical reason why the registrations all need to arrive at this particular point in time (and indeed Vouch sends them roughly halfway through the previous epoch). Making a change to spread the submission of validator registrations out could provide a solution in itself, and I have kicked off a discussion around this.
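One way to picture spreading submissions out is to derive a stable per-validator offset within the epoch. This is only a sketch under stated assumptions: `submissionOffset` is a hypothetical helper, hashing the public key is just one possible way to pick an offset, and nothing here reflects how Vouch or any other client actually schedules registrations. The only fixed fact used is that a mainnet epoch is 32 slots of 12 seconds.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// epochSeconds is the length of a mainnet epoch: 32 slots of 12 seconds.
const epochSeconds = 32 * 12

// submissionOffset maps a validator public key to a stable offset, in
// seconds, within the epoch. Hypothetical helper, not part of any spec.
func submissionOffset(pubkey []byte) uint64 {
	sum := sha256.Sum256(pubkey)
	// Use the first 8 bytes of the hash as a number and fold it into the epoch.
	return binary.BigEndian.Uint64(sum[:8]) % epochSeconds
}

func main() {
	// Placeholder keys for illustration only; real keys are 48-byte BLS public keys.
	for _, key := range []string{"validator-a", "validator-b", "validator-c"} {
		fmt.Printf("%s registers %d seconds into the epoch\n", key, submissionOffset([]byte(key)))
	}
}
```

Because the offset depends only on the key, each validator would re-register at the same point in every epoch while the population as a whole is spread roughly evenly across it.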
Abstract
This proposal removes immediate signature verification of new validator registrations, making the verification asynchronous. The verification status should no longer be returned from the registration endpoint and should instead be queried from the data API.
This change removes a CPU bottleneck in the relay, along with possible DoS attack vectors, and allows the registration process to be resilient to high loads.
With the current process, relays don’t have an even load and relay operators need to use expensive infrastructure to cover the load spikes. As the signature verification failure status is not used in the flow, relaxing the spikes should greatly reduce relay’s maintenance costs, increasing the number of people who can afford running relays.
Motivation
This change addresses a number of problems and threats to the relay ecosystem that are the effect of verifying registration signatures on register validator submission.
Mainnet currently has more than 400k validators, and we expect this number to grow with Ethereum adoption. Existing mechanisms re-register existing validators on every epoch, resending hundreds of thousands of validator registrations in the first few seconds of an epoch.
Verifying signatures is not a computationally trivial operation. To carry the CPU load, a single relay instance needs to run on an oversized server that does not utilize its computing power for the rest of the epoch.
The current implementation of [mev-boost](https://github.com/flashbots/mev-boost/blob/main/server/service.go#L276-L284) does not return error contents back to the validator on failed registration.
The verification of registration signatures is also not immediately used as the registrations are only used to be returned later by the api endpoint ([link](https://github.com/flashbots/mev-boost-relay/blob/174a4a66280aa0289551f61dbabbb17ec202c18d/services/api/service.go#L1420)).
Current network traffic characteristics are similar to a DDoS attack. The current mechanism also creates an attack vector where a bad actor sends its own registration slightly ahead of time and then floods the server with incorrect registrations. It’s also possible to completely clog the relay with just the number of new registrations.
Prior Art
The recurrent registration problem was reported a few times before, but the core of the problem was never satisfactorily resolved.
#24
This change is meant to eliminate the CPU-bound performance problems, without changing the entire network’s behavior.
Detailed Description
Current Validator registration process:
This proposal aims to make the signature verification (step 4) asynchronous.
The only reason a good, lawful and honest validator may be concerned about its signature state is at the time of its initial or consecutive deployments, i.e. when configuration can change. There is no benefit to knowing it is still correct on every deployment.
Therefore, the information about the state of verification may be offloaded into a separate endpoint and removed from (POST) `/relay/v1/builder/validators`. It can be achieved by extending `/relay/v1/data/validator_registration` with an additional enum field, `status`.
Go (possible implementation)
The endpoint would then return:
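A minimal sketch of what the extended response could look like. The status names (`unverified`, `verified`, `invalid`) and the type names are assumptions made for illustration, not values taken from this proposal; the `message` and `signature` fields follow the signed validator registration shape from builder-specs as I understand it, and the hex values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VerificationStatus is a hypothetical enum for the proposed `status` field.
// The concrete values are assumptions, not part of any published spec.
type VerificationStatus string

const (
	StatusUnverified VerificationStatus = "unverified"
	StatusVerified   VerificationStatus = "verified"
	StatusInvalid    VerificationStatus = "invalid"
)

// RegistrationMessage holds the registration fields as JSON strings.
type RegistrationMessage struct {
	FeeRecipient string `json:"fee_recipient"`
	GasLimit     string `json:"gas_limit"`
	Timestamp    string `json:"timestamp"`
	Pubkey       string `json:"pubkey"`
}

// RegistrationResponse is the existing signed registration plus `status`.
type RegistrationResponse struct {
	Message   RegistrationMessage `json:"message"`
	Signature string              `json:"signature"`
	Status    VerificationStatus  `json:"status"`
}

// encodeResponse renders the response body as JSON.
func encodeResponse(r RegistrationResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	out, err := encodeResponse(RegistrationResponse{
		Message: RegistrationMessage{
			FeeRecipient: "0x0000000000000000000000000000000000000000",
			GasLimit:     "30000000",
			Timestamp:    "1700000000",
			Pubkey:       "0xplaceholder",
		},
		Signature: "0xplaceholder",
		Status:    StatusUnverified,
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
}
```

Since the field is appended to the existing object, a client that ignores unknown JSON keys would keep working unchanged.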
The new process would assume that after successful initial checks (e.g. submission time, whether the validator is known) every registration would be persisted with the default unverified state.
It should be left for the relay development team to decide on implementation details, however, the goal for the verification itself is to become “eventually verified”. This new flow would allow various improvements not limited to verifying signatures in the background process, throttling the number of parallel calculations, or calculating signatures only upon request.
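As one illustration of "eventually verified" with throttled parallelism, the sketch below persists registrations as unverified and lets a fixed number of workers update the status later. Everything in it is an assumption: `Store`, `verifyInBackground`, and the status names are hypothetical, and `VerifyFunc` is a stand-in for real BLS signature verification, which is not implemented here.

```go
package main

import (
	"fmt"
	"sync"
)

// Status values are hypothetical names for the verification state.
type Status string

const (
	StatusUnverified Status = "unverified"
	StatusVerified   Status = "verified"
	StatusInvalid    Status = "invalid"
)

// Registration is a simplified stand-in for a signed validator registration.
type Registration struct {
	Pubkey    string
	Signature string
}

// Store is a minimal in-memory status store guarded by a mutex.
type Store struct {
	mu     sync.Mutex
	status map[string]Status
}

func NewStore() *Store { return &Store{status: make(map[string]Status)} }

func (s *Store) Set(pubkey string, st Status) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.status[pubkey] = st
}

func (s *Store) Get(pubkey string) Status {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.status[pubkey]
}

// VerifyFunc stands in for the expensive signature check.
type VerifyFunc func(Registration) bool

// verifyInBackground drains the queue with at most `workers` verifications
// running in parallel and returns once the queue is closed and empty.
func verifyInBackground(store *Store, queue <-chan Registration, workers int, verify VerifyFunc) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for reg := range queue {
				if verify(reg) {
					store.Set(reg.Pubkey, StatusVerified)
				} else {
					store.Set(reg.Pubkey, StatusInvalid)
				}
			}
		}()
	}
	wg.Wait()
}

func main() {
	store := NewStore()
	queue := make(chan Registration, 16)
	for _, r := range []Registration{{"0xaa", "ok"}, {"0xbb", "bad"}} {
		store.Set(r.Pubkey, StatusUnverified) // persisted before any verification
		queue <- r
	}
	close(queue)
	// Placeholder check: a real relay would verify the BLS signature here.
	verifyInBackground(store, queue, 2, func(r Registration) bool { return r.Signature == "ok" })
	fmt.Println(store.Get("0xaa"), store.Get("0xbb"))
}
```

The `workers` parameter is the throttle: the relay decides how many cores to spend on verification instead of the incoming burst deciding for it.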
For existing relay implementations it would still be possible to preserve verification calculations on submission and save as already verified.
Backward Compatibility
This change introduces a weak inconsistency for people who were expecting to find an error on incorrect signature - that should no longer be returned.
The change to the `/relay/v1/data/validator_registration` endpoint is additive, meaning there are no protocol-breaking changes. In existing codebases, the value can still be calculated in the same place as before and saved as already verified, so no big immediate changes should be needed.
Dependencies
This proposal doesn’t depend on any other work.
Risks and Security Considerations
There is no standard for the multi-validator registration process, so it is unclear how relays should behave when one validator in a payload fails. From the user perspective, it’s undesirable to fail all validators in the payload if one has a broken signature. This change allows all validators that passed the pre-check to be registered, and discarded only when needed.
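A rough sketch of per-validator handling for a multi-validator payload, where one failing entry does not reject the rest. The function names, error values, and the idea of a `maxSkew` tolerance passed in by the caller are assumptions for illustration; the checks simply mirror the pre-check examples mentioned earlier (submission time, known validator).

```go
package main

import (
	"errors"
	"fmt"
)

// Registration is a simplified stand-in; Timestamp is in Unix seconds.
type Registration struct {
	Pubkey    string
	Timestamp uint64
}

// Rejection records why a single entry in the payload was not accepted.
type Rejection struct {
	Pubkey string
	Reason error
}

var (
	ErrUnknownValidator = errors.New("unknown validator")
	ErrFutureTimestamp  = errors.New("timestamp too far in the future")
)

// preCheck runs only cheap checks; no signature verification happens here.
func preCheck(r Registration, known map[string]bool, now, maxSkew uint64) error {
	if !known[r.Pubkey] {
		return ErrUnknownValidator
	}
	if r.Timestamp > now+maxSkew {
		return ErrFutureTimestamp
	}
	return nil
}

// acceptBatch splits a payload into entries to persist as unverified and
// entries rejected individually, instead of failing the whole payload.
func acceptBatch(batch []Registration, known map[string]bool, now, maxSkew uint64) ([]Registration, []Rejection) {
	var accepted []Registration
	var rejected []Rejection
	for _, r := range batch {
		if err := preCheck(r, known, now, maxSkew); err != nil {
			rejected = append(rejected, Rejection{Pubkey: r.Pubkey, Reason: err})
			continue
		}
		accepted = append(accepted, r)
	}
	return accepted, rejected
}

func main() {
	known := map[string]bool{"0xaa": true, "0xbb": true}
	batch := []Registration{
		{Pubkey: "0xaa", Timestamp: 1000},
		{Pubkey: "0xbb", Timestamp: 5000}, // too far ahead of `now`
		{Pubkey: "0xcc", Timestamp: 1000}, // not a known validator
	}
	accepted, rejected := acceptBatch(batch, known, 1000, 10)
	fmt.Println("accepted:", len(accepted))
	for _, rj := range rejected {
		fmt.Println("rejected:", rj.Pubkey, rj.Reason)
	}
}
```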
Rationale and Alternatives
As described above this change targets performance improvements for good, lawful and honest validators, as signatures may only change during the deployment of a new configuration, followed by a process restart. There is no benefit to verifying that one’s signature is still correct for every request. Furthermore, current implementations of processes like mev-boost would not return this information back to the validator either.
This simple change can allow relays to use smaller servers and make use of more CPU idle time, as we no longer have a 3s window for verifying >400k signatures, allowing more people to afford running relays.
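To put the 3s window in perspective, the arithmetic can be sketched as follows. The 400k and 3s figures come from the text above and the 384s epoch length is 32 slots of 12 seconds; the per-verification cost of 1 ms is purely a placeholder assumption, not a measurement, and should be replaced with a number from an actual benchmark.

```go
package main

import (
	"fmt"
	"math"
)

// requiredThroughput returns how many verifications per second are needed
// to process n registrations within the given window.
func requiredThroughput(n int, windowSeconds float64) float64 {
	return float64(n) / windowSeconds
}

// coresNeeded estimates the number of fully busy cores for a given
// throughput, assuming each verification costs perVerifyMs on one core.
func coresNeeded(throughput, perVerifyMs float64) int {
	return int(math.Ceil(throughput * perVerifyMs / 1000))
}

func main() {
	const validators = 400_000
	const assumedVerifyMs = 1.0 // placeholder cost per verification, not measured

	burst := requiredThroughput(validators, 3)    // everything in the 3s window
	spread := requiredThroughput(validators, 384) // spread over a whole epoch

	fmt.Printf("burst:  %.0f verifications/s, about %d cores\n", burst, coresNeeded(burst, assumedVerifyMs))
	fmt.Printf("spread: %.0f verifications/s, about %d cores\n", spread, coresNeeded(spread, assumedVerifyMs))
}
```

Whatever the real per-verification cost turns out to be, the ratio between the two cases is fixed by the windows alone: 384s / 3s = 128 times less peak throughput when verification can be spread across the epoch.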