-
-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restart aware doppelganger protection #5856
Comments
Yes it's a nice strategy, I think it does not compromise the safety of DG on restarts |
I looked into finishing the implementation I started in unstable...nflaig/zero-downtime-doppelganger but the issue is that making
It makes the whole validator initialization async because we have to make sure that signers are added before executing any other code like registering validator with builder / beacon node. lodestar/packages/validator/src/validator.ts Lines 122 to 123 in 12f9e11
It is also critical that the validator is registered as doppelganger before any other code can use the pubkey (e.g. to fetch duties) as this would impend the security of the doppelganger protection. I don't like making the validator store initialization async but on the other hand we are using this pattern all over the place for other modules. The validator initilization itself is already async anyways so it might be ok to introduce this change to make this feature possible. lodestar/packages/validator/src/validator.ts Line 193 in a11427e
|
Problem description
Current implementation of doppelganger protection does not differentiate between a simple restart and starting the validator on a new machine. This causes 2-3 missed attestations after every restart such as when upgrading Lodestar to a newer version but DP does not provide any protection in this case.
For some users (e.g. on dappnode), DP is enabled by default and every package upgrade introduces unnecessary downtime which causes bad user experience.
Solution description
We should make our DP implementation a bit smarter and be able to differentiate between a restart and a new installation. Validator duties should only be skipped if it is really required.
This can be achieved by looking at the local slashing protection DB and if the validator produced an attestation in the previous epoch it means that the validator client was just restarted and DP can be skipped for this validator.
It is even safer in this case to start attestations right away because if two validator clients are started at the same time and one of them has an existing DB it would prevent slashing, whereas if both wait for 2-3 epochs, no chain activity would be detected and both instances would start attesting at the same time, causing validators to be slashed if there are conflicting attestations.
Further considerations
There is an edge case when importing a validator via keymanager API, it is possible to also import the slashing protection interchange file. This will create an attestation record in the database and could cause doppelganger protection to be skipped if import happens within in the next epoch.
I don't see this as a huge risk because exporting the slashing protection interchange from a running client is only possible while also deleting the key from that client, there is no risk of getting slashed due to a doppelganger.
It also assumes slashing protection was exported / imported within short time period, likely in an automated setup which aims to achieve no downtime when switching clients and therefore doppelganger protection would not be enabled anyways.
Considering this, I think there is no risk in the suggested approach and it is safe to skip doppelganger in any scenario if an attestation from previous epoch exists in the slashing protection db.
Additional context
Similar approach is described in prysmaticlabs/prysm#7985 (comment) and status-im/nimbus-eth2#3083.
The text was updated successfully, but these errors were encountered: