Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart aware doppelganger protection #5856

Closed
nflaig opened this issue Aug 7, 2023 · 2 comments · Fixed by #6012
Closed

Restart aware doppelganger protection #5856

nflaig opened this issue Aug 7, 2023 · 2 comments · Fixed by #6012
Labels
meta-feature-request Issues to track feature requests.

Comments

@nflaig
Copy link
Member

nflaig commented Aug 7, 2023

Problem description

Current implementation of doppelganger protection does not differentiate between a simple restart and starting the validator on a new machine. This causes 2-3 missed attestations after every restart such as when upgrading Lodestar to a newer version but DP does not provide any protection in this case.

For some users (e.g. on dappnode), DP is enabled by default and every package upgrade introduces unnecessary downtime which causes bad user experience.

Solution description

We should make our DP implementation a bit smarter and be able to differentiate between a restart and a new installation. Validator duties should only be skipped if it is really required.

This can be achieved by looking at the local slashing protection DB and if the validator produced an attestation in the previous epoch it means that the validator client was just restarted and DP can be skipped for this validator.

It is even safer in this case to start attestations right away because if two validator clients are started at the same time and one of them has an existing DB it would prevent slashing, whereas if both wait for 2-3 epochs, no chain activity would be detected and both instances would start attesting at the same time, causing validators to be slashed if there are conflicting attestations.

Further considerations

There is an edge case when importing a validator via keymanager API, it is possible to also import the slashing protection interchange file. This will create an attestation record in the database and could cause doppelganger protection to be skipped if import happens within in the next epoch.

I don't see this as a huge risk because exporting the slashing protection interchange from a running client is only possible while also deleting the key from that client, there is no risk of getting slashed due to a doppelganger.
It also assumes slashing protection was exported / imported within short time period, likely in an automated setup which aims to achieve no downtime when switching clients and therefore doppelganger protection would not be enabled anyways.

Considering this, I think there is no risk in the suggested approach and it is safe to skip doppelganger in any scenario if an attestation from previous epoch exists in the slashing protection db.

Additional context

Similar approach is described in prysmaticlabs/prysm#7985 (comment) and status-im/nimbus-eth2#3083.

@dapplion
Copy link
Contributor

dapplion commented Aug 8, 2023

Yes it's a nice strategy, I think it does not compromise the safety of DG on restarts

@nflaig
Copy link
Member Author

nflaig commented Sep 26, 2023

I looked into finishing the implementation I started in unstable...nflaig/zero-downtime-doppelganger but the issue is that making addSigner in validator store async due to registerValidator in doppelganger serivce becoming async is a bit problematic. This is required because we have to load data from the slashing protection db.

async addSigner(signer: Signer, valProposerConfig?: ValidatorProposerConfig): Promise<void> {

async registerValidator(pubkeyHex: PubkeyHex): Promise<void> {

It makes the whole validator initialization async because we have to make sure that signers are added before executing any other code like registering validator with builder / beacon node.

pollPrepareBeaconProposer(config, loggerVc, api, clock, validatorStore, metrics);
pollBuilderValidatorRegistration(config, loggerVc, api, clock, validatorStore, metrics);

It is also critical that the validator is registered as doppelganger before any other code can use the pubkey (e.g. to fetch duties) as this would impend the security of the doppelganger protection.

I don't like making the validator store initialization async but on the other hand we are using this pattern all over the place for other modules. The validator initilization itself is already async anyways so it might be ok to introduce this change to make this feature possible.

static async initializeFromBeaconNode(opts: ValidatorOptions, metrics?: Metrics | null): Promise<Validator> {

@nflaig nflaig changed the title Zero downtime doppelganger protection Restart aware doppelganger protection Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-feature-request Issues to track feature requests.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants