Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7707 validator slashing protection #7826

Closed
wants to merge 19 commits into from

Conversation

mehdi-aouadi
Copy link
Contributor

@mehdi-aouadi mehdi-aouadi commented Dec 13, 2023

PR Description

Add --shut-down-when-validator-slashed-enabled option to enable Teku shutdown when a validator gets slashed.
When using a remote BN, this option requires the BN to support the new attester_slashing and proposer_slashing SSE events.
When running VC only, in order to ensure BN backward compatibility, the VC will subscribe to the attester_slashing and proposer_slashing SSE events only when the --shut-down-when-validator-slashed-enabled is enabled (in addition to the head events) otherwise it will only subscribe to the head events.

Fixed Issue(s)

fixes #7707

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

@mehdi-aouadi mehdi-aouadi self-assigned this Dec 13, 2023
@mehdi-aouadi mehdi-aouadi added the doc-change-required Indicates an issue or PR that requires doc to be updated label Dec 13, 2023
@mehdi-aouadi mehdi-aouadi force-pushed the 7707-vc-slashing branch 2 times, most recently from 3394dbb to df5c7fc Compare December 14, 2023 20:58
@mehdi-aouadi mehdi-aouadi marked this pull request as ready for review December 14, 2023 20:58
@mehdi-aouadi
Copy link
Contributor Author

When the VC is used with a remote BN that doesn't support the attester_slashing and the proposer_slashing events the SSE connection will fail (and retry with the failover BN if any). I'm trying to customise the retry strategy in EventSourceBeaconChainEventAdapter::createEventSource to exclude the attester_slashing and proposer_slashing topic from the query params when the request fails but didn't find a way to do it yet. Not sure if that's even possible

Copy link
Contributor

@zilm13 zilm13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just nit comments

@mehdi-aouadi mehdi-aouadi changed the title 7707 vc slashing 7707 validator slashing protection Dec 20, 2023
@rolfyone
Copy link
Contributor

im not sure where to test so general observation...
Negative testing -
eg. a slashing occurs which is not a validator we're a custodian of - should not shut us down :)

This would be hard to replicate on a network, slashings aren't super common so we might end up not seeing it, but we'd be easily able to see we don't trigger if a slashing we don't care about arrives at a low level i assume...

Copy link
Contributor

@zilm13 zilm13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Let's rename ValidatorIndexProvider -> OwnedValidatorIndexProvider for clarity
  • Please also please open issue for acceptance test or you could just post draft PR

@mehdi-aouadi mehdi-aouadi requested a review from zilm13 December 21, 2023 16:20
Copy link
Contributor

@zilm13 zilm13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CHANGELOG.md Outdated Show resolved Hide resolved
@rolfyone
Copy link
Contributor

only minor things - may be worth a demo to show it working...

@mehdi-aouadi
Copy link
Contributor Author

@rolfyone and @zilm13 fyi, I pushed these lasts changes: d99c2fa in order to handle the use case when the BN/VC are running in a single process and thus not using SSE. I tested both use cases and they both work (remote BN and single process VC/BN)
I had a discussion with @tbenr and we were thinking that it might be cleaner to move the slashing actions to a separate channel (other than ValidatorTimingChannel, something like ValidatorSlashingChannel) or we could leave as is...
Let me know if you have any thought otherwise I could demo the current implementation and then we merge it

CHANGELOG.md Outdated Show resolved Hide resolved
@rolfyone
Copy link
Contributor

rolfyone commented Jan 22, 2024

Let me know if you have any thought otherwise I could demo the current implementation and then we merge it

My main thought is theres a lot of renaming in this and it makes it a lot harder than it needs to be to review. this kind of cleanup should be kept outside of substantive changes, it's just good practice and probably would have made this not look like a 'big pr' that i hate so much.
I'm not sure that the feature is 'done' given the code here, and would like to see some testing around a few things, and agree with @zilm13 that an acceptance test will be important.
A demo will make me more confident without setting it all up to see exactly what's going on, and we need to make sure we're clear about what we're asking in usage, which i briefly noted.
I think given the outstanding bits, we might want to mark the cli arg as hidden, remove the change entry, and merge this part, then unhide the entry once we're actually done and at that point we can add a change entry. We definitely want to hold off merging until after the release is done.

@rolfyone rolfyone requested a review from zilm13 January 22, 2024 21:33
Copy link
Contributor

@rolfyone rolfyone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as outlined, lets come up with a plan for this feature.

@mehdi-aouadi
Copy link
Contributor Author

mehdi-aouadi commented Jan 23, 2024

Let me know if you have any thought otherwise I could demo the current implementation and then we merge it

My main thought is theres a lot of renaming in this and it makes it a lot harder than it needs to be to review. this kind of cleanup should be kept outside of substantive changes, it's just good practice and probably would have made this not look like a 'big pr' that i hate so much. I'm not sure that the feature is 'done' given the code here, and would like to see some testing around a few things, and agree with @zilm13 that an acceptance test will be important. A demo will make me more confident without setting it all up to see exactly what's going on, and we need to make sure we're clear about what we're asking in usage, which i briefly noted. I think given the outstanding bits, we might want to mark the cli arg as hidden, remove the change entry, and merge this part, then unhide the entry once we're actually done and at that point we can add a change entry. We definitely want to hold off merging until after the release is done.

Few remarks here:

  • The feature is now complete. We now support both separate VC and single process VC/BN
  • For clarity, I renamed the option to shut-down-when-validator-slashed to make it clear that we shut down whatever we're running (separate VC or a VC/BN node). This is explained in the documentation PR in more details
  • To make things clear, this feature works in both cases: separate VC (the VC is shut down) and VC/BN in a single process (the whole Teku node is shutdown). There is a documentation PR for more details since we can't put all of them in the option description. Please check Add shut-down-when-validator-slashed-enabled option doc.teku#526
  • There is only one renaming/refactoring of the SlashingRiskDetectionAction in order to reuse what has been already done for the doppelganger detection. There are existing tests for the refactored files already and that hasn't been changed
  • There is a separate PR to add an acceptance test: 7707 at vc slashing #7852
  • There isn't a detailed spec for this feature so the implemented behaviour could be changed if needed
  • I will do a demo to show how this feature works

@mehdi-aouadi mehdi-aouadi requested a review from rolfyone January 23, 2024 11:47
@zilm13
Copy link
Contributor

zilm13 commented Jan 23, 2024

I think it would be valuable to have AT tests both for separate VC and single node BN+VC as far as we found it's different cases

@rolfyone
Copy link
Contributor

  • The feature is now complete. We now support both separate VC and single process VC/BN

remarks / conversation welcome. imo this feature is yet to meet our definition of done, and we can continue on this path this time, but in future (im sure i said that last time) i have a very strong preference for not following this methodology. I'm not sure I can make that any clearer at this point.

@rolfyone
Copy link
Contributor

My suggestion:

  • hide cli arg (-X, undocumented)
  • remove changelog entry until acceptance test is ready to merge at least
  • get this tested enough to accept a merge so we can move past this huge pr, and we finish the feature when it is fully acceptance tested.

Alternatively use this as a starting point, break it down, as previously suggested.

@mehdi-aouadi
Copy link
Contributor Author

I ended up breaking this PR down to 8 different ones:
1 #7913
2 #7914
3 #7915
4 #7916
5 #7917
6 #7918
7 #7919
8 #7921

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validator slashing protection
4 participants