[FR] Formal Support for Splitting BN and VC into Separate Processes #3088

Closed
jclapis opened this issue Nov 10, 2021 · 6 comments
Labels
VC Issues related to Validator Client

Comments

@jclapis
Contributor

jclapis commented Nov 10, 2021

One of the useful features of the other clients is the separation of the Beacon Node and the Validator Client into separate processes. This is quite beneficial for a few reasons:

  1. It means users can preserve their validator key setups and keep a dedicated VC, but are able to move the BN around if necessary. For example, on low-power systems, sometimes (such as during a Sync Committee) the BN could become overwhelmed. It may be preferable to temporarily offload the BN to another machine with more resources available, and return to the original machine afterwards.

  2. It would allow a single BN to serve multiple VCs. Rocket Pool's "Hybrid Mode" lets a user run an externally managed BN (say, for solo staking) and connect a separate, Rocket Pool-managed VC to it. This mode is not compatible with Nimbus, so Rocket Pool must manage the entire stack, which prevents solo stakers from leveraging it.

  3. It allows users to experiment with different BN implementations. My understanding is that the BN and VC communicate via the standard Beacon REST API, and thus should be interchangeable. This would allow, for example, a Nimbus BN to attach to a Lighthouse VC (or vice versa); users could sync Nimbus in the background, and when it's ready, point their VC at it instead. This would help improve client diversity by providing an easy way to experiment with the different BNs without the risk of slashing.

  4. It would allow for BN failover to be an option; users could keep a VC running and attach multiple BN endpoints to it so that attestations are not lost in case one of the BNs goes down.

  5. When creating a new validator key, currently we have to restart the Nimbus client for it to pick up the new validator. This spin-up can take a long time. Splitting the clients would only require the VC to be restarted, which is comparatively a much faster task.
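
Points 3 and 4 both rely on the BN exposing the standard Beacon REST API. As a rough sketch of what failover across several BN endpoints could look like, the Python below walks a list of candidate URLs and picks the first one whose `/eth/v1/node/syncing` response reports `is_syncing: false`. The function names (`parse_syncing`, `http_probe`, `pick_beacon_node`) and the example URLs are hypothetical and only for illustration; this is not how Nimbus or any other VC actually implements failover.

```python
import json
import urllib.request
from typing import Callable, Iterable, Optional


def parse_syncing(body: str) -> bool:
    """Return True if a /eth/v1/node/syncing response body reports a synced node."""
    data = json.loads(body)["data"]
    return not data["is_syncing"]


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Ask one beacon node whether it is synced.

    Any network, HTTP, or parsing error is treated as "not ready".
    """
    url = base_url.rstrip("/") + "/eth/v1/node/syncing"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_syncing(resp.read().decode("utf-8"))
    except (OSError, ValueError, KeyError, TypeError):
        return False


def pick_beacon_node(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first endpoint that the probe reports as ready, else None."""
    for url in endpoints:
        if probe(url):
            return url
    return None
```

A VC-side wrapper might call `pick_beacon_node(["http://primary:5052", "http://backup:5052"])` before each duty and fall back to the second URL when the first is down or still syncing (the host names and ports here are placeholders). The `probe` parameter is injectable so the selection logic can be exercised without a live node.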

I have experimented with this split-process mode on Prater, and while it worked well enough for the testnet, I was advised that it is not considered production-ready yet. I encourage the team to look into finishing support for it because we would most certainly leverage it.

@poupas

poupas commented Jan 10, 2022

Another reason for separating the Beacon Node from the Validator Client is attack surface reduction. If separated from the BC, the VC could run isolated, without any ports open to the Internet.

The BC is considerably more complex than the VC, and therefore has a larger attack surface. By coupling the VC with the BC, we may be unnecessarily exposing the VC to remote attacks, and increasing the risk of leaking key material.

@arnetheduck
Member

arnetheduck commented Jan 10, 2022

Another reason for separating the Beacon Node from the Validator Client is attack surface reduction.

While this is often mentioned, the complexity of running several processes generally leads to a worse overall security outcome than a simpler setup with fewer moving parts, because the biggest threat to security is often the human factor, far more so than any technical factor. If we define "security" in this context as "preventing the loss of funds", this theory has a lot more backing in the real world: 100% of the slashings with known cause so far have been linked to different forms of user error related to overly complex validator client setups.

@poupas

poupas commented Jan 10, 2022

I agree with your argument: human error is one of the main causes (maybe the main cause?) of security issues, and complexity makes it more difficult to secure infrastructure in general.

That being said, I also include programmers in the human side of the equation. And programmers do make errors. After all, many security vulnerabilities are introduced into the code by humans :)

So, by splitting the VC from the BC we could in fact shield users from human (developer) error. We could argue that an implementation error in the code can have far more serious consequences, by virtue of being amplified by thousands of vulnerable instances vs a couple of misconfigured ones. Isolating code with different privilege levels can help mitigate this.

We could also argue that Ethereum's choice of having two split keys (validation key and withdrawal key) leads to more complexity, and indeed it does. People might misplace keys, confuse their purposes, etc. But it is also self-evident that this design removes incentives for attacking validators, and results in overall improved security.

Is there any documented real-world instance where having the VC separate from the BC has resulted in decreased security?

P.S. - I know I won't convince you with my arguments, but I also think that the claim that "everything running in the same process" leads to more secure software is not a consensus view. There are many counter-examples of secure software that has adopted the "split components into different privilege levels in order to contain damage" approach:

  • qmail (several distinct processes with different privilege levels - and this requires configuring several daemons)
  • OpenSSH (privilege separation)
  • Browser sandboxes

@arnetheduck
Member

Is there any documented real-world instance where having the VC separate from the BC has resulted in decreased security?

A good example is https://blog.staked.us/blog/eth2-post-mortem - this is a professional setup run by competent people, and yet it failed because there was too much complexity on the VC side without additional safety nets - it resulted in the loss of 75 validators. Again, this slightly depends on your definition of "security": if you constrain it to "a hacker gained access through technical means over the internet and did bad things", then a VC setup will naturally be more secure - if instead you mean it as "the architecture helped keep the keys online", then all slashings so far are unlikely to have happened had their owners pursued simpler setups.

I know I won't convince you with my arguments

Oh, I don't need convincing really - I'm fully aware that under the right conditions, there exists a chance that you'll be able to create a more secure setup with a split architecture - this architecture is getting implemented in Nimbus as well, as an option for our users - in particular, you can already use Nimbus as a beacon node for alternative VC implementations.

I'm merely pointing out that "keeping more validator keys online" is not one of the probable / predictable outcomes of the feature, quite the contrary: I'm quite certain more keys will be lost because of it.

You're right though that we shouldn't trust the programmers either - we have multiple safety nets in place around keys, code review, auditing and so on that also act as filters for human errors.

As an example, the VC architecture itself is quite complex: the VC does lots of things beyond pure signatures, and will be doing even more "soon" with the merge - if securing your keys is your goal, you're better off with our features for out-of-process signing - it runs a trivial "signing process" on the side that does nothing other than sign things - it's intended for hardware wallet signing and protocols like web3signer, and much more appropriate for keeping small security surface areas around private keys.

@pablomendezroyo

Another situation where having the beacon and validator processes split into separate services would be helpful is when using Nimbus with web3signer. You may want to switch the client you are using to validate at any given moment by stopping the validator service while keeping the beacon node syncing.

You should be able to stop validating with Nimbus without stopping the beacon node from syncing.

@arnetheduck
Member

This has been implemented as of v22.11!
