Deploy validator clients #122
I assume, since you want this enabled on nodes with 60 validators, that it would be these:

infra-nimbus/ansible/group_vars/nimbus.prater.yml (lines 101 to 105 in ddc8de5)
On all branches or just some?

Before this issue can be resolved we first need a proper dedicated EL client node setup, as right now all beacon nodes are using the same Geth node running on a single host.
I started some work on this. Or at least I did some thinking:
I will probably get a version working on Monday.
Necessary to later provide `/var/empty` as path for both in order to use validator client service instead of loading validators directly. status-im/infra-nimbus#122 Signed-off-by: Jakub Sokołowski <[email protected]>
This is necessary since the `--secrets-dir` and `--validators-dir` flags can be also provided separately to a beacon node. This also allows for setting these paths to `/var/empty` when a validator client is being used instead of providing the files to the node. status-im/infra-nimbus#122 Signed-off-by: Jakub Sokołowski <[email protected]>
Changes necessary to manage the location of the `secrets` and `validators` directories:

Now we can point those at `/var/empty`.
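For illustration, a minimal sketch of what such an override could look like in group_vars once a validator client owns the keys; every variable name below is an assumption for illustration, not the actual infra-nimbus role interface:

```yaml
# Hypothetical sketch only: variable names are assumptions, not the real role vars.
# The beacon node no longer loads any keys itself, so its key paths point at /var/empty.
beacon_node_validators_dir: '/var/empty'
beacon_node_secrets_dir: '/var/empty'
# The validator client keeps the real key material instead.
validator_client_validators_dir: '/data/validator-client/validators'
validator_client_secrets_dir: '/data/validator-client/secrets'
```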
eh this feels risky - one restart where the flag is misspelled or some other shit reason, and the validators are gone

That's the point. They are supposed to be gone.
Based on discussion with @arnetheduck and a reading of the main issue: it seems creating a separate Ansible role makes the most sense, since we might want to do 1-N setups.
I've created a repo for the separate Ansible role:

https://github.com/status-im/infra-repos/commit/764a8225
https://github.com/status-im/infra-role-validator-client

I will be basing it mostly on
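As a rough idea of the shape such a role could take for 1-N setups, here is a hypothetical defaults sketch; none of the variable names are taken from the actual infra-role-validator-client, and the endpoint/port values simply mirror the ones visible in the logs further down:

```yaml
# Hypothetical defaults sketch for a validator client role; all names are
# illustrative assumptions, not the real role variables.
vc_service_name: 'nimbus-validator-client'
vc_service_user: 'nimbus'
vc_data_dir: '/data/validator-client'
# A single client can fall back between several beacon node REST endpoints (1-N).
vc_beacon_node_endpoints:
  - 'http://127.0.0.1:9301'
vc_rest_address: '127.0.0.1'
vc_rest_port: 5053
```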
Some initial work:
I started by deploying the changes, and the validator client logs look like this:

```json
{"lvl":"NOT","ts":"2022-09-23 08:28:15.955+00:00","msg":"Starting REST HTTP server","url":"http://127.0.0.1:5053/"}
{"lvl":"INF","ts":"2022-09-23 08:28:15.956+00:00","msg":"Beacon node has been identified","agent":"Nimbus/v22.9.1-72e6b2-stateofus","service":"fallback_service","endpoint":"127.0.0.1:9301"}
{"lvl":"INF","ts":"2022-09-23 08:28:15.956+00:00","msg":"Beacon node has compatible configuration","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"INF","ts":"2022-09-23 08:28:15.956+00:00","msg":"Beacon node is in sync","sync_distance":0,"head_slot":833241,"is_opimistic":"false","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"NOT","ts":"2022-09-23 08:28:15.957+00:00","msg":"Fork schedule updated","fork_schedule":[{"previous_version":"0x80000069","current_version":"0x80000069","epoch":0},{"previous_version":"0x80000069","current_version":"0x80000070","epoch":500},{"previous_version":"0x80000070","current_version":"0x80000071","epoch":750}],"service":"fork_service"}
{"lvl":"ERR","ts":"2022-09-23 08:28:27.959+00:00","msg":"Unable to get head state's validator information","service":"duties_service"}
{"lvl":"NOT","ts":"2022-09-23 08:28:27.961+00:00","msg":"REST service started","address":"127.0.0.1:5053"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.961+00:00","msg":"Scheduling first slot action","startTime":"16w3d17h28m27s961ms398us840ns","nextSlot":833243,"timeToNextSlot":"8s38ms601us160ns"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.962+00:00","msg":"Beacon node has been identified","agent":"Nimbus/v22.9.1-72e6b2-stateofus","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.963+00:00","msg":"Beacon node has compatible configuration","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.964+00:00","msg":"Beacon node is in sync","sync_distance":1,"head_slot":833241,"is_opimistic":"false","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"WRN","ts":"2022-09-23 08:28:29.985+00:00","msg":"Connection with beacon node(s) has been lost","online_nodes":0,"unusable_nodes":1,"total_nodes":1,"service":"fallback_service"}
{"lvl":"WRN","ts":"2022-09-23 08:28:31.970+00:00","msg":"No suitable beacon nodes available","online_nodes":0,"offline_nodes":1,"uninitalized_nodes":0,"incompatible_nodes":0,"nonsynced_nodes":0,"total_nodes":1,"service":"fallback_service"}
```

It seems to be failing in a loop of the same messages, over and over again. @narimiran any idea?
I can also see that the usual health check route we use for Consul isn't there in the Keymanager API:

So for now I'm going to use a TCP health check, but it would be nice to have a route without auth for this.

API: https://ethereum.github.io/keymanager-APIs/#/Remote%20Key%20Manager
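For reference, a TCP check like that only verifies the Keymanager port accepts connections. A minimal sketch of the check definition, assuming a Consul service role that takes a `checks` list (the surrounding variable layout is an assumption; `tcp`, `interval` and `timeout` are standard Consul check fields), with the port taken from the REST server address in the logs above:

```yaml
# Hypothetical Consul service definition with a plain TCP check for the
# validator client's Keymanager/REST port. Variable layout is an assumption.
consul_services:
  - name: 'nimbus-validator-client'
    port: 5053
    checks:
      - name: 'Keymanager API port open'
        tcp: '127.0.0.1:5053'
        interval: '30s'
        timeout: '5s'
```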
Necessary due to large size of headers when validator-client has a large number of validators attached. status-im/infra-nimbus#122 Signed-off-by: Jakub Sokołowski <[email protected]>
Some more changes:
And a fix for massive headers sent to the beacon node:
Which was causing this:

```json
{
  "lvl": "ERR",
  "ts": "2022-09-23 11:12:24.002+00:00",
  "msg": "Unable to get head state's validator information",
  "service": "duties_service"
}
```
#122 Signed-off-by: Jakub Sokołowski <[email protected]>
And here's the setup:
We want to test with lower numbers of validators first. #122 Signed-off-by: Jakub Sokołowski <[email protected]>
Lowered the number of validators for VC nodes as requested by @cheatfate:

infra-nimbus/ansible/group_vars/nimbus.ropsten.yml (lines 83 to 87 in b1760d9)
It seems to be hogging far too much memory. #122 Signed-off-by: Jakub Sokołowski <[email protected]>
We want to test with lower numbers of validators first. #122 Signed-off-by: Jakub Sokołowski <[email protected]>
For now only for the first node. #122 Signed-off-by: Jakub Sokołowski <[email protected]>
I also deployed the validator client for the first node on the Sepolia host: 7da6edf1

infra-nimbus/ansible/group_vars/nimbus.sepolia.yml (lines 76 to 80 in 7da6edf)

And also implemented disabling the service and its checks:
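As an illustration of the general approach rather than the actual role tasks, disabling the unit and dropping its Consul checks in Ansible could look roughly like this (unit name, toggle variable, file path and handler are all hypothetical):

```yaml
# Illustrative tasks only; the real role tasks may differ.
- name: Stop and disable validator client service
  ansible.builtin.systemd:
    name: 'nimbus-validator-client'   # hypothetical unit name
    state: stopped
    enabled: false
  when: not vc_enabled                # hypothetical toggle variable

- name: Remove Consul service definition along with its checks
  ansible.builtin.file:
    path: '/etc/consul/service_validator_client.json'  # hypothetical path
    state: absent
  when: not vc_enabled
  notify: Reload Consul               # assumes a matching handler exists
```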
For now only for the first node. #122 Signed-off-by: Jakub Sokołowski <[email protected]>
#122 Signed-off-by: Jakub Sokołowski <[email protected]>
I also deployed the validator client for the first node on the Prater host:

infra-nimbus/ansible/group_vars/nimbus.prater.yml (lines 147 to 151 in 269a76a)

And it appears to run fine and without memory issues so far, but the doppelganger detection is taking an awfully long time:

Already 45 minutes and still going. Appears to be a bug.
#122 Signed-off-by: Jakub Sokołowski <[email protected]>
For now I've temporarily disabled doppelganger detection for the validator client: 5de20671

https://github.com/status-im/infra-nimbus/blob/5de206719e12e7b3e29364cc8474f46625a7cb1e/ansible/group_vars/nimbus.prater.yml#L85
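For context, a hypothetical group_vars toggle for this; the variable name is an assumption, and it presumes the validator client exposes the same `--doppelganger-detection` option as the beacon node:

```yaml
# Hypothetical variable name; the real group_vars key may differ.
# Intent: pass --doppelganger-detection=false to the validator client,
# assuming it exposes the same toggle as the beacon node.
nimbus_validator_client_doppelganger_detection: false
```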
Looks like we are in business:
But one thing that makes me wonder is why the validator client message is:
This is currently working on all 3 testnets:

- infra-nimbus/ansible/group_vars/nimbus.sepolia.yml (lines 76 to 77 in c8e3232)
- infra-nimbus/ansible/group_vars/nimbus.ropsten.yml (lines 82 to 84 in c8e3232)
- infra-nimbus/ansible/group_vars/nimbus.prater.yml (lines 148 to 149 in c8e3232)
I consider this done. Reopen if there's something missing.
#122 Signed-off-by: Jakub Sokołowski <[email protected]>
Based on a suggestion from @arnetheduck I've also deployed a VC for
This is a continuation of #111 ("(...) if the setup proves to be stable enough, we might deploy some validators to the consensus node.")
This can either be a new role or an expansion of a current role.
This should be done for all testnets.
We should start with a relatively small number of validators initially, e.g. on the nodes that currently run 60 validators.
This can be done in one of these two ways (see the sketch after this list):

- `use_validator_client`: indicating that all validators on that host should be attached to the validator client.
- `number_of_validators`: the number of validators attached to a validator client.

More information about running the validator client is available here.
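For illustration only, a hedged sketch of how these two options could be expressed per node in a testnet group_vars file; the surrounding structure and every key name other than the two options above are assumptions, not copied from infra-nimbus:

```yaml
# Hypothetical per-node layout; surrounding names are illustrative assumptions.
nimbus_nodes:
  - name: 'node-01'
    # All validators assigned to this host go to a validator client
    # instead of being loaded directly by the beacon node.
    use_validator_client: true
  - name: 'node-02'
    # Attach only a fixed number of this host's validators to a validator client.
    number_of_validators: 20
```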