Deploy validator clients #122

Closed
narimiran opened this issue Aug 3, 2022 · 23 comments

This is a continuation of #111 ("(...) if the setup proves to be stable enough, we might deploy some validators to the consensus node.")

This can either be a new role or an expansion of a current role.
This should be done for all testnets.
We should start with a relatively small number of validators, e.g. on the nodes that currently run 60 validators.

This can be done in one of these two ways:

  • A new boolean flag use_validator_client, indicating that all validators on that host should be attached to the validator client.
  • Alternatively: allow hosts to have validators attached both to the beacon node and to the validator client, with a new property number_of_validators giving the number of validators attached to a validator client.

More information about running the validator client is available here.

jakubgs self-assigned this Aug 26, 2022
jakubgs (Member) commented Aug 26, 2022

I assume, since you want this enabled on nodes with 60 validators, that would be metal-05.he-eu-hel1.nimbus.prater:

'metal-05.he-eu-hel1.nimbus.prater': # 60 each
- { branch: 'stable', start: 20164, end: 20224, build_freq: '*-*-* 11:00:00' }
- { branch: 'testing', start: 20284, end: 20344, build_freq: '*-*-* 15:00:00', nim_commit: 'version-1-6' }
- { branch: 'unstable', start: 20224, end: 20284, build_freq: '*-*-* 13:00:00', payload_builder: true, open_libp2p_ports: false }
- { branch: 'libp2p', start: 20344, end: 20404, build_freq: '*-*-* 17:00:00', nim_commit: 'version-1-6', nim_flags: '-d:json_rpc_websocket_package=websock' }

On all branches or just some?

jakubgs (Member) commented Aug 30, 2022

Before this issue can be resolved we first need a proper dedicated EL client node setup, as right now all beacon nodes are using the same Geth node running on goerli-01.aws-eu-central-1a.nimbus.geth.

jakubgs (Member) commented Sep 9, 2022

I started some work on this. Or at least I did some thinking:

  • I will set this up as part of the existing infra-role-beacon-node-* roles to avoid too much boilerplate.
  • I'm going to make use of the --secrets-dir and --validators-dir flags and point them at /var/empty or similar.
  • I will leave the secrets and validators in the same place as they already are, except they will be served by the client.

I will probably get a version working on Monday.
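
To make the plan concrete, here's a minimal sketch of the split (the flags are the standard Nimbus ones; the paths and REST port are illustrative, not the exact values templated by the Ansible roles):

```sh
# Beacon node: point both key directories at /var/empty so it no longer loads validators itself,
# and expose the REST API for the validator client to use.
nimbus_beacon_node \
  --validators-dir=/var/empty \
  --secrets-dir=/var/empty \
  --rest --rest-port=9301

# Validator client: keeps using the existing key material and attests via the node's REST API.
nimbus_validator_client \
  --validators-dir=/data/beacon-node/data/validators \
  --secrets-dir=/data/beacon-node/data/secrets \
  --beacon-node=http://127.0.0.1:9301
```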

jakubgs added a commit to status-im/infra-role-beacon-node-windows that referenced this issue Sep 12, 2022
Necessary to later provide `/var/empty` as the path for both directories in order to use
the validator client service instead of loading validators directly.

status-im/infra-nimbus#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs added a commit to status-im/infra-role-beacon-node-macos that referenced this issue Sep 12, 2022
Necessary to later provide `/var/empty` as the path for both directories in order to use
the validator client service instead of loading validators directly.

status-im/infra-nimbus#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs added a commit to status-im/infra-role-beacon-node-linux that referenced this issue Sep 12, 2022
Necessary to later provide `/var/empty` as the path for both directories in order to use
the validator client service instead of loading validators directly.

status-im/infra-nimbus#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs added a commit to status-im/infra-role-dist-validators that referenced this issue Sep 12, 2022
This is necessary since the `--secrets-dir` and `--validators-dir` flags
can also be provided separately to a beacon node.

This also allows for setting these paths to `/var/empty` when a
validator client is being used instead of providing the files to the node.

status-im/infra-nimbus#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs (Member) commented Sep 12, 2022

Changes necessary to manage the locations of the secrets and validators folders separately:

Now we can point those at /var/empty when using a validator client.

arnetheduck (Member) commented Sep 12, 2022

> I'm going to make use of the --secrets-dir and --validators-dir flags and point them at /var/empty or similar.
> I will leave the secrets and validators in the same place as they already are, except they will be served by the client.

eh this feels risky - one restart where the flag is misspelled or some other shit reason, and the validators are gone

jakubgs (Member) commented Sep 12, 2022

> eh this feels risky - one restart where the flag is misspelled or some other shit reason, and the validators are gone

That's the point. They are supposed to be gone from the beacon node.

jakubgs (Member) commented Sep 20, 2022

Based on discussion with @arnetheduck and reading of the main issue:

It seems creating a separate Ansible role makes the most sense, since we might want to do 1-N setups.

jakubgs (Member) commented Sep 21, 2022

I've created a repo for the separate ansible role: https://github.com/status-im/infra-repos/commit/764a8225

https://github.com/status-im/infra-role-validator-client

I will be basing it mostly on infra-role-beacon-node-linux.

jakubgs (Member) commented Sep 21, 2022

Some initial work:

jakubgs (Member) commented Sep 22, 2022

More changes to get this going:

jakubgs (Member) commented Sep 23, 2022

I started by deploying changes to the nimbus.ropsten host, but this is interesting:

{"lvl":"NOT","ts":"2022-09-23 08:28:15.955+00:00","msg":"Starting REST HTTP server","url":"http://127.0.0.1:5053/"}
{"lvl":"INF","ts":"2022-09-23 08:28:15.956+00:00","msg":"Beacon node has been identified","agent":"Nimbus/v22.9.1-72e6b2-stateofus","service":"fallback_service","endpoint":"127.0.0.1:9301"}
{"lvl":"INF","ts":"2022-09-23 08:28:15.956+00:00","msg":"Beacon node has compatible configuration","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"INF","ts":"2022-09-23 08:28:15.956+00:00","msg":"Beacon node is in sync","sync_distance":0,"head_slot":833241,"is_opimistic":"false","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"NOT","ts":"2022-09-23 08:28:15.957+00:00","msg":"Fork schedule updated","fork_schedule":[{"previous_version":"0x80000069","current_version":"0x80000069","epoch":0},{"previous_version":"0x80000069","current_version":"0x80000070","epoch":500},{"previous_version":"0x80000070","current_version":"0x80000071","epoch":750}],"service":"fork_service"}
{"lvl":"ERR","ts":"2022-09-23 08:28:27.959+00:00","msg":"Unable to get head state's validator information","service":"duties_service"}
{"lvl":"NOT","ts":"2022-09-23 08:28:27.961+00:00","msg":"REST service started","address":"127.0.0.1:5053"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.961+00:00","msg":"Scheduling first slot action","startTime":"16w3d17h28m27s961ms398us840ns","nextSlot":833243,"timeToNextSlot":"8s38ms601us160ns"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.962+00:00","msg":"Beacon node has been identified","agent":"Nimbus/v22.9.1-72e6b2-stateofus","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.963+00:00","msg":"Beacon node has compatible configuration","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"INF","ts":"2022-09-23 08:28:27.964+00:00","msg":"Beacon node is in sync","sync_distance":1,"head_slot":833241,"is_opimistic":"false","service":"fallback_service","endpoint":"127.0.0.1:9301 [Nimbus/v22.9.1-72e6b2-stateofus]"}
{"lvl":"WRN","ts":"2022-09-23 08:28:29.985+00:00","msg":"Connection with beacon node(s) has been lost","online_nodes":0,"unusable_nodes":1,"total_nodes":1,"service":"fallback_service"}
{"lvl":"WRN","ts":"2022-09-23 08:28:31.970+00:00","msg":"No suitable beacon nodes available","online_nodes":0,"offline_nodes":1,"uninitalized_nodes":0,"incompatible_nodes":0,"nonsynced_nodes":0,"total_nodes":1,"service":"fallback_service"}

It seems to be failing in a loop of:

  1. Beacon node has been identified
  2. Beacon node has compatible configuration
  3. Beacon node is in sync
  4. Connection with beacon node(s) has been lost
  5. No suitable beacon nodes available

Over and over again. @narimiran any idea?
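
For debugging, the beacon node's REST endpoint can be probed directly from the host (a quick sketch; 127.0.0.1:9301 is the endpoint the VC is configured against, and these are standard Beacon API routes):

```sh
# Check that the node identifies itself and reports as synced on the same endpoint the VC uses.
curl -s http://127.0.0.1:9301/eth/v1/node/version
curl -s http://127.0.0.1:9301/eth/v1/node/syncing
# /eth/v1/node/health returns 200 when synced, 206 while syncing, 503 on error.
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:9301/eth/v1/node/health
```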

jakubgs (Member) commented Sep 23, 2022

I can also see that the endpoint we usually use for the Consul health check isn't available in the Keymanager API:

[email protected]:~ % c http://localhost:5053/eth/v1/node/version
curl: (22) The requested URL returned error: 404 Not Found

So for now I'm going to use a TCP healthcheck, but it would be nice to have a route without auth for this.

API: https://ethereum.github.io/keymanager-APIs/#/Remote%20Key%20Manager
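
For reference, a plain TCP check can be registered with the local Consul agent roughly like this (a sketch against Consul's agent HTTP API; in our setup the check is templated by the Ansible role, and the check name here is illustrative):

```sh
# Register a TCP health check against the validator client's REST port (5053),
# since the Keymanager API offers no unauthenticated route like /eth/v1/node/version.
curl --request PUT \
  --data '{"Name": "validator-client-tcp", "TCP": "127.0.0.1:5053", "Interval": "10s", "Timeout": "1s"}' \
  http://127.0.0.1:8500/v1/agent/check/register
```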

jakubgs added a commit to status-im/infra-role-beacon-node-linux that referenced this issue Sep 23, 2022
Necessary due to the large size of headers when the validator client
has a large number of validators attached.

status-im/infra-nimbus#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs (Member) commented Sep 23, 2022

Some more changes:

And a fix for massive headers sent to the beacon node:

Which was causing this:

{
  "lvl": "ERR",
  "ts": "2022-09-23 11:12:24.002+00:00",
  "msg": "Unable to get head state's validator information",
  "service": "duties_service"
}

jakubgs added a commit that referenced this issue Sep 23, 2022
jakubgs (Member) commented Sep 23, 2022

And here's the setup on nimbus.ropsten: 7d05abad

jakubgs added a commit that referenced this issue Sep 23, 2022
We want to test with lower numbers of validators first.

#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs (Member) commented Sep 23, 2022

Lowered the number of validators for VC nodes as requested by @cheatfate:

  • 23c07e3d - nimbus.ropsten: lower geth memory limits
  • 89f04d8a - nimbus.ropsten: use less validators on VC nodes

'metal-01.he-eu-hel1.nimbus.ropsten':
- { start: 0, end: 500, validator_client: true } # 500
- { start: 500, end: 1500, validator_client: true } # 1000
- { start: 1500, end: 3500, validator_client: false } # 2000
- { start: 3500, end: 10000, validator_client: false } # 6500

jakubgs added a commit that referenced this issue Sep 23, 2022
It seems to be hogging far too much memory.

#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs added a commit that referenced this issue Sep 23, 2022
We want to test with lower numbers of validators first.

#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs (Member) commented Sep 26, 2022

I forgot to note on Friday that the beacon nodes with validator clients connected are using obscene amounts of memory:

[screenshot: beacon node memory usage]

The ones without validator clients are fine though.

jakubgs added a commit that referenced this issue Sep 28, 2022
For now only for the first node.
#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs (Member) commented Sep 28, 2022

I also deployed a validator client for the first node on the Sepolia host: 7da6edf1

'linux-01.he-eu-hel1.nimbus.sepolia':
- { start: 0, end: 25, validator_client: true }
- { start: 25, end: 50, validator_client: false, nim_commit: 'version-1-6', payload_builder: true }
- { start: 50, end: 75, validator_client: false, nim_commit: 'version-1-6' }
- { start: 75, end: 100, validator_client: false, nim_flags: '-d:json_rpc_websocket_package=websock' }

I also implemented disabling the service and its checks:

jakubgs added a commit that referenced this issue Sep 28, 2022
For now only for the first node.
#122

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs (Member) commented Sep 28, 2022

And interestingly enough on Sepolia no such memory issues appear:

[screenshots: memory usage on the Sepolia host]

Although that might be a function of the number of validators attached, or it might be network-specific.

jakubgs added a commit that referenced this issue Oct 11, 2022
jakubgs (Member) commented Oct 11, 2022

I also deployed a validator client for the stable node on the linux-04 host on prater: 269a76a2

'linux-04.he-eu-hel1.nimbus.prater': # 30 each
- { branch: 'stable', start: 20044, end: 20074, build_freq: '*-*-* 11:00:00', validator_client: true }
- { branch: 'testing', start: 20104, end: 20134, build_freq: '*-*-* 15:00:00', nim_commit: 'version-1-6' }
- { branch: 'unstable', start: 20074, end: 20104, build_freq: '*-*-* 13:00:00', payload_builder: true }
- { branch: 'libp2p', start: 20134, end: 20164, build_freq: '*-*-* 17:00:00', nim_flags: '-d:json_rpc_websocket_package=websock' }

And it appears to run fine and without memory issues so far, but the doppelganger detection is taking an awfully long time:

[email protected]:~ % grep 'Attestation has not been served' /data/validator-client-prater-stable-01/logs/service.log | head -n1
{"lvl":"INF","ts":"2022-10-11 11:36:28.002+00:00","msg":"Attestation has not been served (doppelganger check still active)","slot":4081682,"validator":"94cab382","validator_index":307765,"service":"attestation_service"}

[email protected]:~ % grep 'Attestation has not been served' /data/validator-client-prater-stable-01/logs/service.log | tail -n1
{"lvl":"INF","ts":"2022-10-11 12:20:04.002+00:00","msg":"Attestation has not been served (doppelganger check still active)","slot":4081900,"validator":"94cad9d2","validator_index":310801,"service":"attestation_service"}

Already 45 minutes and still going. Appears to be a bug.

jakubgs added a commit that referenced this issue Oct 11, 2022
jakubgs (Member) commented Oct 11, 2022

For now I've temporarily disabled doppelganger detection for the validator client: 5de20671 (https://github.com/status-im/infra-nimbus/blob/5de206719e12e7b3e29364cc8474f46625a7cb1e/ansible/group_vars/nimbus.prater.yml#L85).
Once the fix is merged I'll undo that.
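
For reference, the workaround amounts to starting the validator client without the doppelganger check (a sketch, assuming the VC accepts the same --doppelganger-detection flag as the beacon node; the real change is the Ansible variable in the group_vars file linked above, and the endpoint is illustrative):

```sh
# Temporarily skip doppelganger detection so the attached validators start attesting right away.
nimbus_validator_client \
  --doppelganger-detection=false \
  --beacon-node=http://127.0.0.1:9301
```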

jakubgs (Member) commented Oct 11, 2022

Looks like we are in business:

[email protected]:~ % grep 'Attestation published' /data/validator-client-prater-stable-01/logs/service.log | tail -n4
{"lvl":"NOT","ts":"2022-10-11 12:28:16.005+00:00","msg":"Attestation published","attestation":{"aggregation_bits":"0x000000000000000000000000000000000000000000080020","data":{"slot":4081941,"index":62,"beacon_block_root":"84c6728e","source":"127559:5e90d514","target":"127560:2208b6af"},"signature":"abef638f"},"validator":"94cad9d2","validator_index":310801,"delay":"5ms15us884ns","service":"attestation_service"}
{"lvl":"NOT","ts":"2022-10-11 12:28:28.057+00:00","msg":"Attestation published","attestation":{"aggregation_bits":"0x000000000000000800000000000000000000000000000020","data":{"slot":4081942,"index":43,"beacon_block_root":"97f79d1a","source":"127559:5e90d514","target":"127560:2208b6af"},"signature":"b52d9f94"},"validator":"94cc88e5","validator_index":306447,"delay":"57ms822us424ns","service":"attestation_service"}
{"lvl":"NOT","ts":"2022-10-11 12:29:52.019+00:00","msg":"Attestation published","attestation":{"aggregation_bits":"0x000000000000000000000000000000000000000040000020","data":{"slot":4081949,"index":29,"beacon_block_root":"665b69cd","source":"127559:5e90d514","target":"127560:2208b6af"},"signature":"a3764bad"},"validator":"94cab382","validator_index":307765,"delay":"19ms516us82ns","service":"attestation_service"}
{"lvl":"NOT","ts":"2022-10-11 12:31:04.006+00:00","msg":"Attestation published","attestation":{"aggregation_bits":"0x004000000000000000000000000000000000000000000020","data":{"slot":4081955,"index":60,"beacon_block_root":"39e0176f","source":"127560:2208b6af","target":"127561:79b5ab0a"},"signature":"ad20a7b6"},"validator":"94c79ff7","validator_index":304582,"delay":"6ms122us158ns","service":"attestation_service"}

But one thing that makes me wonder is why the validator client message is "Attestation published" while the beacon node message is "Attestation sent". Seems like an unnecessary divergence that can just cause confusion.

jakubgs (Member) commented Oct 12, 2022

This is currently working on all 3 testnets: sepolia, ropsten, and prater

'linux-01.he-eu-hel1.nimbus.sepolia':
- { start: 0, end: 25, validator_client: true }

'metal-01.he-eu-hel1.nimbus.ropsten':
- { start: 0, end: 500, validator_client: true } # 500
- { start: 500, end: 1500, validator_client: true } # 1000

'linux-04.he-eu-hel1.nimbus.prater': # 30 each
- { branch: 'stable', start: 20044, end: 20074, build_freq: '*-*-* 11:00:00', validator_client: true }

I consider this done. Reopen if there's something missing.

jakubgs closed this as completed Oct 12, 2022
jakubgs added a commit that referenced this issue Oct 24, 2022
jakubgs (Member) commented Oct 24, 2022

Based on a suggestion from @arnetheduck, I've also deployed a VC for unstable on linux-03: cf8bab14
