[Merged by Bors] - Network protocol upgrades #2345

AgeManning · 2021-05-17T03:42:20Z

This provides a number of upgrades to gossipsub and discovery.

The updates are extensive and this needs thorough testing.

AgeManning · 2021-05-28T07:02:33Z

If this ever passes CI, its good to go

paulhauner

bors r+

This provides a number of upgrades to gossipsub and discovery. The updates are extensive and this needs thorough testing.

bors · 2021-05-28T23:29:03Z

Pull request successfully merged into unstable.

Build succeeded:

This reverts commit d12e746.

## Issue Addressed NA ## Proposed Changes Reverts #2345 in the interests of getting v1.4.0 out this week. Once we have released that, we can go back to testing this again. ## Additional Info NA

@pawanjay176

commit ea6838b Author: Paul Hauner <[email protected]> Date: Mon Jun 21 17:45:55 2021 +1000 Flip bool commit 89afc26 Author: Paul Hauner <[email protected]> Date: Mon Jun 21 15:22:48 2021 +1000 Avoid taking the write-lock on val monitor commit b84ff9f Author: realbigsean <[email protected]> Date: Fri Jun 18 05:58:01 2021 +0000 rust 1.53.0 updates (sigp#2411) ## Issue Addressed `make lint` failing on rust 1.53.0. ## Proposed Changes 1.53.0 updates ## Additional Info I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places Co-authored-by: realbigsean <[email protected]> Co-authored-by: Michael Sproul <[email protected]> commit 3dc1eb5 Author: Michael Sproul <[email protected]> Date: Thu Jun 17 02:10:48 2021 +0000 Ignore inactive validators in validator monitor (sigp#2396) ## Proposed Changes A user on Discord (`@ChewsMacRibs`) reported that the validator monitor was logging `WARN Attested to an incorrect head` for their validator while it was awaiting activation. This PR modifies the monitor so that it ignores inactive validators, by the logic that they are either awaiting activation, or have already exited. Either way, there's no way for an inactive validator to have their attestations included on chain, so no need for the monitor to report on them. ## Additional Info To reproduce the bug requires registering validator keys manually with `--validator-monitor-pubkeys`. I don't think the bug will present itself with `--validator-monitor-auto`. commit 98ab00c Author: Jack <[email protected]> Date: Thu Jun 17 02:10:47 2021 +0000 Handle Geth pre-EIP-155 block sync error condition (sigp#2304) ## Issue Addressed sigp#2293 ## Proposed Changes - Modify the handler for the `eth_chainId` RPC (i.e., `get_chain_id`) to explicitly match against the Geth error string returned for pre-EIP-155 synced Geth nodes - ~~Add a new helper function, `rpc_error_msg`, to aid in the above point~~ - Refactor `response_result` into `response_result_or_error` and patch reliant RPC handlers accordingly (thanks to @pawanjay176) ## Additional Info Geth, as of Pangaea Expanse (v1.10.0), returns an explicit error when it is not synced past the EIP-155 block (2675000). Previously, Geth simply returned a chain ID of 0 (which was obviously much easier to handle on Lighthouse's part). Co-authored-by: Paul Hauner <[email protected]> commit b1657a6 Author: realbigsean <[email protected]> Date: Thu Jun 17 02:10:46 2021 +0000 Reorg events (sigp#2090) ## Issue Addressed Resolves sigp#2088 ## Proposed Changes Add the `chain_reorg` SSE event topic ## Additional Info Co-authored-by: realbigsean <[email protected]> Co-authored-by: Paul Hauner <[email protected]> commit 3261eff Author: divma <[email protected]> Date: Thu Jun 17 00:40:16 2021 +0000 split outbound and inbound codecs encoded types (sigp#2410) Splits the inbound and outbound requests, for maintainability. commit a526145 Author: Clifton King <[email protected]> Date: Wed Jun 16 10:42:55 2021 +0000 Fix remote signer test (sigp#2400) ## Proposed Changes Unescape text for json comparison in: https://github.com/sigp/lighthouse/blob/3a24ca5f14c6e6d6612fd43eca82aa0c1e6aba16/remote_signer/tests/sign.rs#L282-L285 Which causes this error: ``` ---- sign::invalid_field_fork stdout ---- thread 'sign::invalid_field_fork' panicked at 'assertion failed: `(left == right)` left: `"Unable to parse body message from JSON: Error(\"invalid hex (InvalidHexCharacter { c: 'I', index: 0 })\", line: 1, column: 237097)"`, right: `"Unable to parse body message from JSON: Error(\"invalid hex (InvalidHexCharacter { c: \\'I\\', index: 0 })\", line: 1, column: 237097)"`', testing/remote_signer_test/src/consumer.rs:144:5 ``` This is my first contribution and happy to receive feedback if you have any. Thanks commit dffe31c Author: Pawan Dhananjay <[email protected]> Date: Wed Jun 16 09:16:51 2021 +0000 Add an account command to enable/disable validators (sigp#2386) ## Issue Addressed Resolves sigp#2322 ## Proposed Changes Adds a `modify` command to `lighthouse account validator` with subcommands to enable and disable specific or all pubkeys. commit 3b600ac Author: Paul Hauner <[email protected]> Date: Thu Jun 10 01:44:49 2021 +0000 v1.4.0 (sigp#2402) ## Issue Addressed NA ## Proposed Changes - Bump versions and update `Cargo.lock` ## Additional Info NA ## TODO - [x] Ensure sigp#2398 gets merged succesfully commit b383836 Author: Paul Hauner <[email protected]> Date: Wed Jun 9 02:30:06 2021 +0000 Modify Malloc Tuning (sigp#2398) ## Issue Addressed NA ## Proposed Changes I've noticed some of the SigP Prater nodes struggling on v1.4.0-rc.0. I suspect this is due to the changes in sigp#2296. Specifically, the trade-off which lowered the memory footprint whilst increasing runtime on some functions. Presently, this PR is documenting my testing on Prater. ## Additional Info NA commit 4a6f2fa Author: Paul Hauner <[email protected]> Date: Mon Jun 7 02:34:10 2021 +0000 Only perform malloc tuning for beacon node (sigp#2397) ## Issue Addressed NA ## Proposed Changes Only run `configure_memory_alllocator` for the BN process. I noticed that VC memory usage increases significantly with the new malloc tuning parameters. This was also raised by a user on [r/ethstaker](https://www.reddit.com/r/ethstaker/comments/nr8998/lighthouse_prerelease_v140rc0/h0fnt9l?utm_source=share&utm_medium=web2x&context=3). There wasn't any issue with memory usage by the VC before we implemented sigp#2296, so I think we were a bit overzealous when we allowed these changes to affect it. This PR allows things that weren't broken to remain unfixed. ## Additional Info NA commit 93100f2 Author: Paul Hauner <[email protected]> Date: Mon Jun 7 02:34:09 2021 +0000 Make less logs for attn with unknown head (sigp#2395) ## Issue Addressed NA ## Proposed Changes I am starting to see a lot of slog-async overflows (i.e., too many logs) on Prater whenever we see attestations for an unknown block. Since these logs are identical (except for peer id) and we expose volume/count of these errors via `metrics::GOSSIP_ATTESTATION_ERRORS_PER_TYPE`, I took the following actions to remove them from `DEBUG` logs: - Push the "Attestation for unknown block" log to trace. - Add a debug log in `search_for_block`. In effect, this should serve as a de-duped version of the previous, downgraded log. ## Additional Info TBC commit 502402c Author: Pawan Dhananjay <[email protected]> Date: Fri Jun 4 00:10:59 2021 +0000 Fix options for `--eth1-endpoints` flag (sigp#2392) ## Issue Addressed N/A ## Proposed Changes Set `config.sync_eth1_chain` to true when using just the `--eth1-endpoints` flag (without `--eth1`). commit f6280aa Author: Paul Hauner <[email protected]> Date: Thu Jun 3 00:13:02 2021 +0000 v1.4.0-rc.0 (sigp#2379) ## Issue Addressed NA ## Proposed Changes Bump versions. ## Additional Info This is not exactly the v1.4.0 release described in [Lighthouse Update sigp#36](https://lighthouse.sigmaprime.io/update-36.html). Whilst it contains: - Beta Windows support - A reduction in Eth1 queries - A reduction in memory footprint It does not contain: - Altair - Doppelganger Protection - The remote signer We have decided to release some features early. This is primarily due to the desire to allow users to benefit from the memory saving improvements as soon as possible. ## TODO - [x] Wait for sigp#2340, sigp#2356 and sigp#2376 to merge and then rebase on `unstable`. - [x] Ensure discovery issues are fixed (see sigp#2388) - [x] Ensure sigp#2382 is merged/removed. - [x] Ensure sigp#2383 is merged/removed. - [x] Ensure sigp#2384 is merged/removed. - [ ] Double-check eth1 cache is carried between boots commit 90ea075 Author: Paul Hauner <[email protected]> Date: Wed Jun 2 01:07:28 2021 +0000 Revert "Network protocol upgrades (sigp#2345)" (sigp#2388) ## Issue Addressed NA ## Proposed Changes Reverts sigp#2345 in the interests of getting v1.4.0 out this week. Once we have released that, we can go back to testing this again. ## Additional Info NA commit d34f922 Author: Paul Hauner <[email protected]> Date: Wed Jun 2 01:07:27 2021 +0000 Add early check for RPC block relevancy (sigp#2289) ## Issue Addressed NA ## Proposed Changes When observing `jemallocator` heap profiles and Grafana, it became clear that Lighthouse is spending significant RAM/CPU on processing blocks from the RPC. On investigation, it seems that we are loading the parent of the block *before* we check to see if the block is already known. This is a big waste of resources. This PR adds an additional `check_block_relevancy` call as the first thing we do when we try to process a `SignedBeaconBlock` via the RPC (or other similar methods). Ultimately, `check_block_relevancy` will be called again later in the block processing flow. It's a very light function and I don't think trying to optimize it out is worth the risk of a bad block slipping through. Also adds a `New RPC block received` info log when we process a new RPC block. This seems like interesting and infrequent info. ## Additional Info NA commit bf4e02e Author: Paul Hauner <[email protected]> Date: Tue Jun 1 06:59:43 2021 +0000 Return a specific error for frozen attn states (sigp#2384) ## Issue Addressed NA ## Proposed Changes Return a very specific error when at attestation reads shuffling from a frozen `BeaconState`. Previously, this was returning `MissingBeaconState` which indicates a much more serious issue. ## Additional Info Since `get_inconsistent_state_for_attestation_verification_only` is only called once in `BeaconChain::with_committee_cache`, it is quite easy to reason about the impact of this change. commit ba9c4c5 Author: Paul Hauner <[email protected]> Date: Tue Jun 1 06:59:41 2021 +0000 Return more detail in Eth1 HTTP errors (sigp#2383) ## Issue Addressed NA ## Proposed Changes Whilst investigating sigp#2372, I [learned](sigp#2372 (comment)) that the error message returned from some failed Eth1 requests are always `NotReachable`. This makes debugging quite painful. This PR adds more detail to these errors. For example: - Bad infura key: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK: 401 Unauthorized.\"))", retry_millis: 60000, service: eth1_rpc` - Unreachable server: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Request failed: reqwest::Error { kind: Request, url: Url { scheme: \\\"http\\\", cannot_be_a_base: false, username: \\\"\\\", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8545), path: \\\"/\\\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\\\"tcp connect error\\\", Os { code: 111, kind: ConnectionRefused, message: \\\"Connection refused\\\" })) }\"))", retry_millis: 60000, service: eth1_rpc` - Bad server: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK: 501 Not Implemented.\"))", retry_millis: 60000, service: eth1_rpc` ## Additional Info NA commit 4c7bb49 Author: Paul Hauner <[email protected]> Date: Mon May 31 04:18:20 2021 +0000 Use the forwards iterator more often (sigp#2376) ## Issue Addressed NA ## Primary Change When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) which seemed to be ever-lasting. After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched the `BeaconChain::block_root_at_slot` to use the forwards iterator, instead of the reverse one. I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function. Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before it felt like to took 1-2s, now it feels instant). ## Additional Changes In the process I also noticed that we have two functions for getting block roots: - `BeaconChain::block_root_at_slot`: returns `None` for a skip slot. - `BeaconChain::root_at_slot`: returns the previous root for a skip slot. I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way. Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it :wastebasket:. ## Additional Info I haven't also done the same for state roots here. Whilst it's possible and a good idea, it's more work since the fwds iterators are presently block-roots-specific. Whilst there's a few places a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're no where near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here sigp#2377. commit 320a683 Author: Kevin Lu <[email protected]> Date: Mon May 31 04:18:19 2021 +0000 Minimum Outbound-Only Peers Requirement (sigp#2356) ## Issue Addressed sigp#2325 ## Proposed Changes This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. ## Additional Info Unit test for heartbeat function tests that disconnection behavior is correct. Continual querying for peers if outbound-only hasn't been met is not directly tested, but indirectly through unit testing of the helper function that counts the number of outbound-only peers. EDIT: Am concerned about the behavior of ```update_peer_scores```. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. ```rust let previous_state = info.score_state(); // Update scores info.score_update(); Self::handle_score_transitions( previous_state, peer_id, info, &mut to_ban_peers, &mut to_unban_peers, &mut self.events, &self.log, ); ``` ```previous_state``` will be set to Disconnected, and then because ```handle_score_transitions``` only changes connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure realistically how often this scenario would occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. Co-authored-by: Kevin Lu <[email protected]> commit 0847986 Author: Mac L <[email protected]> Date: Mon May 31 04:18:18 2021 +0000 Reduce outbound requests to eth1 endpoints (sigp#2340) ## Issue Addressed sigp#2282 ## Proposed Changes Reduce the outbound requests made to eth1 endpoints by caching the results from `eth_chainId` and `net_version`. Further reduce the overall request count by increasing `auto_update_interval_millis` from `7_000` (7 seconds) to `60_000` (1 minute). This will result in a reduction from ~2000 requests per hour to 360 requests per hour (during normal operation). A reduction of 82%. ## Additional Info If an endpoint fails, its state is dropped from the cache and the `eth_chainId` and `net_version` calls will be made for that endpoint again during the regular update cycle (once per minute) until it is back online. Co-authored-by: Paul Hauner <[email protected]> commit ec5cceb Author: Age Manning <[email protected]> Date: Sat May 29 07:25:06 2021 +0000 Correct issue with dialing peers (sigp#2375) The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update. This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial. This makes sigp#2374 obsolete. commit d12e746 Author: Age Manning <[email protected]> Date: Fri May 28 22:02:10 2021 +0000 Network protocol upgrades (sigp#2345) This provides a number of upgrades to gossipsub and discovery. The updates are extensive and this needs thorough testing. commit 456b313 Author: Paul Hauner <[email protected]> Date: Fri May 28 05:59:45 2021 +0000 Tune GNU malloc (sigp#2299) ## Issue Addressed NA ## Proposed Changes Modify the configuration of [GNU malloc](https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html) to reduce memory footprint. - Set `M_ARENA_MAX` to 4. - This reduces memory fragmentation at the cost of contention between threads. - Set `M_MMAP_THRESHOLD` to 2mb - This means that any allocation >= 2mb is allocated via an anonymous mmap, instead of on the heap/arena. This reduces memory fragmentation since we don't need to keep growing the heap to find big contiguous slabs of free memory. - ~~Run `malloc_trim` every 60 seconds.~~ - ~~This shaves unused memory from the top of the heap, preventing the heap from constantly growing.~~ - Removed, see: sigp#2299 (comment) *Note: this only provides memory savings on the Linux (glibc) platform.* ## Additional Info I'm going to close sigp#2288 in favor of this for the following reasons: - I've managed to get the memory footprint *smaller* here than with jemalloc. - This PR seems to be less of a dramatic change than bringing in the jemalloc dep. - The changes in this PR are strictly runtime changes, so we can create CLI flags which disable them completely. Since this change is wide-reaching and complex, it's nice to have an easy "escape hatch" if there are undesired consequences. ## TODO - [x] Allow configuration via CLI flags - [x] Test on Mac - [x] Test on RasPi. - [x] Determine if GNU malloc is present? - I'm not quite sure how to detect for glibc.. This issue suggests we can't really: rust-lang/rust#33244 - [x] Make a clear argument regarding the affect of this on CPU utilization. - [x] Test with higher `M_ARENA_MAX` values. - [x] Test with longer trim intervals - [x] Add some stats about memory savings - [x] Remove `malloc_trim` calls & code commit fdaeec6 Author: Pawan Dhananjay <[email protected]> Date: Wed May 26 05:58:41 2021 +0000 Monitoring service api (sigp#2251) ## Issue Addressed N/A ## Proposed Changes Adds a client side api for collecting system and process metrics and pushing it to a monitoring service. commit 55aada0 Author: Age Manning <[email protected]> Date: Wed May 26 14:21:44 2021 +1000 More stringent dialing (sigp#2363) * More stringent dialing * Cover cached enr dialing commit 5d9a1bc Author: Michael Sproul <[email protected]> Date: Thu May 20 00:23:08 2021 +0000 Add Windows to Bors config (sigp#2358) We accidentally omitted the new Windows tests (sigp#2333) from the Bors config, meaning that PRs will merge before the tests pass. This PR corrects that. commit ba55e14 Author: ethDreamer <[email protected]> Date: Wed May 19 23:05:16 2021 +0000 Enable Compatibility with Windows (sigp#2333) ## Issue Addressed Windows incompatibility. ## Proposed Changes On windows, lighthouse needs to default to STDIN as tty doesn't exist. Also Windows uses ACLs for file permissions. So to mirror chmod 600, we will remove every entry in a file's ACL and add only a single SID that is an alias for the file owner. Beyond that, there were several changes made to different unit tests because windows has slightly different error messages as well as frustrating nuances around killing a process :/ ## Additional Info Tested on my Windows VM and it appears to work, also compiled & tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running full validator client on windows. Co-authored-by: ethDreamer <[email protected]> Co-authored-by: Michael Sproul <[email protected]>

* Network upgrades (#2345) * Discovery patch (#2382) * Upgrade libp2p and unstable gossip * Network protocol upgrades * Correct dependencies, reduce incoming bucket limit * Clean up dirty DHT entries before repopulating * Update cargo lock * Update lockfile * Update ENR dep * Update deps to specific versions * Update test dependencies * Update docker rust, and remote signer tests * More remote signer test fixes * Temp commit * Update discovery * Remove cached enrs after dialing * Increase the session capacity, for improved efficiency * Bleeding edge discovery (#2435) * Update discovery banning logic and tokio * Update to latest discovery * Shift to latest discovery * Fmt * Initial re-factor of the behaviour * More progress * Missed changes * First draft * Discovery as a behaviour * Adding back event waker (not convinced its neccessary, but have made this many changes already) * Corrections * Speed up discovery * Remove double log * Fmt * After disconnect inform swarm about ban * More fmt * Appease clippy * Improve ban handling * Update tests * Update cargo.lock * Correct tests * Downgrade log

* Network upgrades (sigp#2345) * Discovery patch (sigp#2382) * Upgrade libp2p and unstable gossip * Network protocol upgrades * Correct dependencies, reduce incoming bucket limit * Clean up dirty DHT entries before repopulating * Update cargo lock * Update lockfile * Update ENR dep * Update deps to specific versions * Update test dependencies * Update docker rust, and remote signer tests * More remote signer test fixes * Temp commit * Update discovery * Remove cached enrs after dialing * Increase the session capacity, for improved efficiency * Bleeding edge discovery (sigp#2435) * Update discovery banning logic and tokio * Update to latest discovery * Shift to latest discovery * Fmt * Initial re-factor of the behaviour * More progress * Missed changes * First draft * Discovery as a behaviour * Adding back event waker (not convinced its neccessary, but have made this many changes already) * Corrections * Speed up discovery * Remove double log * Fmt * After disconnect inform swarm about ban * More fmt * Appease clippy * Improve ban handling * Update tests * Update cargo.lock * Correct tests * Downgrade log

@realbigsean

commit c5786a8 Author: realbigsean <[email protected]> Date: Sat Jul 31 03:50:52 2021 +0000 Doppelganger detection (sigp#2230) ## Issue Addressed Resolves sigp#2069 ## Proposed Changes - Adds a `--doppelganger-detection` flag - Adds a `lighthouse/seen_validators` endpoint, which will make it so the lighthouse VC is not interopable with other client beacon nodes if the `--doppelganger-detection` flag is used, but hopefully this will become standardized. Relevant Eth2 API repo issue: ethereum/beacon-APIs#64 - If the `--doppelganger-detection` flag is used, the VC will wait until the beacon node is synced, and then wait an additional 2 epochs. The reason for this is to make sure the beacon node is able to subscribe to the subnets our validators should be attesting on. I think an alternative would be to have the beacon node subscribe to all subnets for 2+ epochs on startup by default. ## Additional Info I'd like to add tests and would appreciate feedback. TODO: handle validators started via the API, potentially make this default behavior Co-authored-by: realbigsean <[email protected]> Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: Paul Hauner <[email protected]> commit 834ee98 Author: SaNNNNNNNN <[email protected]> Date: Sat Jul 31 02:24:09 2021 +0000 Fix flag in redundancy docs (sigp#2482) Replace all --process-all-attestations with --import-all-attestations ## Issue Addressed Which issue # does this PR address? ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers. commit 303deb9 Author: realbigsean <[email protected]> Date: Fri Jul 30 01:11:47 2021 +0000 Rust 1.54.0 lints (sigp#2483) ## Issue Addressed N/A ## Proposed Changes - Removing a bunch of unnecessary references - Updated `Error::VariantError` to `Error::Variant` - There were additional enum variant lints that I ignored, because I thought our variant names were fine - removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index ## Additional Info Co-authored-by: realbigsean <[email protected]> commit 8efd9fc Author: Paul Hauner <[email protected]> Date: Thu Jul 29 04:38:26 2021 +0000 Add `AttesterCache` for attestation production (sigp#2478) ## Issue Addressed - Resolves sigp#2169 ## Proposed Changes Adds the `AttesterCache` to allow validators to produce attestations for older slots. Presently, some arbitrary restrictions can force validators to receive an error when attesting to a slot earlier than the present one. This can cause attestation misses when there is excessive load on the validator client or time sync issues between the VC and BN. ## Additional Info NA commit 1d4f90e Author: Michael Sproul <[email protected]> Date: Thu Jul 29 02:16:54 2021 +0000 Bump tests to v1.1.0-beta.2 (sigp#2481) ## Proposed Changes Bump spec tests to v1.1.0-beta.2, for conformance with the latest spec release: https://github.com/ethereum/eth2.0-specs/releases/tag/v1.1.0-beta.2 ## Additional Info We already happen to be compatible with the latest spec change that requires sync contributions to have at least one bit set. I'm gonna call it foresight on @realbigsean's part 😎 https://github.com/sigp/lighthouse/blob/6e3ca48cb934a63cafdc940068825a48cba740d1/beacon_node/beacon_chain/src/sync_committee_verification.rs#L285-L288 commit 923486f Author: Michael Sproul <[email protected]> Date: Wed Jul 28 05:40:21 2021 +0000 Use bulk verification for sync_aggregate signature (sigp#2415) ## Proposed Changes Add the `sync_aggregate` from `BeaconBlock` to the bulk signature verifier for blocks. This necessitates a new signature set constructor for the sync aggregate, which is different from the others due to the use of [`eth2_fast_aggregate_verify`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/bls.md#eth2_fast_aggregate_verify) for sync aggregates, per [`process_sync_aggregate`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/beacon-chain.md#sync-aggregate-processing). I made the choice to return an optional signature set, with `None` representing the case where the signature is valid on account of being the point at infinity (requires no further checking). To "dogfood" the changes and prevent duplication, the consensus logic now uses the signature set approach as well whenever it is required to verify signatures (which should only be in testing AFAIK). The EF tests pass with the code as it exists currently, but failed before I adapted the `eth2_fast_aggregate_verify` changes (which is good). As a result of this change Altair block processing should be a little faster, and importantly, we will no longer accidentally verify signatures when replaying blocks, e.g. when replaying blocks from the database. commit 6e3ca48 Author: Paul Hauner <[email protected]> Date: Tue Jul 27 07:01:01 2021 +0000 Cache participating indices for Altair epoch processing (sigp#2416) ## Issue Addressed NA ## Proposed Changes This PR addresses two things: 1. Allows the `ValidatorMonitor` to work with Altair states. 1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description) ## Breaking Changes The breaking changes in this PR revolve around one premise: *After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.* Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification. For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations. The following section details where we've removed fields and provides replacement values. ### Breaking Changes: Prometheus Metrics Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana). The following metrics were added as replacements: - `beacon_participation_prev_epoch_head_attesting_gwei_total` - `beacon_participation_prev_epoch_target_attesting_gwei_total` - `beacon_participation_prev_epoch_source_attesting_gwei_total` - `beacon_participation_prev_epoch_active_gwei_total` The following metrics were removed: - `beacon_participation_prev_epoch_attester` - instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_target_attester` - instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_head_attester` - instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`. ### Breaking Changes: HTTP API The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields: - `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead) - `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead) The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields: - `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead) - `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead) - `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`. - `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`. ## Additional Info NA ## TODO - [x] Deal with total balances - [x] Update validator_inclusion API - [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair Co-authored-by: realbigsean <[email protected]> commit f5bdca0 Author: Michael Sproul <[email protected]> Date: Tue Jul 27 05:43:35 2021 +0000 Update to spec v1.1.0-beta.1 (sigp#2460) ## Proposed Changes Update to the latest version of the Altair spec, which includes new tests and a tweak to the target sync aggregators. ## Additional Info This change is _not_ required for the imminent Altair devnet, and is waiting on the merge of sigp#2321 to unstable. Co-authored-by: Paul Hauner <[email protected]> commit 84e6d71 Author: Michael Sproul <[email protected]> Date: Fri Jul 23 00:23:53 2021 +0000 Tree hash caching and optimisations for Altair (sigp#2459) ## Proposed Changes Remove the remaining Altair `FIXME`s from consensus land. 1. Implement tree hash caching for the participation lists. This required some light type manipulation, including removing the `TreeHash` bound from `CachedTreeHash` which was purely descriptive. 2. Plumb the proposer index through Altair attestation processing, to avoid calculating it for _every_ attestation (potentially 128ms on large networks). This duplicates some work from sigp#2431, but with the aim of getting it in sooner, particularly for the Altair devnets. 3. Removes two FIXMEs related to `superstruct` and cloning, which are unlikely to be particularly detrimental and will be tracked here instead: sigp/superstruct#5 commit 74aa99c Author: Michael Sproul <[email protected]> Date: Thu Jul 22 01:37:01 2021 +0000 Document BN API security considerations (sigp#2470) ## Issue Addressed Closes sigp#2468 ## Proposed Changes Document security considerations for the beacon node API, with strong recommendations against exposing it to the internet. commit 63923ea Author: Michael Sproul <[email protected]> Date: Wed Jul 21 07:10:52 2021 +0000 Bump discv5 to v0.1.0-beta.8 (sigp#2471) ## Proposed Changes Update discv5 to fix bugs seen on `altair-devnet-1` commit 17b6d7c Author: Mac L <[email protected]> Date: Wed Jul 21 07:10:51 2021 +0000 Add `http-address` flag to VC (sigp#2467) ## Issue Addressed sigp#2454 ## Proposed Changes Adds the `--http-address` flag to allow the user to use custom HTTP addresses. This can be helpful for certain Docker setups. Since using custom HTTP addresses is unsafe due to the server being unencrypted, `--unencrypted-http-transport` was also added as a safety flag and must be used in tandem with `--http-address`. This is to ensure the user is aware of the risks associated with using non-local HTTP addresses. commit bcf8ba6 Author: realbigsean <[email protected]> Date: Wed Jul 21 03:24:23 2021 +0000 Add lcli Dockerfile and auto-build to CI (sigp#2469) ## Issue Addressed Resolves: sigp#2087 ## Proposed Changes - Add a `Dockerfile` to the `lcli` directory - Add a github actions job to build and push and `lcli` docker image on pushes to `unstable` and `stable` ## Additional Info It's a little awkward but `lcli` requires the full project scope so must be built: - from the `lighthouse` dir with: `docker build -f ./lcli/Dockerflie .` - from the `lcli` dir with: `docker build -f ./Dockerfile ../` Didn't include `libssl-dev` or `ca-certificates`, `lcli` doesn't need these right? Co-authored-by: realbigsean <[email protected]> Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: Michael Sproul <[email protected]> commit 9a8320b Merge: b0f5c4c 08fedbf Author: Age Manning <[email protected]> Date: Thu Jul 15 18:15:07 2021 +1000 Merge pull request sigp#2389 from sigp/network-1.5 Network Updates for 1.5 commit 08fedbf Author: Age Manning <[email protected]> Date: Thu Jul 15 10:53:59 2021 +1000 Libp2p Connection Limit (sigp#2455) * Get libp2p to handle connection limits * fmt commit 6818a94 Author: Age Manning <[email protected]> Date: Wed Jul 14 16:54:44 2021 +1000 Discovery update (sigp#2458) commit 381befb Author: Age Manning <[email protected]> Date: Wed Jul 14 12:59:24 2021 +1000 Ensure disconnecting peers are added to the peerdb (sigp#2451) commit 059d9ec Author: Age Manning <[email protected]> Date: Tue Jul 13 15:37:52 2021 +1000 Gossipsub scoring improvements (sigp#2391) * Tweak gossipsub parameters for improved scoring * Modify gossip history * Update settings * Make mesh window constant * Decrease the mesh message deliveries weight * Fmt commit c62810b Author: Age Manning <[email protected]> Date: Tue Jul 13 14:37:25 2021 +1000 Update to Libp2p to 39.1 (sigp#2448) * Adjust beacon node timeouts for validator client HTTP requests (sigp#2352) Resolves sigp#2313 Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct. This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration. Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available. Add a CLI flag called `use-long-timeouts` to revert to the old behavior. Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications. Co-authored-by: Paul Hauner <[email protected]> * Use read_recursive locks in database (sigp#2417) Closes sigp#2245 Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`. * Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly which cannot compile Lighthouse (one of our deps uses `ptr::addr_of!` which is too new). A fun side-project at some point might be to update the deadlock detector. * The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us. * I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp * [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation. * Return more detail when invalid data is found in the DB during startup (sigp#2445) - Resolves sigp#2444 Adds some more detail to the error message returned when the `BeaconChainBuilder` is unable to access or decode block/state objects during startup. NA * Use hardware acceleration for SHA256 (sigp#2426) Modify the SHA256 implementation in `eth2_hashing` so that it switches between `ring` and `sha2` to take advantage of [x86_64 SHA extensions](https://en.wikipedia.org/wiki/Intel_SHA_extensions). The extensions are available on modern Intel and AMD CPUs, and seem to provide a considerable speed-up: on my Ryzen 5950X it dropped state tree hashing times by about 30% from 35ms to 25ms (on Prater). The extensions became available in the `sha2` crate [last year](https://www.reddit.com/r/rust/comments/hf2vcx/ann_rustcryptos_sha1_and_sha2_now_support/), and are not available in Ring, which uses a [pure Rust implementation of sha2](https://github.com/briansmith/ring/blob/main/src/digest/sha2.rs). Ring is faster on CPUs that lack the extensions so I've implemented a runtime switch to use `sha2` only when the extensions are available. The runtime switching seems to impose a miniscule penalty (see the benchmarks linked below). * Start a release checklist (sigp#2270) NA Add a checklist to the release draft created by CI. I know @michaelsproul was also working on this and I suspect @realbigsean also might have useful input. NA * Serious banning * fmt Co-authored-by: Mac L <[email protected]> Co-authored-by: Paul Hauner <[email protected]> Co-authored-by: Michael Sproul <[email protected]> commit 3c0d322 Author: Age Manning <[email protected]> Date: Tue Jul 13 10:48:33 2021 +1000 Global Network Behaviour Refactor (sigp#2442) * Network upgrades (sigp#2345) * Discovery patch (sigp#2382) * Upgrade libp2p and unstable gossip * Network protocol upgrades * Correct dependencies, reduce incoming bucket limit * Clean up dirty DHT entries before repopulating * Update cargo lock * Update lockfile * Update ENR dep * Update deps to specific versions * Update test dependencies * Update docker rust, and remote signer tests * More remote signer test fixes * Temp commit * Update discovery * Remove cached enrs after dialing * Increase the session capacity, for improved efficiency * Bleeding edge discovery (sigp#2435) * Update discovery banning logic and tokio * Update to latest discovery * Shift to latest discovery * Fmt * Initial re-factor of the behaviour * More progress * Missed changes * First draft * Discovery as a behaviour * Adding back event waker (not convinced its neccessary, but have made this many changes already) * Corrections * Speed up discovery * Remove double log * Fmt * After disconnect inform swarm about ban * More fmt * Appease clippy * Improve ban handling * Update tests * Update cargo.lock * Correct tests * Downgrade log commit 6422632 Author: Pawan Dhananjay <[email protected]> Date: Fri Jul 9 08:18:29 2021 +0530 Relax requirement for enr fork digest predicate (sigp#2433) commit c1d2e35 Author: Age Manning <[email protected]> Date: Wed Jul 7 18:18:44 2021 +1000 Bleeding edge discovery (sigp#2435) * Update discovery banning logic and tokio * Update to latest discovery * Shift to latest discovery * Fmt commit f4bc9db Author: Age Manning <[email protected]> Date: Tue Jun 15 14:53:35 2021 +1000 Change the window mode of yamux (sigp#2390) commit 6fb48b4 Author: Age Manning <[email protected]> Date: Tue Jun 15 14:40:43 2021 +1000 Discovery patch (sigp#2382) * Upgrade libp2p and unstable gossip * Network protocol upgrades * Correct dependencies, reduce incoming bucket limit * Clean up dirty DHT entries before repopulating * Update cargo lock * Update lockfile * Update ENR dep * Update deps to specific versions * Update test dependencies * Update docker rust, and remote signer tests * More remote signer test fixes * Temp commit * Update discovery * Remove cached enrs after dialing * Increase the session capacity, for improved efficiency commit 4aa06c9 Author: Age Manning <[email protected]> Date: Thu Jun 3 11:11:33 2021 +1000 Network upgrades (sigp#2345) commit b0f5c4c Author: Paul Hauner <[email protected]> Date: Thu Jul 15 04:22:06 2021 +0000 Clarify eth1 error message (sigp#2461) ## Issue Addressed - Closes sigp#2452 ## Proposed Changes Addresses: sigp#2452 (comment) ## Additional Info NA commit a3a7f39 Author: realbigsean <[email protected]> Date: Thu Jul 15 00:52:02 2021 +0000 [Altair] Sync committee pools (sigp#2321) Add pools supporting sync committees: - naive sync aggregation pool - observed sync contributions pool - observed sync contributors pool - observed sync aggregators pool Add SSZ types and tests related to sync committee signatures. Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: realbigsean <[email protected]> commit 8fa6e46 Author: Michael Sproul <[email protected]> Date: Wed Jul 14 05:24:10 2021 +0000 Update direct libsecp256k1 dependencies (sigp#2456) ## Proposed Changes * Remove direct dependencies on vulnerable `libsecp256k1 0.3.5` * Ignore the RUSTSEC issue until it is resolved in sigp#2389 commit fc4c611 Author: Paul Hauner <[email protected]> Date: Wed Jul 14 05:24:09 2021 +0000 Remove msg about longer sync with remote eth1 nodes (sigp#2453) ## Issue Addressed - Resolves sigp#2452 ## Proposed Changes I've seen a few people confused by this and I don't think the message is really worth it. ## Additional Info NA

AgeManning added 5 commits May 7, 2021 12:13

Upgrade libp2p and unstable gossip

770b6f6

Merge latest unstable

bb04252

Network protocol upgrades

f76c3a4

Correct dependencies, reduce incoming bucket limit

55f203f

Clean up dirty DHT entries before repopulating

8f02d21

AgeManning added the do-not-merge label May 17, 2021

AgeManning added 6 commits May 17, 2021 15:17

Update cargo lock

f9ff95c

Update lockfile

d676b19

Update ENR dep

4282f80

Update deps to specific versions

6ef4a80

Merge latest unstable

1463347

Update test dependencies

e5d8605

AgeManning added ready-for-review The code is ready for review and removed do-not-merge labels May 28, 2021

AgeManning added 2 commits May 28, 2021 16:34

Update docker rust, and remote signer tests

d987450

More remote signer test fixes

6ba3dcc

paulhauner approved these changes May 28, 2021

View reviewed changes

paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 28, 2021

bors bot pushed a commit that referenced this pull request May 28, 2021

Network protocol upgrades (#2345)

d12e746

This provides a number of upgrades to gossipsub and discovery. The updates are extensive and this needs thorough testing.

bors bot changed the title ~~Network protocol upgrades~~ [Merged by Bors] - Network protocol upgrades May 28, 2021

bors bot closed this May 28, 2021

paulhauner deleted the network-upgrades branch May 31, 2021 04:06

paulhauner added a commit to paulhauner/lighthouse that referenced this pull request Jun 2, 2021

Revert "Network protocol upgrades (sigp#2345)"

fffb181

This reverts commit d12e746.

paulhauner mentioned this pull request Jun 2, 2021

[Merged by Bors] - Revert "Network protocol upgrades (#2345)" #2388

Closed

paulhauner restored the network-upgrades branch June 2, 2021 00:54

AgeManning added a commit that referenced this pull request Jun 3, 2021

Network upgrades #2345

0a3b3f2

AgeManning added a commit that referenced this pull request Jun 3, 2021

Network upgrades #2345

484d93a

ethDreamer pushed a commit to ethDreamer/lighthouse that referenced this pull request Jun 18, 2021

Network upgrades sigp#2345

b53a56c

AgeManning added a commit that referenced this pull request Jul 7, 2021

Network upgrades #2345

26a12dd

AgeManning added a commit that referenced this pull request Jul 8, 2021

Network upgrades #2345

a77a3b8

AgeManning added a commit that referenced this pull request Jul 9, 2021

Network upgrades #2345

d8b6108

AgeManning added a commit that referenced this pull request Jul 9, 2021

Network upgrades (#2345)

6d66788

AgeManning added a commit that referenced this pull request Jul 13, 2021

Network upgrades (#2345)

5b2195b

AgeManning deleted the network-upgrades branch July 15, 2021 03:06

AgeManning added a commit that referenced this pull request Jul 15, 2021

Network upgrades (#2345)

056c926

AgeManning added a commit that referenced this pull request Jul 15, 2021

Network upgrades (#2345)

4aa06c9

paulhauner pushed a commit to paulhauner/lighthouse that referenced this pull request Jul 22, 2021

Network upgrades (sigp#2345)

264d5d8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Merged by Bors] - Network protocol upgrades #2345

[Merged by Bors] - Network protocol upgrades #2345

AgeManning commented May 17, 2021

AgeManning commented May 28, 2021

paulhauner left a comment

bors bot commented May 28, 2021

[Merged by Bors] - Network protocol upgrades #2345

[Merged by Bors] - Network protocol upgrades #2345

Conversation

AgeManning commented May 17, 2021

AgeManning commented May 28, 2021

paulhauner left a comment

Choose a reason for hiding this comment

bors bot commented May 28, 2021