Lighthouse sends genesis block twice in libp2p BlocksByRange response #4943

etan-status · 2023-11-21T19:59:55Z

Description

When Nimbus requests blocks 0 ..< 32 on Holesky, Lighthouse responds with the genesis block twice.

Version

libp2p identify message: agent_version=Lighthouse/v4.5.0-441fc16/x86_64-linux

Present Behaviour

Lighthouse responds with slots [0, 0, 2, 4, 6, 9, 11, 13, 14, 15, 16, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30]

Expected Behaviour

Lighthouse only includes slot 0 once in the response.

Steps to resolve

https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyrange

Clients MUST respond with blocks that are consistent from a single chain within the context of the request. This applies to any step value. In particular when step == 1, each parent_root MUST match the hash_tree_root of the preceding block.
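The requirement quoted above can be checked on the requesting side. The following is a minimal sketch, not taken from Nimbus or Lighthouse: `BlockSummary` and `check_range_response` are hypothetical names, and `root` stands in for the block's `hash_tree_root`, which is assumed to have been computed elsewhere. It assumes step == 1 and that a single chain holds at most one block per slot, so slots in a response must strictly increase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockSummary:
    """Hypothetical reduced view of a block in a BlocksByRange response."""
    slot: int
    root: bytes         # stands in for hash_tree_root of the block
    parent_root: bytes


def check_range_response(blocks: List[BlockSummary], start_slot: int, count: int) -> List[str]:
    """Return a list of consistency problems found in a step == 1 response."""
    errors = []
    prev = None
    for block in blocks:
        # Every block must fall inside the requested slot range.
        if not (start_slot <= block.slot < start_slot + count):
            errors.append(f"slot {block.slot} outside requested range")
        if prev is not None:
            if block.slot <= prev.slot:
                # A repeated slot (such as a second genesis block) lands here.
                errors.append(f"slot {block.slot} does not increase after slot {prev.slot}")
            elif block.parent_root != prev.root:
                errors.append(f"slot {block.slot} parent_root does not match preceding block")
        prev = block
    return errors


# Illustrative data only: placeholder roots, not real Holesky values.
GENESIS = BlockSummary(slot=0, root=b"\x11" * 32, parent_root=bytes(32))
SLOT_2 = BlockSummary(slot=2, root=b"\x22" * 32, parent_root=GENESIS.root)

duplicate_errors = check_range_response([GENESIS, GENESIS, SLOT_2], 0, 32)
clean_errors = check_range_response([GENESIS, SLOT_2], 0, 32)
```

With the placeholder data, the response containing the genesis block twice yields one error for the repeated slot 0, and the de-duplicated response yields none.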


jimmygchen · 2023-11-22T04:35:35Z

@etan-status Thanks for raising this! I'll look into this.

michaelsproul · 2023-11-29T06:34:41Z

I think I know what's happening here.

We had a database bug (#4817) which caused the block root at slot 1 not to be stored on Holesky when the node had checkpoint synced and backfilled. We fixed that bug in #4820 in order to fix state reconstruction, but we didn't realise at the time that it would also manifest as incorrect BlocksByRange responses.

The reason for the incorrect responses is that we store block roots in chunks of 128 at a time. These chunks are default-initialised to 0x0, so the block root for slot 1 on buggy Holesky nodes is 0x0. Our database also knows how to resolve the 0x0 root to the genesis block, so when serving a BlocksByRange request we:

slot 0: look up the actual genesis block root and get the genesis block
slot 1: look up the 0x0 genesis block alias and get the genesis block

The 0x0 isn't caught by the de-duplication that we apply because it's not equal to the actual genesis block root.
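The interaction described above can be reproduced in a small model. This is an illustrative sketch only, not Lighthouse's code: the names (`make_chunk`, `resolve_block`, `blocks_by_range`), the placeholder roots, and the exact form of the de-duplication (skipping a root equal to the previous slot's root) are all assumptions made for the example.

```python
CHUNK_SIZE = 128
ZERO_ROOT = bytes(32)        # the 0x0 default value
GENESIS_ROOT = b"\x11" * 32  # placeholder, not the real Holesky genesis root
SLOT_2_ROOT = b"\x22" * 32   # placeholder root for the block at slot 2

# Hypothetical block store keyed by root.
BLOCKS = {
    GENESIS_ROOT: {"slot": 0},
    SLOT_2_ROOT: {"slot": 2},
}


def make_chunk(known_roots):
    """Build a chunk of block roots, default-initialised to 0x0."""
    chunk = [ZERO_ROOT] * CHUNK_SIZE
    for slot, root in known_roots.items():
        chunk[slot] = root
    return chunk


def resolve_block(root):
    """Look up a block by root; 0x0 is treated as an alias for genesis."""
    if root == ZERO_ROOT:
        return BLOCKS[GENESIS_ROOT]
    return BLOCKS.get(root)


def blocks_by_range(chunk, start, count):
    """Return the slots of the blocks served for [start, start + count)."""
    served = []
    last_root = None
    for slot in range(start, start + count):
        root = chunk[slot]
        if root == last_root:
            continue  # de-duplication: a skipped slot repeats the previous root
        last_root = root
        block = resolve_block(root)
        if block is not None:
            served.append(block["slot"])
    return served


# Healthy node: skipped slot 1 stores the genesis root again.
healthy_chunk = make_chunk({0: GENESIS_ROOT, 1: GENESIS_ROOT, 2: SLOT_2_ROOT})
# Buggy node: the root at slot 1 was never written and stays 0x0.
buggy_chunk = make_chunk({0: GENESIS_ROOT, 2: SLOT_2_ROOT})

healthy_slots = blocks_by_range(healthy_chunk, 0, 3)
buggy_slots = blocks_by_range(buggy_chunk, 0, 3)
```

In this model the healthy chunk serves slots 0 and 2, while the buggy chunk serves slot 0 twice before slot 2, because 0x0 compares unequal to the genesis root yet resolves to the same block.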

I think an appropriate fix would be to run a little function at startup on v18 Holesky databases to store the correct block root at slot 1. We didn't implement this initially because we thought the corruption was only relevant to archive nodes, and they were failing loudly.
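As a rough sketch of what such a startup repair might look like, assuming slot 1 is a skipped slot (so its block root should equal the genesis block root) and modelling the stored roots as a plain list indexed by slot. The function name and data layout are hypothetical; this is not the actual Lighthouse database code or the patch itself.

```python
ZERO_ROOT = bytes(32)  # the 0x0 default value


def repair_first_skipped_slot(block_roots):
    """Overwrite a 0x0 root at slot 1 with the root stored at slot 0.

    Assumes slot 1 was skipped, so its block root repeats the genesis root.
    Returns True if a repair was applied, False if nothing needed fixing.
    """
    if len(block_roots) < 2:
        return False
    if block_roots[0] == ZERO_ROOT:
        return False  # nothing trustworthy to copy from
    if block_roots[1] != ZERO_ROOT:
        return False  # slot 1 already holds a root
    block_roots[1] = block_roots[0]
    return True


# Illustrative data with placeholder roots.
roots = [b"\x11" * 32, ZERO_ROOT, b"\x22" * 32]
repaired = repair_first_skipped_slot(roots)
repaired_again = repair_first_skipped_slot(roots)
```

The check is idempotent in this model: running it a second time on an already-repaired list changes nothing.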

TL;DR: some disgusting database junk that only happened on Holesky, requires a patch to fix

michaelsproul · 2023-12-07T04:12:56Z

Fixed in #4985. This will be in Lighthouse v4.6.0. We'll encourage users to update so that the fix takes effect network-wide.

jimmygchen self-assigned this Nov 22, 2023

jimmygchen added the bug Something isn't working label Nov 22, 2023

michaelsproul added the Networking label Nov 25, 2023

michaelsproul added database and removed Networking labels Nov 29, 2023

jimmygchen mentioned this issue Dec 6, 2023

Fix corrupted DB on networks where the first slot is skipped (Holesky) #4985

Merged

michaelsproul closed this as completed Dec 7, 2023
