Time discrepancy false positive with Teku #3421

torfbolt · 2022-08-03T18:35:46Z

Description

During the doppelganger protection phase I received multiple error messages from the VC warning about time discrepancy, even though both VC and BN run on the same machine with correct NTP time.

Version

2.5.1 running on Prater

Present Behaviour

After upgrading to 2.5.1 and restarting, the Lighthouse VC went into the doppelganger listening phase. It successfully connected to both configured beacon nodes, a Teku 22.8.0 and a Lighthouse 2.5.1 running on the same machine, fully synced. During the doppelganger protection phase I then received multiple times the log message

Aug 03 18:01:47 schnecki lighthouse[831307]: Aug 03 18:01:47.087 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 03 18:01:47 schnecki lighthouse[831307]: Aug 03 18:01:47.208 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3586808, beacon_node_slot: 3586804, msg:>
Aug 03 18:01:48 schnecki lighthouse[831307]: Aug 03 18:01:48.207 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 03 18:01:48 schnecki lighthouse[831307]: Aug 03 18:01:48.208 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3586809, beacon_node_slot: 3586804, msg:>
Aug 03 18:01:54 schnecki lighthouse[831307]: Aug 03 18:01:54.002 INFO Connected to beacon node(s) synced: 2, available: 2, total: 2, service: notifier

There were no error messages about the Lighthouse BN, but that might be only because it was listed as second prio BN.

After the doppelganger protection phase ended, there were no further time discrepancy errors.

Expected Behaviour

No error messages about time discrepancy also during the doppelganger protection phase.

Steps to resolve

michaelsproul · 2022-08-04T07:17:49Z

Have you inquired with Teku about this? It looks like it might be a discrepancy in how we define the sync_distance in the /eth/v1/node/syncing endpoint. Lighthouse tries to use the sync_distance to work out the remote beacon node's current slot here:

lighthouse/validator_client/src/check_synced.rs

Lines 58 to 71 in df51a73

    
               if let Some(local_slot) = slot_clock.now() { 
        
                   let remote_slot = resp.data.head_slot + resp.data.sync_distance; 
        
                   if remote_slot + 1 < local_slot || local_slot + 1 < remote_slot { 
        
                       error!( 
        
                           log, 
        
                           "Time discrepancy with beacon node"; 
        
                           "msg" => "check the system time on this host and the beacon node", 
        
                           "beacon_node_slot" => remote_slot, 
        
                           "local_slot" => local_slot, 
        
                           "endpoint" => %beacon_node, 
        
                       ); 
        
                   } 
        
               } 
        
           }

torfbolt · 2022-08-04T20:33:53Z

I think I found out when this happens: if there are two or more consecutive empty slots on the beacon chain. So it looks like Teku doesn't update the head slot (or sync distance) when there are empty slots, while lighthouse expects it to go up.

Teku logs on Prater:

Aug 04 20:16:52 schnecki teku[831696]: 20:16:52.272 INFO - Slot Event *** Slot: 3594684, Block: 57c62b660692653c38f26933642565cdf3b2d2173e6af12993af1431b723a9db, Justified: 112332, Finalized: 112>
Aug 04 20:17:04 schnecki teku[831696]: 20:17:04.346 INFO - Slot Event *** Slot: 3594685, Block: ... empty, Justified: 112332, Finalized: 112>
Aug 04 20:17:16 schnecki teku[831696]: 20:17:16.267 INFO - Slot Event *** Slot: 3594686, Block: ... empty, Justified: 112332, Finalized: 112>
Aug 04 20:17:28 schnecki teku[831696]: 20:17:28.276 INFO - Slot Event *** Slot: 3594687, Block: ... empty, Justified: 112332, Finalized: 112>
Aug 04 20:17:39 schnecki teku[831696]: 20:17:39.898 INFO - Epoch Event *** Epoch: 112334, Justified checkpoint: 112333, Finalized checkpoint: 112332, Finalized root: d543a170bf7255ea9a74f719f692ea>

Lighthouse VC logs:

Aug 04 20:17:23 schnecki lighthouse[901917]: Aug 04 20:17:23.001 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 04 20:17:23 schnecki lighthouse[901917]: Aug 04 20:17:23.001 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3594686, beacon_node_slot: 3594684, msg:>
Aug 04 20:17:30 schnecki lighthouse[901917]: Aug 04 20:17:30.000 INFO Connected to beacon node(s) synced: 2, available: 2, total: 2, service: notifier
Aug 04 20:17:30 schnecki lighthouse[901917]: Aug 04 20:17:30.000 INFO Listening for doppelgangers doppelganger_detecting_validators: 6, service: notifier
Aug 04 20:17:30 schnecki lighthouse[901917]: Aug 04 20:17:30.000 INFO Awaiting activation slot: 3594687, epoch: 112333, validators: 6, service: notifier
Aug 04 20:17:35 schnecki lighthouse[901917]: Aug 04 20:17:35.001 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 04 20:17:35 schnecki lighthouse[901917]: Aug 04 20:17:35.001 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3594687, beacon_node_slot: 3594684, msg:>
Aug 04 20:17:42 schnecki lighthouse[901917]: Aug 04 20:17:42.000 INFO Connected to beacon node(s) synced: 2, available: 2, total: 2, service: notifier
Aug 04 20:17:42 schnecki lighthouse[901917]: Aug 04 20:17:42.000 INFO Listening for doppelgangers doppelganger_detecting_validators: 6, service: notifier

What I haven't found out yet is why this error only appears during the doppelganger protection phase.

michaelsproul · 2022-08-15T01:36:49Z

What I haven't found out yet is why this error only appears during the doppelganger protection phase.

I've seen several reports now of users seeing this even when they're not using doppelganger protection. I think your diagnosis about the head slot behaviour must be correct.

nflaig · 2023-10-16T11:36:48Z

False positives can also be observed with Lodestar ChainSafe/lodestar#5555. As outlined already, there seems to be a different interpretation of what value should be used for sync_distance if the node is synced.

Lighthouse always does the calculation based on current / clock slot

lighthouse/beacon_node/http_api/src/lib.rs

Line 2737 in c3321dd

let sync_distance = current_slot - head_slot;

Whereas Lodestar does not consider the current / clock slot at all when node is synced (only when syncing), hence it can't be used to detect a time discrepancy. The sync_distance is always set to 0 and the head slot is set to the head of fork choice.

This breaks the assumptions Lighthouse has when determining a time discrepancy as missed blocks would set the head behind while sync distance is still 0 as mentioned in #3421 (comment). For Lodestar, the time discrepancy error can already be observed with a single missed block as by the time Lighthouse calls syncing status API the head might not yet be updated resulting in a two (2) slot difference.

According to the beacon spec, the Lodestar implementation seems correct

sync_distance:
    allOf:
     - description: "How many slots node needs to process to reach head. 0 if synced."
     - $ref: "../../beacon-node-oapi.yaml#/components/schemas/Uint64"

But I am not sure how accurate those property descriptions are, for head_slot it says "Head slot node is trying to reach" which sounds more like current / clock slot and not the head slot based on fork choice head. I think it would be a good thing if we could use the syncing status API to detect clock drifts on either the validator client or beacon node.

Clock drifts have been a relatively frequent issue, especially for solo stakers and I would like to see a solution for this on Lodestar to warn users about it ChainSafe/lodestar#6037. As Lighthouse is the only client right now as far as I am aware that does anything in that direction, it would be interesting to get your thoughts on this.

michaelsproul added the val-client Relates to the validator client binary label Aug 4, 2022

michaelsproul changed the title ~~Time discrepancy false positive during doppelganger listening phase~~ Time discrepancy false positive with Teku Jun 22, 2023

nflaig mentioned this issue Jun 23, 2023

Lighthouse VC to Lodestar BN time discrepancy errors ChainSafe/lodestar#5555

Open

nflaig mentioned this issue Oct 16, 2023

Detect and warn about clock drift ChainSafe/lodestar#6037

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Time discrepancy false positive with Teku #3421

Time discrepancy false positive with Teku #3421

torfbolt commented Aug 3, 2022

michaelsproul commented Aug 4, 2022

torfbolt commented Aug 4, 2022

michaelsproul commented Aug 15, 2022

nflaig commented Oct 16, 2023 •

edited

Loading

Time discrepancy false positive with Teku #3421

Time discrepancy false positive with Teku #3421

Comments

torfbolt commented Aug 3, 2022

Description

Version

Present Behaviour

Expected Behaviour

Steps to resolve

michaelsproul commented Aug 4, 2022

torfbolt commented Aug 4, 2022

michaelsproul commented Aug 15, 2022

nflaig commented Oct 16, 2023 • edited Loading

nflaig commented Oct 16, 2023 •

edited

Loading