Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time discrepancy false positive with Teku #3421

Open
torfbolt opened this issue Aug 3, 2022 · 4 comments
Open

Time discrepancy false positive with Teku #3421

torfbolt opened this issue Aug 3, 2022 · 4 comments
Labels
val-client Relates to the validator client binary

Comments

@torfbolt
Copy link

torfbolt commented Aug 3, 2022

Description

During the doppelganger protection phase I received multiple error messages from the VC warning about time discrepancy, even though both VC and BN run on the same machine with correct NTP time.

Version

2.5.1 running on Prater

Present Behaviour

After upgrading to 2.5.1 and restarting, the Lighthouse VC went into the doppelganger listening phase. It successfully connected to both configured beacon nodes, a Teku 22.8.0 and a Lighthouse 2.5.1 running on the same machine, fully synced. During the doppelganger protection phase I then received multiple times the log message

Aug 03 18:01:47 schnecki lighthouse[831307]: Aug 03 18:01:47.087 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 03 18:01:47 schnecki lighthouse[831307]: Aug 03 18:01:47.208 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3586808, beacon_node_slot: 3586804, msg:>
Aug 03 18:01:48 schnecki lighthouse[831307]: Aug 03 18:01:48.207 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 03 18:01:48 schnecki lighthouse[831307]: Aug 03 18:01:48.208 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3586809, beacon_node_slot: 3586804, msg:>
Aug 03 18:01:54 schnecki lighthouse[831307]: Aug 03 18:01:54.002 INFO Connected to beacon node(s) synced: 2, available: 2, total: 2, service: notifier

There were no error messages about the Lighthouse BN, but that might be only because it was listed as second prio BN.

After the doppelganger protection phase ended, there were no further time discrepancy errors.

Expected Behaviour

No error messages about time discrepancy also during the doppelganger protection phase.

Steps to resolve

@michaelsproul
Copy link
Member

Have you inquired with Teku about this? It looks like it might be a discrepancy in how we define the sync_distance in the /eth/v1/node/syncing endpoint. Lighthouse tries to use the sync_distance to work out the remote beacon node's current slot here:

if let Some(local_slot) = slot_clock.now() {
let remote_slot = resp.data.head_slot + resp.data.sync_distance;
if remote_slot + 1 < local_slot || local_slot + 1 < remote_slot {
error!(
log,
"Time discrepancy with beacon node";
"msg" => "check the system time on this host and the beacon node",
"beacon_node_slot" => remote_slot,
"local_slot" => local_slot,
"endpoint" => %beacon_node,
);
}
}
}

@michaelsproul michaelsproul added the val-client Relates to the validator client binary label Aug 4, 2022
@torfbolt
Copy link
Author

torfbolt commented Aug 4, 2022

I think I found out when this happens: if there are two or more consecutive empty slots on the beacon chain. So it looks like Teku doesn't update the head slot (or sync distance) when there are empty slots, while lighthouse expects it to go up.

Teku logs on Prater:

Aug 04 20:16:52 schnecki teku[831696]: 20:16:52.272 INFO - Slot Event *** Slot: 3594684, Block: 57c62b660692653c38f26933642565cdf3b2d2173e6af12993af1431b723a9db, Justified: 112332, Finalized: 112>
Aug 04 20:17:04 schnecki teku[831696]: 20:17:04.346 INFO - Slot Event *** Slot: 3594685, Block: ... empty, Justified: 112332, Finalized: 112>
Aug 04 20:17:16 schnecki teku[831696]: 20:17:16.267 INFO - Slot Event *** Slot: 3594686, Block: ... empty, Justified: 112332, Finalized: 112>
Aug 04 20:17:28 schnecki teku[831696]: 20:17:28.276 INFO - Slot Event *** Slot: 3594687, Block: ... empty, Justified: 112332, Finalized: 112>
Aug 04 20:17:39 schnecki teku[831696]: 20:17:39.898 INFO - Epoch Event *** Epoch: 112334, Justified checkpoint: 112333, Finalized checkpoint: 112332, Finalized root: d543a170bf7255ea9a74f719f692ea>

Lighthouse VC logs:

Aug 04 20:17:23 schnecki lighthouse[901917]: Aug 04 20:17:23.001 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 04 20:17:23 schnecki lighthouse[901917]: Aug 04 20:17:23.001 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3594686, beacon_node_slot: 3594684, msg:>
Aug 04 20:17:30 schnecki lighthouse[901917]: Aug 04 20:17:30.000 INFO Connected to beacon node(s) synced: 2, available: 2, total: 2, service: notifier
Aug 04 20:17:30 schnecki lighthouse[901917]: Aug 04 20:17:30.000 INFO Listening for doppelgangers doppelganger_detecting_validators: 6, service: notifier
Aug 04 20:17:30 schnecki lighthouse[901917]: Aug 04 20:17:30.000 INFO Awaiting activation slot: 3594687, epoch: 112333, validators: 6, service: notifier
Aug 04 20:17:35 schnecki lighthouse[901917]: Aug 04 20:17:35.001 INFO Connected to beacon node endpoint: http://localhost:33050/, version: teku/v22.8.0/linux-x86_64/-privatebuild-ope>
Aug 04 20:17:35 schnecki lighthouse[901917]: Aug 04 20:17:35.001 ERRO Time discrepancy with beacon node endpoint: http://localhost:33050/, local_slot: 3594687, beacon_node_slot: 3594684, msg:>
Aug 04 20:17:42 schnecki lighthouse[901917]: Aug 04 20:17:42.000 INFO Connected to beacon node(s) synced: 2, available: 2, total: 2, service: notifier
Aug 04 20:17:42 schnecki lighthouse[901917]: Aug 04 20:17:42.000 INFO Listening for doppelgangers doppelganger_detecting_validators: 6, service: notifier

What I haven't found out yet is why this error only appears during the doppelganger protection phase.

@michaelsproul
Copy link
Member

What I haven't found out yet is why this error only appears during the doppelganger protection phase.

I've seen several reports now of users seeing this even when they're not using doppelganger protection. I think your diagnosis about the head slot behaviour must be correct.

@michaelsproul michaelsproul changed the title Time discrepancy false positive during doppelganger listening phase Time discrepancy false positive with Teku Jun 22, 2023
@nflaig
Copy link
Contributor

nflaig commented Oct 16, 2023

False positives can also be observed with Lodestar ChainSafe/lodestar#5555. As outlined already, there seems to be a different interpretation of what value should be used for sync_distance if the node is synced.

Lighthouse always does the calculation based on current / clock slot

let sync_distance = current_slot - head_slot;

Whereas Lodestar does not consider the current / clock slot at all when node is synced (only when syncing), hence it can't be used to detect a time discrepancy. The sync_distance is always set to 0 and the head slot is set to the head of fork choice.

This breaks the assumptions Lighthouse has when determining a time discrepancy as missed blocks would set the head behind while sync distance is still 0 as mentioned in #3421 (comment). For Lodestar, the time discrepancy error can already be observed with a single missed block as by the time Lighthouse calls syncing status API the head might not yet be updated resulting in a two (2) slot difference.

According to the beacon spec, the Lodestar implementation seems correct

sync_distance:
    allOf:
     - description: "How many slots node needs to process to reach head. 0 if synced."
     - $ref: "../../beacon-node-oapi.yaml#/components/schemas/Uint64"

But I am not sure how accurate those property descriptions are, for head_slot it says "Head slot node is trying to reach" which sounds more like current / clock slot and not the head slot based on fork choice head. I think it would be a good thing if we could use the syncing status API to detect clock drifts on either the validator client or beacon node.

Clock drifts have been a relatively frequent issue, especially for solo stakers and I would like to see a solution for this on Lodestar to warn users about it ChainSafe/lodestar#6037. As Lighthouse is the only client right now as far as I am aware that does anything in that direction, it would be interesting to get your thoughts on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
val-client Relates to the validator client binary
Projects
None yet
Development

No branches or pull requests

3 participants