Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI test failed: Lightclient track head on server configuration #3994

Closed
twoeths opened this issue May 10, 2022 · 3 comments · Fixed by #4006
Closed

CI test failed: Lightclient track head on server configuration #3994

twoeths opened this issue May 10, 2022 · 3 comments · Fixed by #4006
Assignees

Comments

@twoeths
Copy link
Contributor

twoeths commented May 10, 2022

Describe the bug

This test occasionally failed in CI environment https://github.com/ChainSafe/lodestar/runs/6364402693?check_suite_focus=true:

@chainsafe/lodestar:   1 failing
@chainsafe/lodestar:   1) chain / lightclient
@chainsafe/lodestar:        Lightclient track head on server configuration:
@chainsafe/lodestar:      Error: Lightclient head 4 is too far behind the beacon node 9
@chainsafe/lodestar:       at ChainEventEmitter.<anonymous> (test/e2e/chain/lightclient.test.ts:158:21)
@chainsafe/lodestar:       at runMicrotasks (<anonymous>)
@chainsafe/lodestar:       at processTicksAndRejections (node:internal/process/task_queues:96:5)

Expected behavior

It should always pass

@dapplion
Copy link
Contributor

So this test is really a race condition, if the process gets too overloaded it can fail. To reduce the likelyhood of false positives, we can add a retry on the test this.retry(2) or increase the maximum allowed tracking distance. However then we won't capture issues that may delay light-client processing

@twoeths
Copy link
Contributor Author

twoeths commented May 10, 2022

more log

@chainsafe/lodestar: Eph 0/3 0.084[LC]               info: Initializing lightclient slot=3
@chainsafe/lodestar: Eph 0/3 0.103[LC]               info: Initialized lightclient headSlot=3
@chainsafe/lodestar: Eph 0/3 0.103[LC]              debug: Started tracking the head
@chainsafe/lodestar: Eph 0/3 0.500[NODE]             info: Synced - slot: 3 - head: 3 0xd40d…e881 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/4 0.049[LC]               info: Head updated slot=3, root=0xd40d9dcaefa5aafddbe4fecf6c0cb4b0e6586d518505dc8f75f4237ee665e881
@chainsafe/lodestar: Eph 0/4 0.500[NODE]             info: Synced - slot: 4 - head: 4 0xd0fb…fb1d - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/5 0.093[LC]               info: Head updated slot=4, root=0xd0fb12a951c43d64d03aaefeaadfce9a4606feedb8e86aff74fd941aa72bfb1d
@chainsafe/lodestar: Eph 0/5 0.501[NODE]             info: Synced - slot: 5 - head: 5 0x2ee3…3c7b - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/6 0.059[LC]              error: Error on onSSE  Sync committee has not sufficient participants
@chainsafe/lodestar: Error: Sync committee has not sufficient participants
@chainsafe/lodestar:     at assertValidSignedHeader (/home/runner/work/lodestar/lodestar/packages/light-client/src/validation.ts:152:11)
@chainsafe/lodestar:     at Lightclient.processHeaderUpdate (/home/runner/work/lodestar/lodestar/packages/light-client/src/index.ts:393:28)
@chainsafe/lodestar:     at Lightclient.onSSE (/home/runner/work/lodestar/lodestar/packages/light-client/src/index.ts:356:16)
@chainsafe/lodestar:     at EventSource.<anonymous> (/home/runner/work/lodestar/lodestar/packages/api/src/client/events.ts:24:15)
@chainsafe/lodestar:     at EventSource.emit (node:events:527:28)
@chainsafe/lodestar:     at EventSource.emit (node:domain:475:12)
@chainsafe/lodestar:     at _emit (/home/runner/work/lodestar/lodestar/node_modules/eventsource/lib/eventsource.js:257:17)
@chainsafe/lodestar:     at parseEventStreamLine (/home/runner/work/lodestar/lodestar/node_modules/eventsource/lib/eventsource.js:272:9)
@chainsafe/lodestar:     at IncomingMessage.<anonymous> (/home/runner/work/lodestar/lodestar/node_modules/eventsource/lib/eventsource.js:231:11)
@chainsafe/lodestar:     at IncomingMessage.emit (node:events:527:28)
@chainsafe/lodestar:     at IncomingMessage.emit (node:domain:475:12)
@chainsafe/lodestar:     at addChunk (node:internal/streams/readable:315:12)
@chainsafe/lodestar:     at readableAddChunk (node:internal/streams/readable:289:9)
@chainsafe/lodestar:     at IncomingMessage.Readable.push (node:internal/streams/readable:228:10)
@chainsafe/lodestar:     at HTTPParser.parserOnBody (node:_http_common:141:24)
@chainsafe/lodestar:     at Socket.socketOnData (node:_http_client:494:22)
@chainsafe/lodestar:     at Socket.emit (node:events:527:28)
@chainsafe/lodestar:     at Socket.emit (node:domain:475:12)
@chainsafe/lodestar:     at addChunk (node:internal/streams/readable:315:12)
@chainsafe/lodestar:     at readableAddChunk (node:internal/streams/readable:289:9)
@chainsafe/lodestar:     at Socket.Readable.push (node:internal/streams/readable:228:10)
@chainsafe/lodestar:     at TCP.onStreamRead (node:internal/stream_base_commons:190:23)

and this error happened until slot 9, then it throwed error. The good log from other CI run:

@chainsafe/lodestar: Eph 0/3 0.053[LC]               info: Initializing lightclient slot=3
@chainsafe/lodestar: Eph 0/3 0.086[LC]               info: Initialized lightclient headSlot=3
@chainsafe/lodestar: Eph 0/3 0.086[LC]              debug: Started tracking the head
@chainsafe/lodestar: Eph 0/3 0.501[NODE]             info: Synced - slot: 3 - head: 3 0x0902…ad66 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/4 0.054[LC]               info: Head updated slot=3, root=0x0902c25d23b953bfcf94f09c8d2ecc386012282ced7b881ac808e9bafcc4ad66
@chainsafe/lodestar: Eph 0/4 0.500[NODE]             info: Synced - slot: 4 - head: 4 0x35af…8246 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/5 0.063[LC]               info: Head updated slot=4, root=0x35afe58b4d5fba032a8e71e8d91bee37ca6db16bf1beeaf0719ca704888f8246
@chainsafe/lodestar: Eph 0/5 0.500[NODE]             info: Synced - slot: 5 - head: 5 0xc798…021a - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/6 0.052[LC]               info: Head updated slot=5, root=0xc79861289937d776f8f31a62faaa3a8c7e0b97af816c54ea1dfa748a9975021a
@chainsafe/lodestar: Eph 0/6 0.501[NODE]             info: Synced - slot: 6 - head: 6 0x0526…78b1 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 0/7 0.064[LC]               info: Head updated slot=6, root=0x0526164253d1add9e4bc431fc4ebd196909b4bbc97645f5c7c3ec264ebcf78b1
@chainsafe/lodestar: Eph 0/7 0.500[NODE]             info: Synced - slot: 7 - head: 7 0x6632…5bd8 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 1/0 0.060[LC]               info: Head updated slot=7, root=0x6632dcac858fc6fb6abe01c706afa399c11829c1e9fe96cd93e3971dafb05bd8
@chainsafe/lodestar: Eph 1/0 0.500[NODE]             info: Synced - slot: 8 - head: 8 0xcff1…391f - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 1/1 0.052[LC]               info: Head updated slot=8, root=0xcff1a65950fefd64b0cc68c863865d14b93b8706356ff41e6bf65c503d4f391f
@chainsafe/lodestar: Eph 1/1 0.501[NODE]             info: Synced - slot: 9 - head: 9 0xaa7e…70b2 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 1/2 0.058[LC]               info: Head updated slot=9, root=0xaa7eff5c21ec5a6c3cd4c886c1de25b764579dbdbefd324cfcda84cd308b70b2
@chainsafe/lodestar: Eph 1/2 0.500[NODE]             info: Synced - slot: 10 - head: 10 0xc09a…88c5 - finalized: 0x0000…0000:0 - peers: 0
@chainsafe/lodestar: Eph 1/3 0.053[LC]               info: Head updated slot=10, root=0xc09a0da2bb2d5409203bcaa53636a4faa83946384cae19d031ee9fa171e488c5
.....
.....
@chainsafe/lodestar: Eph 15/6 0.501[NODE]             info: Synced - slot: 126 - head: 126 0x7b88…6515 - finalized: 0x68b9…9e32:13 - peers: 0
@chainsafe/lodestar: Eph 15/7 0.049[LC]               info: Head updated slot=126, root=0x7b88e2da4afc625a7e92127088cc9b989f3487022584d61b7b5aa0940d8b6515
@chainsafe/lodestar: Eph 15/7 0.500[NODE]             info: Synced - slot: 127 - head: 127 0x4d70…46b3 - finalized: 0x68b9…9e32:13 - peers: 0
@chainsafe/lodestar: Eph 15/7 0.702[NODE]             info: Node A emitted finalized checkpoint event epoch=14
@chainsafe/lodestar: Eph 16/0 0.002[LC]              debug: Syncing lastPeriod=1, currentPeriod=2
...

so we should look into why there is not enough sync committee participations at some slots

@twoeths twoeths self-assigned this May 11, 2022
@twoeths
Copy link
Contributor Author

twoeths commented May 11, 2022

since the root cause is "Sync committee has not sufficient participants", we can increase the validator to hopefully mitigate the issue. Right now there are only 8 validators and all of them join the sync committee (note that the SYNC_COMMITTEE_SIZE in minimal config is 32)

Attaching the log for later reference
lightclient_e2e_20220511.txt.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants