Outgoing request error rate #4661
Comments
checking
Documenting the investigation so far. Log of a peer id where this error was thrown:
id: Secp256k1PeerIdImpl [PeerId(16Uiu2HAmVPLt1XZLrafBnTK3i1sKf8qQ5zYTokQyNEm4R581Bbnq)] {
type: 'secp256k1',
multihash: Digest {
code: 0,
size: 37,
digest: <Buffer 08 02 12 21 03 f8 94 99 06 b1 3b b6 3c 15 b8 01 7b 25 b5 10 ea d5 43 1b 8a 9d 11 c2 de 6a 7b 76 de 66 21 b2 1a>,
bytes: [Uint8Array]
},
privateKey: undefined,
publicKey: <Buffer 08 02 12 21 03 f8 94 99 06 b1 3b b6 3c 15 b8 01 7b 25 b5 10 ea d5 43 1b 8a 9d 11 c2 de 6a 7b 76 de 66 21 b2 1a>
},
multiaddrs: [],
dialTarget: {
id: '16Uiu2HAmVPLt1XZLrafBnTK3i1sKf8qQ5zYTokQyNEm4R581Bbnq',
addrs: []
}
}
Clearly this peerId didn't give any multiaddrs, and hence there were no addrs to dial, but on a fetch of the peer's info via the API:
The IP doesn't seem to belong to any of the Lodestar deployments, so it's unlikely that #4658 is the cause. But I will observe and see if there is any coherent pattern to this error.
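One straightforward mitigation for this failure mode would be to skip dialing a peer for which no multiaddrs are known, rather than letting the dial fail and count toward the error-rate metric. A minimal sketch, using hypothetical shapes (`DialTarget`, `shouldAttemptDial` are illustrative names, not the real libp2p or Lodestar API):

```typescript
// Hypothetical shape of a dial target; real libp2p types differ.
interface DialTarget {
  id: string;
  addrs: string[];
}

// Skip dialing peers for which we have no addresses, instead of
// letting the dial fail with a DIAL_ERROR.
function shouldAttemptDial(target: DialTarget): boolean {
  return target.addrs.length > 0;
}

// The dial target from the log above: known peer id, empty addrs.
const target: DialTarget = {
  id: "16Uiu2HAmVPLt1XZLrafBnTK3i1sKf8qQ5zYTokQyNEm4R581Bbnq",
  addrs: [],
};

console.log(shouldAttemptDial(target)); // false: nothing to dial
```

This only suppresses the symptom; the underlying question of why the peer record has no multiaddrs remains.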
Yeah, this is a common issue since we upgraded libp2p; same as v1.2.0-rc.1 #4660 (comment)
This still happens in v1.2.0. I think the reason is that we received many goodbye requests and also sent out many goodbye requests, which leads to a 20% dial error rate.
Sample log:
This happens with v1.2.1 in mainnet nodes. It happens with a single peer that somehow really has no multiaddr: every 10 seconds we get two "REQUEST_ERROR_DIAL_ERROR" errors, one for a status request and one for a ping request. The metrics say that's a low-tolerance error, so we should disconnect the peer due to its low score, but somehow we did not.
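The expected behavior described above (repeated low-tolerance errors driving the score below a disconnect threshold) can be sketched as follows. All names and constants here are hypothetical for illustration; the real Lodestar peer-score constants and API differ:

```typescript
// Hypothetical scoring constants; real Lodestar values differ.
const LOW_TOLERANCE_PENALTY = -10;
const MIN_SCORE_BEFORE_DISCONNECT = -50;

// Minimal peer-score bookkeeping: accumulate penalties per peer
// and flag a peer for disconnection once its score drops too low.
class PeerScoreTracker {
  private scores = new Map<string, number>();

  applyPenalty(peerId: string, penalty: number): void {
    this.scores.set(peerId, (this.scores.get(peerId) ?? 0) + penalty);
  }

  shouldDisconnect(peerId: string): boolean {
    return (this.scores.get(peerId) ?? 0) < MIN_SCORE_BEFORE_DISCONNECT;
  }
}

const tracker = new PeerScoreTracker();
const peerId = "16Uiu2HAmVPLt1XZLrafBnTK3i1sKf8qQ5zYTokQyNEm4R581Bbnq";

// Two dial errors every 10 seconds: after 6 penalties the score is
// -60, below the -50 threshold, so the peer should be disconnected.
for (let i = 0; i < 6; i++) {
  tracker.applyPenalty(peerId, LOW_TOLERANCE_PENALTY);
}
console.log(tracker.shouldDisconnect(peerId)); // true
```

The bug report suggests this disconnect step is not firing for the affected peer, even though the penalties are being recorded.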
Describe the bug
Starting from Oct 14, the "Outgoing request error rate" metric spiked on unstable.
There are logs like:
@g11tech this could be related to #4658 as I don't see other commits around this time
Expected behavior
No error