-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heartbeat detection is delayed when timers throttled, and messages can be sent over expired connection #5135
Comments
Hi! That's an interesting idea, thanks for the detailed analysis 👍 However, I'm wondering whether handling such a corner case is really worth it. I don't think we will be able to handle all cases where the client fails to send a packet. If one really want a packet to be received by the server, aren't Reference: https://socket.io/docs/v4/client-options/#retries |
Hey @darrachequesne The problem with retries is that it requires that you wait for the ack, and also with retrying you need to make sure you either are ok processing duplicate messages or are including some kind of identifier to make sure you can handle duplicate messages on the server, which is not trivial. I.e. the connection might not have already been dead, it might have died after the message was sent but before the response had chance to come back. I think the goal of the heartbeat logic is to detect when a socket is broken before a real message goes down it? So this does feel like a problem with that to me. Adding retries feels like it's working around a bug in that system. I understand without retries you're getting "at most once delivery" and can expect messages to get lost, but if we improve the heartbeat logic this should be a lot more reliable. Before we implemented a workaround in our app (which I'm hoping we can remove if we can get a solution in socket.io at some point), we had reports that people would open there laptop and return to our app, then try and perform an action, which would hang indefinitely. I think we shouldn't need to require users to move from "at most once" to "at least once" delivery to solve this problem. This is the workaround we have for now: // these need to match the config on the server
const pingInterval = 25000;
const pingTimeout = 20000;
function detectSocketDisconnection(socket) {
let lastPingTime = null
const onPing = () => {
lastPingTime = Date.now()
}
const checkForDisconnection = () => {
if (lastPingTime === null) return
const connectionExpired =
Date.now() - lastPingTime > pingInterval + pingTimeout
if (connectionExpired) {
console.log('Forcing disconnection...')
lastPingTime = null
socket.io.engine.close()
}
}
const originalEmit = socket.emit
socket.emit = (...args) => {
// This may cause a disconnect, which will then cause the actual emit message to be
// buffered and sent down the new connection when it's set up
checkForDisconnection()
return originalEmit.apply(socket, args)
}
setInterval(() => checkForDisconnection(), 5000)
window.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'visible') {
checkForDisconnection()
}
})
socket.io.on('ping', onPing)
socket.on('connect', onPing)
}
const socket = io()
detectSocketDisconnection(socket) |
@tjenkinson OK, that totally makes sense 👍 I've added a minor suggestion on the PR: #5134 |
@tjenkinson again, sorry for the delay, I'd really like to get this sorted out. Do you think it would be sufficient to use the |
Hey @darrachequesne sometimes when reopening the laptop I'm not seeing the visibility change fire so not sure how reliable it would be Also when sending a |
And what about the Something like this: function customSetTimeout(callback, delay) {
let called = false, start = Date.now();
const timerId = setInterval(() => {
if (!called && Date.now() - start > delay) {
called = true;
callback();
}
}, Math.floor(delay / 100));
return {
reset() {
called = false;
start = Date.now();
},
clear() {
clearInterval(timerId);
}
}
} |
Hmm it could never be as reliable or as efficient as checking just before sending the message. Set interval is also paused so the interval would need to be quite low and there's still a chance that the app might try and do something before the set interval fires |
When a laptop is suspended or a phone is locked, the timer that is used to check the liveness of the connection is paused and is not able to detect that the heartbeat has failed. Previously, emitting a message after resuming the page would lose the message. The status of the timer will now be checked before sending the message, so that it gets buffered and sent upon reconnection. Note: we could also have used the Page Visibility API or a custom setTimeout() method based on setInterval(), but this would not be as reliable as the current solution. Reference: https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API Related: #5135
This should be fixed by 8adcfbf, included in @tjenkinson thank you for your tenacity 👼 |
Describe the bug
Due to chrome (and maybe other browsers) throttling timers when the device sleeps, the heartbeat detection logic doesn't always run in time, causing messages to be lost before reconnection happens late.
To Reproduce
Socket.IO server version:
4.7.5
Server
Socket.IO client version:
4.7.5
Client
Expected behavior
Proposed fixes
Check if connection is expired before sending every message
I opened a PR for this here: #5134
Let me know what you think
This doesn't cause the reconnect to happen immediately when the page is woken up, but it does mean attempting to send a message should trigger the reconnection, so outgoing messages shouldn't get lost.
Use
setInterval
and periodically check if a ping has not been received instead ofsetTimeout
I think this and storing the
Date.now()
of the last ping should be more accurate, and cause the reconnection to happen almost immediately when the page is woken up. Can also do the same on avisibilitychange
event when the page goes back tovisible
.Can look into a PR for this if you think it makes sense?
Platform:
Additional context
#3507 (comment) is another fix in the past related to throttled timers
The text was updated successfully, but these errors were encountered: