Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timed out waiting for keepalive pong error never closes the connection #1527

Open
go-- opened this issue Oct 22, 2024 · 4 comments
Open

timed out waiting for keepalive pong error never closes the connection #1527

go-- opened this issue Oct 22, 2024 · 4 comments

Comments

@go--
Copy link

go-- commented Oct 22, 2024

if client is connected to server, sending and receiving pings.
if you close the wifi or the connection of the client:
server never raises a connectionclosederror or a similar error. even though it seems it is gonna raise one.

async with self.send_context():                                                                                                                                                          
  self.protocol.fail(
    CloseCode.INTERNAL_ERROR,
    "keepalive ping timeout",
  )

until client reconnects again which is not expected.

if I enable to debug logs then it shows that connection is closed but no connection_closed raise.

Oct 22 07:19:55 DEBUG = connection is CLOSING 
Oct 22 07:19:55 DEBUG > EOF 
Oct 22 07:19:55 DEBUG x closing TCP connection 
Oct 22 07:26:55 DEBUG < EOF 
Oct 22 07:26:55 DEBUG = connection is CLOSED
@go-- go-- changed the title timed out waiting for keepalive pong error never closes the connection timed out waiting for keepalive pong error never closes the connection #bug Oct 22, 2024
@go-- go-- changed the title timed out waiting for keepalive pong error never closes the connection #bug timed out waiting for keepalive pong error never closes the connection Oct 22, 2024
@aaugustin
Copy link
Member

It's possible that there's a bug but your report is a bit short of information for me to reproduce easily.

Can you give me a bit more context?

  • Are you using websockets on the client, the server, or both?
  • Where do you expect a ConnectionClosedError? Are you awaiting recv() when this happens?

@go--
Copy link
Author

go-- commented Oct 26, 2024

I think the problem is in the keepalive ping internal procedure. It never raises ConnectionClosed when the client closes the connection.

  • Using the server with the new asyncio API. (Server) (Enable debug logs)
  • Connecting it this from any client which has a different WiFi connection to test ping timeout. (Client)
  1. To reproduce it; close the clients WiFi.
  2. Watch the server logs you should see DEBUG = connection is CLOSED but the client is still alive in the loop. no ConnectionClosed. (If the client closed normally like with ctrl+c it would throw ConnectionClosed)
  3. After awhile like 2 mins later, turn the WiFi on and check the logs again the server now throw the connectionclosed error for the intended client.

hope I am more clear this time.

@aaugustin
Copy link
Member

aaugustin commented Oct 26, 2024

Yes, that's more clear, thank you.

The scenario that you're testing typically results in a stuck TCP connection because the TCP closing handshake cannot complete.

websockets waits for the TCP connection to terminate to raise the ConnectionClosed exception because the details of that exception cannot be determined with certainty until then.

My best guess is that you're seeing in the debug logs of the server:

  • sending a ping frame ---> waiting for a pong frame
  • detecting a ping timeout ping_timeout seconds later
  • sending a close frame ---> waiting for a close frame
  • closing the TCP connection close_timeout seconds later
    • this happens at 07:19:55 in the logs you provided
  • ... nothing here because you've lost the network link so the TCP closing handshake cannot complete and asyncio never gets the connection_lost signal...
  • until you recover the network link, then the TCP closing handshake completes and you get the ConnectionClosed exception
    • this happens at 07:26:55 in the logs you provided

This behavior is documented in close and wait_closed.

Can you confirm that's what you're seeing? You need to go a bit further back in logs vs. what you provided.

Once we agree on the facts, we can review whether that behavior makes sense. For context, it's the consequence of paying attention not to leak TCP connections.

@go--
Copy link
Author

go-- commented Oct 26, 2024

yes it was almost exactly like this.

what I cannot recall right now is I thought I saw the log lines I sent was a "whole at once". until you pointed out there is a 5 mins gap. I wont be able to reproduce this right now. I will check again after the holiday.
What I would have expected was:
After timeout out seconds (30 in my case) a ConnectionClosed or TimedOut error.
Because if you disable the logs. you wont be seeing anything. and no guarantee that the intended client is turning back online. then we would treat the lost connection client as connected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants