fix: attempt resume on websocket closure with `close_code = 1000` in edge cases #1241

shiftinv · 2024-11-14T16:36:22Z

Summary

This should hopefully fix a spurious websocket quirk introduced in aiohttp 3.9.0, which results in shards fully reconnecting instead of resuming on abrupt connection loss. See aio-libs/aiohttp#8138 and the comment left in the code for more details.
There are several possible workarounds for this; this is one of them. It's not pretty, but it is likely the most reliable one, besides another one I experimented with a while ago which relied on aiohttp internals too much :/

Only CPython 3.10 and earlier are affected, since 3.11.0a6+ uses uvloop's SSL implementation, which doesn't have this particular issue with SSL transport lifecycles.
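
To illustrate the idea behind the fix, here is a minimal sketch of the decision it implies. It is not the code from this PR: the function name, its parameters, and the set of non-resumable close codes are assumptions made for illustration. The premise is that a socket reporting `close_code = 1000` without the client having requested a close itself is treated as an abrupt connection loss, so a resume is attempted.

```python
from typing import Optional

# Assumed for illustration: close codes after which a resume is not attempted.
# 1000 is a normal closure; the 40xx values are Discord gateway close codes
# that cannot be recovered from by reconnecting.
NON_RESUMABLE_CODES = frozenset({1000, 4004, 4010, 4011, 4012, 4013, 4014})


def should_attempt_resume(
    socket_close_code: Optional[int],
    own_close_code: Optional[int],
) -> bool:
    """Hypothetical helper: decide whether to resume after a websocket closure.

    socket_close_code: the close code reported by the websocket object.
    own_close_code: the code the client itself passed when closing,
        or None if the client never asked to close the connection.
    """
    # Edge case: the socket claims a clean 1000 closure, but the client
    # did not initiate a close. Treat this as a dropped connection.
    if socket_close_code == 1000 and own_close_code is None:
        return True

    # Otherwise prefer the client's own code, falling back to the socket's.
    code = own_close_code if own_close_code is not None else socket_close_code
    return code not in NON_RESUMABLE_CODES
```

With this sketch, `should_attempt_resume(1000, None)` allows a resume, while `should_attempt_resume(1000, 1000)` (a close the client asked for) does not.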

Some references:

Reproduction steps (sort of):

1. Run an autosharded client.
2. Get the client ws port using `bot._get_websocket(shard_id=0).socket.get_extra_info("sockname")`.
3. Run `sudo tcpkill -i <iface> port <port>` (simulating the connection drop by inserting a TCP RST).
4. Wait for (or cause) some sort of websocket frame to be sent.
5. Notice the shard fully reconnecting instead of resuming.
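
The port lookup and the `tcpkill` invocation from the steps above can be sketched as two small helpers. Both function names are hypothetical and not part of disnake; the sketch assumes `get_extra_info("sockname")` returns the usual socket address tuple, whose second element is the local port for both IPv4 and IPv6.

```python
from typing import List, Sequence


def port_from_sockname(sockname: Sequence) -> int:
    """Return the local port from a socket address tuple.

    IPv4 addresses look like (host, port); IPv6 addresses look like
    (host, port, flowinfo, scope_id). The port is index 1 in both.
    """
    return int(sockname[1])


def tcpkill_command(iface: str, port: int) -> List[str]:
    """Build the tcpkill argument list used in the reproduction steps."""
    return ["sudo", "tcpkill", "-i", iface, "port", str(port)]
```

In a running bot, the tuple would come from `bot._get_websocket(shard_id=0).socket.get_extra_info("sockname")`, and the resulting list could be passed to `subprocess.run` or joined with spaces and pasted into a shell.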

Checklist

- If code changes were made, then they have been tested
  - I have updated the documentation to reflect the changes
  - I have formatted the code properly by running `pdm lint`
  - I have type-checked the code by running `pdm pyright`
- This PR fixes an issue
- This PR adds something new (e.g. new method or parameters)
- This PR is a breaking change (e.g. methods or parameters removed/renamed)
- This PR is not a code change (e.g. documentation, README, ...)


shiftinv added 3 commits November 14, 2024 16:49

fix: attempt reconnect on websocket closure with close_code = 1000 in special cases

676a643

docs: add changelog entry

e378820

chore: rephrase the log message

085ad8a

shiftinv added t: bugfix needs backport: 2.9 labels Nov 14, 2024

shiftinv added this to the disnake v2.10 milestone Nov 14, 2024

Merge branch 'master' into fix/websocket-close-code

db16e8e

shiftinv merged commit 1cd840a into master Nov 14, 2024
28 checks passed

shiftinv deleted the fix/websocket-close-code branch November 14, 2024 17:23

shiftinv added a commit that referenced this pull request Nov 14, 2024

fix: attempt resume on websocket closure with close_code = 1000 in edge cases (#1241)

ee5790c

shiftinv removed the needs backport: 2.9 label Nov 14, 2024
