Describe the bug

Starting with aiohttp 3.9.0, abrupt connection loss in a `ClientWebSocketResponse` results in `close_code = 1000` (OK), whereas 3.8.6 and earlier returned `close_code = 1006` (ABNORMAL_CLOSURE).

This only happens with SSL/TLS connections, not plaintext ones (which show `1006` in both versions), and only on Python <= 3.10.

I've bisected this down to #7680 being the first change where this happens, but it might only be an indirect cause. Some more notes at the end.
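The affected configuration can be written down as a predicate. The sketch below is a minimal illustration based only on the observations in this report (aiohttp >= 3.9.0, TLS, Python <= 3.10). The function name `matches_reported_conditions` is hypothetical, and the report does not establish whether later aiohttp releases still behave this way.

```python
from __future__ import annotations

from typing import Sequence


def matches_reported_conditions(
    aiohttp_version: str, python_version: Sequence[int], uses_tls: bool
) -> bool:
    """Hypothetical check: does this setup match the configuration in which
    the report saw close_code 1000 instead of 1006 after an abrupt drop?"""
    # Compare only major.minor, e.g. "3.9.1" -> (3, 9).
    aiohttp_major_minor = tuple(int(part) for part in aiohttp_version.split(".")[:2])
    # Ignore the micro version, e.g. (3, 10, 12) -> (3, 10).
    python_major_minor = tuple(python_version[:2])
    return uses_tls and aiohttp_major_minor >= (3, 9) and python_major_minor <= (3, 10)
```

For example, a client library could call `matches_reported_conditions(aiohttp.__version__, sys.version_info, uses_tls=True)` to decide whether a `1000` after a dropped connection should be treated with suspicion.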
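Server used for the reproduction (it loads the certificate and key from `cert.pem`):

```python
import ssl

from aiohttp import web


async def handle_ws(request: web.Request) -> web.WebSocketResponse:
    ws = web.WebSocketResponse()
    await ws.prepare(request)

    # Simulate the server connection randomly dropping,
    # without a websocket protocol-level close code (i.e. now just a TCP FIN).
    # _writer is a private aiohttp attribute, used only to reach the transport.
    ws._writer.transport.close()

    # Return the response so aiohttp doesn't raise "Missing return statement".
    return ws


ssl_context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
ssl_context.load_cert_chain("cert.pem")

app = web.Application()
app.add_routes([web.get("/ws", handle_ws)])
web.run_app(app, port=8080, ssl_context=ssl_context)
```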
yarl Version

```console
$ python -m pip show yarl
Name: yarl
Version: 1.5.1
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/[...]/.venv/lib/python3.10/site-packages
Requires: idna, multidict
Required-by: aiohttp
```
OS
Linux
Related component
Client
Additional context
Use case
We use the websocket close code to handle reconnection logic in our library, where these two codes branch into different paths: `1006` means reconnect and try to resume the previous session, while `1000` generally means a full reconnect and discarding the session, which takes substantially longer.

Connections to the server run through Cloudflare, which restarts websocket nodes occasionally[1]. Resuming these sessions is handled at the application level[2].
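The branching described above can be sketched as follows. This is a hypothetical policy, not the library's actual code. `ReconnectAction` and `choose_reconnect_action` are illustrative names, and treating close codes other than `1006` (or a missing code) as a full reconnect is an assumption. Per RFC 6455, `1000` is a normal closure and `1006` is reserved for a connection that closed without a Close frame. aiohttp exposes both values as `aiohttp.WSCloseCode.OK` and `aiohttp.WSCloseCode.ABNORMAL_CLOSURE`; the sketch uses plain integers to stay dependency-free.

```python
from enum import Enum
from typing import Optional

# WebSocket close codes from RFC 6455, section 7.4.1.
NORMAL_CLOSURE = 1000
ABNORMAL_CLOSURE = 1006  # reserved: connection dropped without a Close frame


class ReconnectAction(Enum):
    RESUME = "resume"  # reconnect and try to resume the previous session
    FULL = "full"  # discard the session and start over (substantially slower)


def choose_reconnect_action(close_code: Optional[int]) -> ReconnectAction:
    """Hypothetical policy: resume only after an abnormal closure."""
    if close_code == ABNORMAL_CLOSURE:
        return ReconnectAction.RESUME
    # 1000, and (as an assumption here) any other code or None -> full reconnect.
    return ReconnectAction.FULL
```

With the regression described in this issue, an abrupt TLS disconnect reports `1000`. A policy like this one then takes the slower `FULL` path instead of resuming.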
Some investigation
In all cases, when the connection drops, the initial `EofStream` in `receive()` ends up setting `close_code = 1000`.

- With non-SSL connections, `_SelectorSocketTransport.is_closing()` returns `True` after connection loss. This means that cleanly closing the writer fails, and `close()` immediately returns after setting `close_code = 1006`.
- Its SSL counterpart, `_SSLProtocolTransport.is_closing()`, returns `False`, so closing the writer seemingly succeeds.
  - In 3.8, `ClientWebSocketResponse.close()` then tries to read the remaining messages before returning, which raises another `EofStream` and ends up setting `close_code = 1006`.
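The sequence above can be condensed into a toy model, together with the report's note that in 3.9 `close()` returns before reading the remaining messages. This is not aiohttp's code. `FakeTransport`, `close_code_after_abrupt_drop`, and the `reads_remaining_messages` flag (standing in for the 3.8 behaviour) are hypothetical names used only to make the cases explicit.

```python
from dataclasses import dataclass

OK = 1000
ABNORMAL_CLOSURE = 1006


@dataclass
class FakeTransport:
    """Stand-in for the asyncio transport underneath the client websocket."""

    closing: bool  # what is_closing() reports after the connection drops

    def is_closing(self) -> bool:
        return self.closing


def close_code_after_abrupt_drop(transport: FakeTransport, reads_remaining_messages: bool) -> int:
    """Toy model of the close-code sequence described in this report (not aiohttp's code)."""
    # receive() hits the initial EofStream and records a normal closure.
    close_code = OK

    # close(): a plaintext transport already reports is_closing() == True,
    # so closing the writer fails and close() records an abnormal closure.
    if transport.is_closing():
        return ABNORMAL_CLOSURE

    # TLS on Python <= 3.10: is_closing() is False, so closing the writer
    # seemingly succeeds.
    if reads_remaining_messages:
        # aiohttp 3.8: draining the remaining messages raises another
        # EofStream, which records an abnormal closure.
        close_code = ABNORMAL_CLOSURE

    # Without draining (aiohttp 3.9), the 1000 set by receive() is kept.
    return close_code
```

The three combinations give `1006` (plaintext), `1006` (TLS, 3.8 behaviour) and `1000` (TLS, 3.9 behaviour), matching the observations in the report.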
I hope at least some of this makes sense; I'm not familiar enough with aiohttp internals to fix it myself, unfortunately.
In the end, this is arguably something that should be fixed in asyncio. However, Python 3.10 now only receives security fixes, and 3.11+ doesn't seem to have this issue anymore given https://bugs.python.org/issue44011.
If this is no longer an issue in Python 3.11+, then I doubt we'll see a fix, and you'll be better off figuring out a way to upgrade to a newer Python version. This seems like it'll probably be a hard problem to solve, and we have very little time to look at things.

> you'll be better off figuring out a way to upgrade to a newer Python version

That's fair. I haven't personally run into this issue in any of my projects yet, as those are primarily running on 3.12 at this point. It was reported to our library by other users, and I'd like to keep things working right out of the box on currently supported versions (3.8-3.12). We haven't yet found a proper workaround (one that doesn't involve undocumented fields) other than staying on aiohttp 3.8.6, but I understand if you don't have enough time to look into this.

> This seems like it'll probably be a hard problem to solve

Looks like it :/ Just reproducing the originally reported issue (essentially "things are reconnecting more frequently") and narrowing it down to this took me a couple of days.
I was hoping that it could be fixed for client websockets similar to the way it was addressed in aiohttp.web in #7180, but I haven't looked into it deeply enough to judge if that's actually applicable here.
There was a similar report that the close code doesn't match the code sent by the server. It wasn't related to Python versions or SSL, but maybe the fix for that also resolved this issue. If not, as mentioned, I don't think we'll look at it if it only affects old Python versions.
To Reproduce

Generate the self-signed `cert.pem` used by the server:

```console
$ openssl req -newkey rsa:4096 -x509 -days 365 -nodes -out cert.pem -keyout cert.pem
```
Expected behavior

Abrupt connection loss should result in `close_code = 1006` (ABNORMAL_CLOSURE) for both SSL/TLS and plaintext connections.
As far as I can tell, this ultimately comes down to python/cpython#101353 again. In 3.9, `ClientWebSocketResponse.close()` now returns before reading the remaining messages. In 3.8, reading them raised a second `EofStream` and set `1006`. As a result, 3.9 keeps the `close_code = 1000` that was set originally by the initial `EofStream` in `receive()`.
Footnotes

[1] https://developers.cloudflare.com/network/websockets/#technical-note
[2] https://discord.com/developers/docs/topics/gateway#resuming