Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abrupt client websocket connection loss results in close_code = OK #8138

Closed
1 task done
shiftinv opened this issue Feb 6, 2024 · 3 comments
Closed
1 task done

Abrupt client websocket connection loss results in close_code = OK #8138

shiftinv opened this issue Feb 6, 2024 · 3 comments
Labels

Comments

@shiftinv
Copy link

shiftinv commented Feb 6, 2024

Describe the bug

Starting with aiohttp 3.9.0, abrupt connection loss in a ClientWebSocketResponse results in close_code = 1000 (OK), whereas it used to return close_code = 1006 (ABNORMAL_CLOSURE) in 3.8.6 and prior.
This only happens with SSL/TLS connections, not plaintext ones (which show 1006 in both versions), and only in Python <= 3.10.
I've bisected this down to #7680 being the first change where this happens, but it might only be an indirect cause. Some more notes at the end.

To Reproduce

  1. Create a cert/key pair: openssl req -newkey rsa:4096 -x509 -days 365 -nodes -out cert.pem -keyout cert.pem
  2. server.py
    import ssl
    from aiohttp import web
    
    async def handle_ws(request: web.Request):
        ws = web.WebSocketResponse()
        await ws.prepare(request)
        # simulate the server connection randomly dropping,
        # without websocket protocol-level close code (i.e. now just a TCP FIN)
        ws._writer.transport.close()
    
    ssl_context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ssl_context.load_cert_chain("cert.pem")
    
    app = web.Application()
    app.add_routes([web.get("/ws", handle_ws)])
    web.run_app(app, port=8080, ssl_context=ssl_context)
  3. client.py
    import asyncio
    import ssl
    import aiohttp
    
    async def main():
        ssl_context = ssl.create_default_context(cafile="cert.pem")
        ssl_context.check_hostname = False
        async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=ssl_context)) as session:
            async with session.ws_connect("wss://localhost:8080/ws") as ws:
                msg = await ws.receive()
                print(msg)
                print(repr(ws.close_code))
    
    print(aiohttp.__version__)
    asyncio.run(main())

Expected behavior

Abrupt connection loss should result in close_code = 1006 (ABNORMAL_CLOSURE) in both SSL/TLS and plaintext connections.

Logs/tracebacks

$ python client.py
3.8.6
WSMessage(type=<WSMsgType.CLOSED: 257>, data=None, extra=None)
<WSCloseCode.ABNORMAL_CLOSURE: 1006>

# after updating to aiohttp 3.9
$ python client.py
3.9.3
WSMessage(type=<WSMsgType.CLOSED: 257>, data=None, extra=None)
<WSCloseCode.OK: 1000>

Python Version

$ python --version
Python 3.10.12

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /home/[...]/.venv/lib/python3.10/site-packages
Requires: aiosignal, async-timeout, attrs, frozenlist, multidict, yarl
Required-by:

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/[...]/.venv/lib/python3.10/site-packages
Requires: 
Required-by: aiohttp, yarl

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.5.1
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/[...]/.venv/lib/python3.10/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

Linux

Related component

Client

Additional context

Use case

We use the websocket close code to handle reconnection logic in our library, where these two codes branch into different paths - 1006 means reconnect and try to resume the previous session, while 1000 generally means a full reconnect and discarding the session, taking substantially longer.
Connections to the server run through Cloudflare, which restarts websocket nodes occasionally1. Resuming these sessions is handled at the application level2.

Some investigation

In all cases, when the connection drops, the initial EofStream in receive() ends up setting close_code = 1000.

As far as I can tell, this ultimately comes down to python/cpython#101353 again.

I hope at least some of this makes sense; I'm not familiar enough with aiohttp internals to fix it myself, unfortunately.
In the end this is arguably something that should be fixed in asyncio, but Python 3.10 is already out of support, and 3.11+ doesn't seem to have this issue anymore given https://bugs.python.org/issue44011.

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct

Footnotes

  1. https://developers.cloudflare.com/network/websockets/#technical-note

  2. https://discord.com/developers/docs/topics/gateway#resuming

@shiftinv shiftinv added the bug label Feb 6, 2024
@Dreamsorcerer
Copy link
Member

3.11+ doesn't seem to have this issue anymore

If this is no longer an issue in Python 3.11+, then I doubt we'll see a fix and you'll be better off figuring out a way to upgrade to a newer Python version. This seems like it'll probably by a hard problem to solve and we have very little time to look at things.

@shiftinv
Copy link
Author

shiftinv commented Feb 8, 2024

you'll be better off figuring out a way to upgrade to a newer Python version

That's fair. I haven't personally run into this issue in any of my projects yet, as those are primarily running on 3.12 at this point. It was reported to our library by other users, and I'd like to keep things working right out of the box on currently supported versions (3.8-3.12). We haven't yet found a proper workaround (that doesn't involve undocumented fields) other than staying on aiohttp 3.8.6, but I understand if you don't have enough time to look into this.

This seems like it'll probably by a hard problem to solve

Looks like it :/ Just reproducing and narrowing down the original reported issue of essentially "things are reconnecting more frequently" to this took me a couple days.
I was hoping that it could be fixed for client websockets similar to the way it was addressed in aiohttp.web in #7180, but I haven't looked into it deeply enough to judge if that's actually applicable here.

@Dreamsorcerer
Copy link
Member

There was a similar report that the close code doesn't match the code sent by the server. It wasn't related to Python versions or SSL, but maybe the fix for that also resolved this issue. If not, as mentioned, I don't think we'll look at it if it only affects old Python versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants