Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket client hanging/blocking? #3291

Closed
bradwood opened this issue Sep 26, 2018 · 13 comments
Closed

Websocket client hanging/blocking? #3291

bradwood opened this issue Sep 26, 2018 · 13 comments
Labels
bug needs-info Issue is lacking sufficient information and will be closed if not provided Stale

Comments

@bradwood
Copy link

bradwood commented Sep 26, 2018

Long story short

ws.receive() appears to be hanging despite data flowing from websocket.

Expected behaviour

ws.receive should output and report something.

Actual behaviour

it just hangs/blocks on the websocket.

Steps to reproduce

The code below is the calling code.

    async def _ws_subscribe(self,
                            session: ClientSession,
                            url: str
                            ) -> Dict:
        """ Fetch data from web socket asynchronously.

        This helper method fetches data from a web socket asynchronously. It is used to
        fetch subscribe to websockets of the SkyQ box.

        Args:
            session (aiohttp.ClientSession): Session to use when fetching the data.
            url (str): WebSocket URL to fetch.

        Returns:
            dict: The body of data returned.

        """
        LOGGER.debug(f"Inside _ws_subscribe()...")

        async with session.ws_connect(url, autoclose=False) as ws:
            while True:
                LOGGER.debug(f"Inside _ws_subscribe() infinite loop pre ws.receive.")
                payload = await ws.receive()
                LOGGER.debug(f"READ SOMETHING! _ws_subscribe() infinite loop post ws.receive")
                LOGGER.debug(f'type = {payload.type}')
                LOGGER.debug(f'payload data = {payload.data}')
                LOGGER.debug(f'payload exception = {ws.exception()}')
                if payload.type == aiohttp.WSMsgType.TEXT:
                    LOGGER.debug('Web-socket data received.')
                    asyncio.create_task(self._handle(payload))
                    # asyncio.ensure_future(self._handle(payload))
                elif payload.type == aiohttp.WSMsgType.BINARY:
                    LOGGER.debug('Web-socket binary data received.')
                elif payload.type == aiohttp.WSMsgType.PING:
                    LOGGER.debug('Web-socket ping received')
                    ws.pong()
                elif payload.type == aiohttp.WSMsgType.PONG:
                    LOGGER.debug('Web-socket pong received')
                else:
                    if payload.type == aiohttp.WSMsgType.CLOSE:
                        await ws.close()
                    elif payload.type == aiohttp.WSMsgType.ERROR:
                        LOGGER.info(f'Error during receive {ws.exception()}')
                    elif payload.type == aiohttp.WSMsgType.CLOSED:
                        pass

                    break

Resulting log:

(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ pyskyq -vv green
[2018-09-26 09:12:14] DEBUG:pyskyq.cli:Starting SkyQ...
[2018-09-26 09:12:14] DEBUG:pyskyq.skyremote:Initialised SkyRemote object with host=skyq, port=49160
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Initialised Status object object with host=skyq, port=9006
[2018-09-26 09:12:14] DEBUG:asyncio:Using selector: KqueueSelector
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Asyncio event loop thread running...
[2018-09-26 09:12:14] DEBUG:pyskyq.skyq:Initialised SkyQ object with host=skyq.
[2018-09-26 09:12:14] INFO:pyskyq.cli:Script ends here
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Setting up web socket listener on ws://skyq:9006/as/system/status.
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Inside _ws_subscribe()...
[2018-09-26 09:12:14] DEBUG:asyncio:Get address info skyq:9006, type=<SocketKind.SOCK_STREAM: 1>
[2018-09-26 09:12:14] DEBUG:asyncio:Getting address info skyq:9006, type=<SocketKind.SOCK_STREAM: 1> took 15.352ms: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('10.0.1.6', 9006))]
[2018-09-26 09:12:14] DEBUG:asyncio:poll 60478.847 ms took 14.299 ms: 1 events
[2018-09-26 09:12:14] DEBUG:asyncio:connect <socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('0.0.0.0', 0)> to ('10.0.1.6', 9006)
[2018-09-26 09:12:14] DEBUG:asyncio:poll 60461.657 ms took 21.045 ms: 1 events
[2018-09-26 09:12:14] DEBUG:asyncio:<socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.1.30', 50141), raddr=('10.0.1.6', 9006)> connected to 10.0.1.6:9006: (<_SelectorSocketTransport fd=8 read=polling write=<idle, bufsize=0>>, <aiohttp.client_proto.ResponseHandler object at 0x1033c8ac8>)
[2018-09-26 09:12:14] DEBUG:asyncio:poll 60429.970 ms took 7.192 ms: 1 events
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Inside _ws_subscribe() infinite loop pre ws.receive.

However, the socket is open and serving data as can be seen by netcat.

(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ nc skyq 9006
GET /as/system/status HTTP/1.1
Host: skyq:9006
Accept: */*
Connection: Upgrade
Upgrade: websocket

�~�{
   "camessage" : {
      "reason" : "no message",
      "state" : "unavailable"
   },
   "drmstatus" : {
      "state" : "available"
   },
   "entitlements" : [
      "ANALYTICS",
      "BIGBASIC",
      "ETHAN_APP_1",
      "HD",
      "PDL",
      "SKY_DRM_CE",
      "SKY_DRM_MR",
      "SKY_IPPV",
      "ULTRA+",
      "SKY+",
      "GATEWAYENABLER",
      "SIDELOAD"
   ],
   "epginfobits" : {
      "epginfobits" : "0xFDE5FFC0",
      "mask" : "0x5FFA003F",
      "state" : "available"
   },
   "gatewayservices" : {
      "state" : "available"
   },
   "hdmi" : {
      "2160p10bitCapable" : false,
      "authenticatedHDCP" : "1.x",
      "sinkHDCP" : "1.x",
      "sinkHLG" : false,
      "sinkUHD" : false,
      "state" : "available",
      "uhdConfigured" : false
   },
   "network" : {
      "state" : "available"
   },
   "nssplayback" : {
      "state" : "available"
   },
   "pvr" : {
      "state" : "available"
   },
   "schedule" : {
      "lastdate" : "20181003",
      "state" : "available"
   },
   "servicelist" : {
      "state" : "available"
   },
   "smartcard" : {
      "active" : true,
      "bouquet" : "4101",
      "countryCode" : "GBR",
      "currency" : "GBP",
      "cwe" : true,
      "householdid" : "10947783",
      "paired" : true,
      "state" : "available",
      "subbouquet" : "1",
      "transactionlimit" : 65535,
      "viewingCardNumber" : "725 325 260"
   },
   "swupdate" : {
      "reason" : "IDLE",
      "state" : "unavailable"
   },
   "systemupdates" : {
      "entitlements" : 2,
      "install" : 1,
      "servicegenres" : 1,
      "smartcard" : 1
   },
   "updatetask" : {
      "reason" : "no update",
      "state" : "unavailable"
   }
}
^C(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $

Your environment

(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ uname -a
Darwin bradmac 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ pip freeze
aiohttp==3.4.4
appnope==0.1.0
arrow==0.12.1
async-timeout==3.0.0
async-upnp-client==0.12.4
attrs==18.2.0
backcall==0.1.0
binaryornot==0.4.4
bleach==2.1.3
certifi==2018.4.16
chardet==3.0.4
click==6.7
cookiecutter==1.6.0
cycler==0.10.0
decorator==4.3.0
Django==2.1.1
entrypoints==0.2.3
ez-setup==0.9
future==0.16.0
html5lib==1.0.1
idna==2.7
idna-ssl==1.1.0
ipykernel==4.8.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
jedi==0.12.0
Jinja2==2.10
jinja2-time==0.2.0
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kaggle==1.4.2
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
multidict==4.4.2
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.5.0
numpy==1.14.4
pandocfilters==1.4.2
parso==0.2.1
pexpect==4.6.0
pickleshare==0.7.4
pipenv==2018.7.1
poyo==0.4.1
prompt-toolkit==1.0.15
ptyprocess==0.6.0
Pygments==2.2.0
pyparsing==2.2.0
PyScaffold==3.0.3
python-dateutil==2.7.3
python-didl-lite==1.1.0
pytz==2018.4
pyzmq==17.0.0
qtconsole==4.3.1
requests==2.19.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
terminado==0.8.1
testpath==0.3.1
tornado==5.0.2
tqdm==4.24.0
traitlets==4.3.2
urllib3==1.22
virtualenv==16.0.0
virtualenv-clone==0.3.0
voluptuous==0.11.5
wcwidth==0.1.7
webencodings==0.5.1
whichcraft==0.4.1
widgetsnbextension==3.2.1
yarl==1.2.6
(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $

update...

Python version is 3.7.0

@asvetlov
Copy link
Member

GitMate.io thinks possibly related issues are #2200 (background Client websockets hangs on SIGINT), #1002 (Websocket response .close() can hang indefinitely), #376 (ProactorEventLoop hangs ), #3027 (ws_connect hang), and #265 (Implement client-side websocket).

@asvetlov asvetlov added the bug label Sep 26, 2018
@bradwood
Copy link
Author

it looks like client_ws.py#L204 is not returning... any ideas?

The server is a SkyQ set-top box, so might not be entirely RFC-compliance implementation of WS. For example, it doesn't honour ping/poings... And there is that weird �~� in the payload...Not sure if that's an opcode or some other quirk...

@asvetlov
Copy link
Member

Websocket is a binary protocol �~� encodes a message type and length.
Sorry, I cannot help you without your participation in decoding it and comparing to actual payload size.

@bradwood
Copy link
Author

bradwood commented Sep 26, 2018

�~� is 001111110111111000111111 in binary, which is: 3F7E3F in hex.

those bytes look like this end-to-end...
00111111
01111110
00111111

I am not an expert on th RFC, but what is this saying in terms of type and length?

Do you think the problem is with that? Ie, it not agreeing to the actual payload length, and the thing waiting for an extra frame, for example?

@bradwood
Copy link
Author

Interestingly, this web socket cli client, written in go, handles the output fine, including that binary stuff at the top:

ws ws://skyq:9006/as/system/status
< {
   "camessage" : {
      "reason" : "no message",
      "state" : "unavailable"
   },
   "drmstatus" : {
      "state" : "available"
   },
   "entitlements" : [
      "ANALYTICS",
      "BIGBASIC",
      "ETHAN_APP_1",
      "HD",
      "PDL",
      "SKY_DRM_CE",
      "SKY_DRM_MR",
      "SKY_IPPV",
      "ULTRA+",
      "SKY+",
      "GATEWAYENABLER",
      "SIDELOAD"
   ],
   "epginfobits" : {
      "epginfobits" : "0xFDE5FFC0",
      "mask" : "0x5FFA003F",
      "state" : "available"
   },
   "gatewayservices" : {
      "state" : "available"
   },
   "hdmi" : {
      "2160p10bitCapable" : false,
      "authenticatedHDCP" : "NONE",
      "reason" : "HDMI output port is disabled",
      "sinkHDCP" : "NONE",
      "sinkHLG" : false,
      "sinkUHD" : false,
      "state" : "unavailable",
      "uhdConfigured" : false
   },
   "network" : {
      "state" : "available"
   },
   "nssplayback" : {
      "state" : "available"
   },
   "pvr" : {
      "state" : "available"
   },
   "schedule" : {
      "lastdate" : "20181003",
      "state" : "available"
   },
   "servicelist" : {
      "state" : "available"
   },
   "smartcard" : {
      "active" : true,
      "bouquet" : "4101",
      "countryCode" : "GBR",
      "currency" : "GBP",
      "cwe" : true,
      "householdid" : "10947783",
      "paired" : true,
      "state" : "available",
      "subbouquet" : "1",
      "transactionlimit" : 65535,
      "viewingCardNumber" : "725 325 260"
   },
   "swupdate" : {
      "reason" : "IDLE",
      "state" : "unavailable"
   },
   "systemupdates" : {
      "entitlements" : 2,
      "install" : 1,
      "servicegenres" : 1,
      "smartcard" : 1
   },
   "updatetask" : {
      "reason" : "no update",
      "state" : "unavailable"
   }
}

> ^C
Interrupt
✔ [brad@bradmac:~] $```

@asvetlov
Copy link
Member

The protocol says: the first 3F byte in WS message is forbidden (reserved for future use).
Are you sure that you didn't miss something?

@bradwood
Copy link
Author

I assume you are referring to the 2d, 3rd and 4th bit in the base framing protocol. In which case yes, maybe I did get it wrong. It could be a unicode/ASCII issue. I'll take another look.

@bradwood
Copy link
Author

I've tried a few other online converters... I'm getting this now:
hex = ef bf bd 7e ef bf bd
binary = 11101111 10111111 10111101 01111110 11101111 10111111 10111101

If according the RFC 6455

5.2.  Base Framing Protocol

   This wire format for the data transfer part is described by the ABNF
   [RFC5234] given in detail in this section.  (Note that, unlike in
   other sections of this document, the ABNF in this section is
   operating on groups of bits.  The length of each group of bits is
   indicated in a comment.  When encoded on the wire, the most
   significant bit is the leftmost in the ABNF).  A high-level overview
   of the framing is given in the following figure.  In a case of
   conflict between the figure below and the ABNF specified later in
   this section, the figure is authoritative.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+

   FIN:  1 bit

      Indicates that this is the final fragment in a message.  The first
      fragment MAY also be the final fragment.

   RSV1, RSV2, RSV3:  1 bit each

      MUST be 0 unless an extension is negotiated that defines meanings
      for non-zero values.  If a nonzero value is received and none of
      the negotiated extensions defines the meaning of such a nonzero
      value, the receiving endpoint MUST _Fail the WebSocket
      Connection_.

Then it looks like RSV1 and RSV2 are being set to 1, as you suggest, unless some etxension is being negotiated? I have no idea.

The number of the characters in the JSON doc is 1703

The RFC says this about payload length:

Payload length:  7 bits, 7+16 bits, or 7+64 bits

      The length of the "Payload data", in bytes: if 0-125, that is the
      payload length.  If 126, the following 2 bytes interpreted as a
      16-bit unsigned integer are the payload length.  If 127, the
      following 8 bytes interpreted as a 64-bit unsigned integer (the
      most significant bit MUST be 0) are the payload length.  Multibyte
      length quantities are expressed in network byte order.  Note that
      in all cases, the minimal number of bytes MUST be used to encode
      the length, for example, the length of a 124-byte-long string
      can't be encoded as the sequence 126, 0, 124.  The payload length
      is the length of the "Extension data" + the length of the
      "Application data".  The length of the "Extension data" may be
      zero, in which case the payload length is the length of the
      "Application data".

So, given all that, where do we go next? If the server is brain-dead and not RFC compliant, then I need to work around it as other client's clearly can do... Can you offer any suggestions?

@asvetlov
Copy link
Member

Still looks suspicious.
The only known extension is WebSocket compression which utilizes RSV1. I don't know what extension uses RSV2.
Moreover, opcode FF is reserved too and should be not used.

Unless you don't know what the server sends there is no way to "fix" a client.

@bradwood
Copy link
Author

bradwood commented Sep 26, 2018

@asvetlov I tried the websockets library and it works... So I'm going to go with that, as I cannot spend more time on debugging this library with you...

aoihttp -- doesn't work

import asyncio
import logging
import sys

import aiohttp

LOGGER = logging.getLogger(__name__)
logformat = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"
logging.basicConfig(level=logging.DEBUG, stream=sys.stdout,
                    format=logformat, datefmt="%Y-%m-%d %H:%M:%S")


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect('http://skyq:9006/as/system/status') as ws:
            payload = await ws.receive()
            print(payload.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

websockets -- works

import asyncio
import websockets
import logging
import sys

LOGGER = logging.getLogger(__name__)
logformat = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"
logging.basicConfig(level=logging.DEBUG, stream=sys.stdout,
                    format=logformat, datefmt="%Y-%m-%d %H:%M:%S")


async def hello():
    async with websockets.connect(
            'ws://skyq:9006/as/system/status') as websocket:

        payload = await websocket.recv()
        print(f"{payload}")

asyncio.get_event_loop().run_until_complete(hello())

Fee free to close this ticket if you wish to, but you might want to keep it open as it does appear to be a bug... Or at least a situation which does not obey Postel's law.

I appreciate your help in trying to address this.

@bradwood
Copy link
Author

Still looks suspicious.
The only known extension is WebSocket compression which utilizes RSV1. I don't know what extension uses RSV2.
Moreover, opcode FF is reserved too and should be not used.

Hi @asvetlov

Unless you don't know what the server sends there is no way to "fix" a client.

I think you meant "unless, you do know what a server sends, you cannot fix a client."

In which case, I completely disagree with this point.

You are familiar with the Robustness Principle?

"Be conservative in what you send, and liberal in what you receive."

Regardless of MUSTs and SHOULDs and MUST NOTs, in the RFC, I think you are interpreting and expecting entities to comply with the RFC completely... Clearly they do not, and many many internet servers and clients out there, of various types, have horrendous bastardised half-implementations of protocols.

A good library/protocol implementation must be forgiving of these to the extent that it can. See RFC1122. It is completely possible to write a protocol client to be forgiving of (some) protocol violations without having access to the other participant in the communication.

I feel this is not the case here, you seem to be taking the hard-line view that if a host is not 100% compliant then too bad for it, we will not talk to it. This is too bad. It's like saying: 'I only speak the Queen's English, and if you come talking to me in some weird dialect of English, I won't talk to you.'

Clearly this is not a good approach if the whole point of the library is to facilitate communication.

My suggestion, therefore, is to consider where you can relax stringent requirements of the protocol (maybe via method kwargs, that turn on the "relaxed" handling), and see if you can make things talk even when the other side is not behaving properly.

Aside from this critique, I'd like to thank you and the contributors from aio-libs for all your work for the community. 👍 🥇

@bradwood
Copy link
Author

@asvetlov Just an update on this... I figured out that my (non-compliant) server requires an uppercase Upgrade: header before it will work. Just thought I'd mention it in case the issue of Title Case headers is being considered by you guys... This is a real case of a protocol violation from a poorly implemented server which causes the websocket client not to work..

@Dreamsorcerer
Copy link
Member

Current releases send the upgrade header in that format, so unless there's a method to reproduce the original issue, there's not much we can do here.

@Dreamsorcerer Dreamsorcerer added the needs-info Issue is lacking sufficient information and will be closed if not provided label Aug 9, 2024
@github-actions github-actions bot added the Stale label Sep 9, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs-info Issue is lacking sufficient information and will be closed if not provided Stale
Projects
None yet
Development

No branches or pull requests

3 participants