Pubsub: health check races with get_message to read from the socket #1217
Comments
cc @Andrew-Chen-Wang since you worked on #1207.
I would have created two sockets, but I understand that isn't an option in terms of scaling. One solution might be a PubSub-specific lock on the socket, held while executing a single command such as SUBSCRIBE, which would stop get_message() from trying to receive a message at the same time and so prevent this failure mode. Because responses come back line by line on the same socket, though, the subscribe call could receive unrelated data. We would then need to store every response that doesn't belong to the locking command and stream those out to get_message once the subscribe response has arrived. It's not ideal, but that's all I've got. The situation gets convoluted if you issue more commands like GET or SET from the get_message poller: then there's a backlog until all single-execution commands have completed. It'd be a nasty queue implementation... I'll play around with it today.
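The lock-plus-backlog idea above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not aioredis code: the connection is assumed to read through one asyncio.StreamReader, each reply is assumed to be one line, and any line starting with "message " is assumed to be a pushed pub/sub message rather than a command reply. LockedPubSubReader, execute, send and get_message here are hypothetical names. The command is written only after the lock is taken, so its reply can never be consumed by get_message.

```python
import asyncio
from collections import deque
from typing import Callable, Optional


class LockedPubSubReader:
    """Hypothetical sketch: one lock guards every read from the shared reader."""

    def __init__(self, reader: asyncio.StreamReader) -> None:
        self._reader = reader
        self._lock = asyncio.Lock()
        self._backlog: deque = deque()  # pushed messages read by execute()

    @staticmethod
    def _is_push(line: bytes) -> bool:
        # Assumption: toy line protocol, not real RESP framing.
        return line.startswith(b"message ")

    async def execute(self, send: Callable[[], None]) -> bytes:
        """Send a command (e.g. SUBSCRIBE or PING) and read its reply under the lock."""
        async with self._lock:
            send()  # write only once we own the read side
            while True:
                line = await self._reader.readline()
                if self._is_push(line):
                    self._backlog.append(line)  # park it for get_message()
                else:
                    return line

    async def get_message(self, timeout: float) -> Optional[bytes]:
        if self._backlog:
            return self._backlog.popleft()
        async with self._lock:
            if self._backlog:  # execute() may have parked one while we waited
                return self._backlog.popleft()
            try:
                return await asyncio.wait_for(self._reader.readline(), timeout)
            except asyncio.TimeoutError:
                return None  # give the lock back so commands can run


async def demo() -> tuple:
    reader = asyncio.StreamReader()
    pubsub = LockedPubSubReader(reader)
    listener = asyncio.create_task(pubsub.get_message(timeout=0.05))
    await asyncio.sleep(0)  # the listener now holds the lock and waits for data

    def fake_ping() -> None:
        # Hypothetical stand-in for writing PING: the "server" answers with a
        # pushed message followed by the PONG.
        reader.feed_data(b"message news hello\n+PONG\n")

    reply = await pubsub.execute(fake_ping)  # blocks until the listener times out
    idle = await listener  # None: nothing arrived while it held the lock
    parked = await pubsub.get_message(timeout=0.05)  # served from the backlog
    return idle, reply, parked


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The cost shows up in the demo: execute() cannot proceed until a blocked get_message() gives up the lock, so a health-check PING may wait up to the get_message timeout.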
@bmerry After thinking about it for a while, I think this is the only expected behavior, as shown in asyncio's StreamReader._wait_for_data:
```python
async def _wait_for_data(self, func_name):
    """Wait until feed_data() or feed_eof() is called.

    If stream was paused, automatically resume it.
    """
    # StreamReader uses a future to link the protocol feed_data() method
    # to a read coroutine. Running two read coroutines at the same time
    # would have an unexpected behaviour. It would not possible to know
    # which coroutine would get the next data.
    if self._waiter is not None:
        raise RuntimeError(
            f'{func_name}() called while another coroutine is '
            f'already waiting for incoming data')

    assert not self._eof, '_wait_for_data after EOF'

    # Waiting for data while paused will make deadlock, so prevent it.
    # This is essential for readexactly(n) for case when n > self._limit.
    if self._paused:
        self._paused = False
        self._transport.resume_reading()

    self._waiter = self._loop.create_future()
    try:
        await self._waiter
    finally:
        self._waiter = None
```
Specifically, the guard at the top of the method: because there is no way to know which of two concurrent read coroutines would get the next data, the second reader gets a RuntimeError instead of racing the first.
Again, we want to implement a lock so that we somehow get an ordered stream of response data, but the queue for this would be interesting: I'm imagining a queue or ordered dict whose elements (or keys) are coroutines, so that the queue can pair each piece of response data with the coroutine waiting for it.
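The ordered-queue idea above can be sketched with futures standing in for the coroutines (a swapped-in technique, not the commenter's design): a single background task is the only reader of the stream, each command appends a Future before it is sent, and the reader resolves the oldest Future with each reply while routing pushed messages to a queue for get_message. ResponseRouter, execute, send and the "message " prefix check are hypothetical; real RESP parsing would replace the line check, and the connection is assumed to read through one asyncio.StreamReader.

```python
import asyncio
from collections import deque
from typing import Callable, Optional


class ResponseRouter:
    """Hypothetical sketch: one background task is the only reader of the stream."""

    def __init__(self, reader: asyncio.StreamReader) -> None:
        self._reader = reader
        self._pending: deque = deque()  # futures awaiting replies, in send order
        self._messages: asyncio.Queue = asyncio.Queue()  # pushed pub/sub messages
        self._task = asyncio.create_task(self._read_loop())

    @staticmethod
    def _is_push(line: bytes) -> bool:
        # Assumption: toy line protocol, not real RESP framing.
        return line.startswith(b"message ")

    async def _read_loop(self) -> None:
        while True:
            line = await self._reader.readline()
            if not line:  # EOF
                return
            if self._is_push(line):
                self._messages.put_nowait(line)
            elif self._pending:  # an unsolicited reply is simply dropped here
                fut = self._pending.popleft()
                if not fut.done():  # the caller may have been cancelled
                    fut.set_result(line)

    async def execute(self, send: Callable[[], None]) -> bytes:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append(fut)  # register before sending to keep reply order
        send()
        return await fut

    async def get_message(self, timeout: float) -> Optional[bytes]:
        try:
            return await asyncio.wait_for(self._messages.get(), timeout)
        except asyncio.TimeoutError:
            return None

    async def close(self) -> None:
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def demo() -> tuple:
    reader = asyncio.StreamReader()
    router = ResponseRouter(reader)
    listener = asyncio.create_task(router.get_message(timeout=1.0))
    await asyncio.sleep(0)  # the listener is now waiting on the message queue
    # Hypothetical stand-in for sending PING: a pushed message arrives first.
    reply = await router.execute(
        lambda: reader.feed_data(b"message news hello\n+PONG\n"))
    message = await listener
    await router.close()
    return reply, message


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because get_message only waits on the queue, it never touches the StreamReader, so the RuntimeError from _wait_for_data cannot occur; the price is one background task per connection.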
It may be worth following the approach in redis/redis-py#1737, which looks promising (basically, …)
That sounds good to me 👍
Describe the bug
This is somewhat related to #1206 (both involve pubsub and health checks) but is a different failure mode. I think this is actually the aioredis equivalent of the redis-py bug I linked from #1206, and the approach used in that bug's corresponding PR may work here too (I haven't had a chance to review the PR yet).
When a subscribe command is issued on a PubSub that currently has no subscriptions, and the connection has been idle for longer than the health check interval, the underlying connection issues a PING to check its health and then tries to read the PONG. However, another async task may already be blocked in get_message, also trying to read from the socket. This leads to an exception.
To Reproduce
… #1207 (which fixes #1206), or master.
Expected behavior
The script should run without errors.
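The conflict can be reduced to a Redis-free sketch, assuming the connection reads its replies through one asyncio.StreamReader (as the _wait_for_data excerpt quoted in the comments suggests). One task stands in for a task blocked in get_message and the other for the health check reading its PONG; the second reader fails immediately. This is an illustration only, not the reproduction script from the report, and race() is a hypothetical name.

```python
import asyncio


async def race() -> str:
    """Two coroutines waiting on one StreamReader at the same time.

    Assumption: the connection reads replies through a single
    asyncio.StreamReader, so both tasks end up in _wait_for_data().
    """
    reader = asyncio.StreamReader()

    # Stand-in for a task blocked in PubSub.get_message().
    listener = asyncio.create_task(reader.readline())
    await asyncio.sleep(0)  # let the listener start waiting for data

    # Stand-in for the health check reading the PONG to its PING.
    try:
        await reader.readline()
        error = ""
    except RuntimeError as exc:
        error = str(exc)

    reader.feed_data(b"+PONG\r\n")  # let the listener finish cleanly
    await listener  # the listener, not the health check, receives this line
    return error


if __name__ == "__main__":
    print(asyncio.run(race()))
```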
Logs/tracebacks
Python Version
aioredis Version
a708bd14b1a8bec0a1f3d469bf5384eb2726b5fa
Additional context
No response