The "classic" implementation selects shards like this:
def consistent_hash(self, value):
    """
    Maps the value to a node value between 0 and 4095
    using CRC, then down to one of the ring nodes.
    """
    if isinstance(value, str):
        value = value.encode("utf8")
    bigval = binascii.crc32(value) & 0xFFF
    ring_divisor = 4096 / float(self.ring_size)
    return int(bigval / ring_divisor)
The PUBSUB implementation does this:
def _get_shard(self, channel_or_group_name):
    """
    Return the shard that is used exclusively for this channel or group.
    """
    if len(self._shards) == 1:
        # Avoid the overhead of hashing and modulo when it is unnecessary.
        return self._shards[0]
    shard_index = abs(hash(channel_or_group_name)) % len(self._shards)
    return self._shards[shard_index]
I was surprised by the difference when I first looked at this, and it started to really bite us in production after we introduced sharding: hash() is not guaranteed to return the same value for the same string across separate Python processes, even on the same machine, unless PYTHONHASHSEED is set explicitly, which probably no one ever does. Two workers can therefore pick different shards for the same channel or group name. See https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
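To make the problem concrete, here is a small sketch that evaluates both hash functions in fresh interpreters. The helper `run_in_fresh_interpreter` and the name `example-group` are made up for illustration, and the two fixed seeds only stand in for two workers that each got a different random seed at startup.

```python
import binascii
import os
import subprocess
import sys


def run_in_fresh_interpreter(expression, hash_seed):
    """Evaluate `expression` in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"import binascii; print({expression})"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


CHANNEL = "example-group"  # hypothetical channel/group name

builtin_expr = f"hash({CHANNEL!r})"
crc_expr = f"binascii.crc32({CHANNEL!r}.encode('utf8'))"

# Different seeds simulate two worker processes with randomized str hashing.
builtin_a = run_in_fresh_interpreter(builtin_expr, 1)
builtin_b = run_in_fresh_interpreter(builtin_expr, 2)

# CRC32 depends only on the input bytes, not on the process.
crc_a = run_in_fresh_interpreter(crc_expr, 1)
crc_b = run_in_fresh_interpreter(crc_expr, 2)

print("hash() agrees across processes:", builtin_a == builtin_b)
print("crc32 agrees across processes: ", crc_a == crc_b)
```

With more than one shard, a disagreement in hash() means `abs(hash(name)) % len(self._shards)` can point the two processes at different shards for the same name.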
I think we should just use the same hash function across implementations. I'll send a PR :)
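Roughly what I have in mind, as a sketch and not the final PR: `stable_shard_index` is a hypothetical helper name, and it simply reuses the CRC32 arithmetic from `consistent_hash` above so that every process maps a given name to the same shard.

```python
import binascii


def stable_shard_index(name, shard_count):
    """Map a channel or group name to a shard index in [0, shard_count)."""
    if isinstance(name, str):
        name = name.encode("utf8")
    # Same arithmetic as consistent_hash: keep the low 12 bits of the CRC
    # (0..4095), then scale that range down to the number of shards.
    bigval = binascii.crc32(name) & 0xFFF
    ring_divisor = 4096 / float(shard_count)
    return int(bigval / ring_divisor)


# Hypothetical shard list, standing in for self._shards.
shards = ["shard-0", "shard-1", "shard-2"]
chosen = shards[stable_shard_index("example-group", len(shards))]
print(chosen)
```

Because CRC32 depends only on the bytes of the name, the result does not change with PYTHONHASHSEED or between processes.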
raphaelm added a commit to raphaelm/channels_redis that referenced this issue on Sep 1, 2021