Enable increasing the number of containers #1644
Labels
area/controller
area/segmentstore
kind/enhancement
priority/P2
status/needs-attention
status/review-design
Problem description
Right now we map segments to containers by simple consistent hashing. We should change this mapping so that the number of containers can be increased.
Problem location
Controller, SegmentStore. Possibly Client.
Suggestions for an improvement
We can design a system where we allow both increasing and decreasing the number of containers. However, handling the decreasing part will be somewhat more complicated than increasing, so we can tackle that later.
Controller: ContainerCounts; each such table will have an Epoch, which is incremented with every change, and a Count, which defines how many containers are in this epoch. For example: {Epoch=1, Count=4}, {Epoch=2, Count=8}, etc.
Segment Store
Client
How will this work
When we get a container expansion request, the Controller will instantiate the new containers and then create a new entry in the ContainerCounts metadata (the exact number of steps here may be higher due to other constraints). Every segment created after this point will use the latest ContainerCounts epoch and container count, and will be evenly distributed across the cluster.
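A minimal sketch of the epoch metadata described here, to make the expansion step concrete. The class and method names (`ContainerCount`, `ContainerCounts`, `expand`, `latest`) are hypothetical and the history is kept in memory; the real Controller would persist it and would instantiate the new containers before recording the entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ContainerCount:
    epoch: int  # incremented with every change
    count: int  # number of containers in this epoch


@dataclass
class ContainerCounts:
    """Append-only history of container counts (hypothetical in-memory stand-in)."""

    entries: List[ContainerCount] = field(default_factory=list)

    def latest(self) -> ContainerCount:
        # Only valid once at least one entry has been recorded.
        return self.entries[-1]

    def expand(self, new_count: int) -> ContainerCount:
        # Only increases are modeled; the proposal defers decreasing to later.
        if self.entries and new_count <= self.latest().count:
            raise ValueError("container count may only increase")
        next_epoch = self.latest().epoch + 1 if self.entries else 1
        entry = ContainerCount(next_epoch, new_count)
        self.entries.append(entry)
        return entry


counts = ContainerCounts()
counts.expand(4)  # {Epoch=1, Count=4}
counts.expand(8)  # {Epoch=2, Count=8}
```

Older entries are never rewritten, so a segment created under an earlier epoch can still be resolved against the count that was in effect at that time.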
Existing segments will stay assigned to whatever containers they were originally assigned to and will continue processing there for the remainder of their lifetimes. Eventually they will be sealed off and their successors will be redistributed using the latest ContainerCounts entry that is available, so given enough time we will achieve a full cluster rebalancing (write-wise).
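The pinning behavior can be sketched as follows, assuming each segment records the epoch it was created under and is mapped with a plain hash modulo that epoch's count. The function names, the segment naming, and the use of CRC32 are illustrative assumptions, not the actual hashing scheme.

```python
import zlib


def latest_epoch(counts: dict) -> int:
    # counts maps epoch -> container count, e.g. {1: 4, 2: 8}
    return max(counts)


def create_segment(name: str, counts: dict) -> dict:
    # A new segment is pinned to the latest epoch at creation time.
    return {"name": name, "epoch": latest_epoch(counts)}


def container_for(segment: dict, counts: dict) -> int:
    # CRC32 is a stand-in hash; the count used is the one from the segment's
    # own epoch, so later expansions do not move the segment.
    digest = zlib.crc32(segment["name"].encode("utf-8"))
    return digest % counts[segment["epoch"]]


counts = {1: 4}
old = create_segment("scope/stream/0", counts)
before = container_for(old, counts)

counts[2] = 8  # container expansion: {Epoch=2, Count=8}
after = container_for(old, counts)  # unchanged, still resolved with Count=4

successor = create_segment("scope/stream/1", counts)  # uses epoch 2
successor_container = container_for(successor, counts)
```

Because the lookup depends only on the segment's name and its recorded epoch, any component that knows both can resolve the container without consulting the current count.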
As for reads, those segments will still be mapped to their old containers, so reads may be unbalanced. However, this may not be a big problem: reads are mostly Tier 2 and cache intensive, so they do not require as many resources.
Additionally, we could think of a scheme where older segments are eventually migrated to the latest epoch. Such a scheme would involve work in the Segment Store (as we would need to move metadata from the previous container to the new one). However, we may not need to do any of this, given that retention will take care of cleaning up old segments, so they will no longer pose a problem.