Bookkeeper cluster should be managed by Pravega Operator #504

pbelgundi · 2021-02-05T05:19:03Z

Description

To be able to dynamically scale Pravega, it is crucial that pravega-operator manage both Segment Store and Bookkeeper clusters, as we would typically want to scale both up and down together.

Importance

must-have

Location

pravega-operator

Suggestions for an improvement

Change Pravega CustomResourceDefinition to include Bookkeeper.
If upgrades need to be supported, then this would involve CRD conversion from v1beta1 to the next version.

RaulGracia · 2021-02-05T09:12:15Z

@pbelgundi while I understand the need to scale both Pravega and Bookkeeper based on the workload, I'm unsure that coupling the Pravega Operator to Bookkeeper again is the way to go. As you know, the "ancient" Pravega Operator was already managing both the Bookkeeper and Pravega services. But eventually, the decision was to split them into 2 separate operators, which makes sense for many reasons. Based on the description posted, we could at least distinguish between 2 types of scaling:

Static scaling: We have an on-premise cluster and we added a few servers, so we want to expand the Pravega and Bookkeeper services to use the new resources. This simple case can be handled manually by updating the number of replicas in the appropriate services, based on the available resources. This looks like a one-time or very rare event, and seems more common in "on-premise clusters" (i.e., no need for dynamic autoscaling).
Dynamic scaling: Especially in "cloud-based scenarios", we may want to dynamically scale both the Pravega and Bookkeeper clusters to match the incoming workload, as there is an economic incentive to do so. I assume that this is the closest scenario to what is described in the issue. However, I wonder if there are other ways to materialize this idea. For instance, having a separate service (e.g., pravega-autoscaler) could achieve this objective without adding more complexity to the existing Pravega Operator. Other advantages of this approach are the following: i) The Pravega Operator would be the same for scenarios where auto-scaling is needed and scenarios in which it is not; ii) We would keep the Pravega Operator and the Bookkeeper Operator strictly focused on their targeted services; iii) The "auto-scaling" function will likely require building a "feedback loop": consuming metrics of the workload/resource utilization, implementing some "cluster scaling policies", evaluating them continuously, and then reacting by scaling the cluster up or down if necessary. As you can see, all this functionality may get complex enough to deserve its own software component running as a microservice; iv) If in the future there are other alternatives to Bookkeeper as Tier 1, the Pravega Operator would be agnostic to them, as the pravega-autoscaler would be the place where other Tier 1 options get plugged in.
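To make the "feedback loop" idea a bit more concrete, here is a minimal sketch of one evaluation step of such a loop. Everything in it is an assumption for illustration only: the `ScalingPolicy` class, the `evaluate` function, the service names, and the thresholds are hypothetical and do not exist in either operator. A real pravega-autoscaler would also have to collect the metrics and apply the changes, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical threshold policy for a single service."""
    scale_up_above: float    # utilization (0..1) above which one replica is added
    scale_down_below: float  # utilization (0..1) below which one replica is removed
    min_replicas: int
    max_replicas: int

    def desired_replicas(self, current: int, utilization: float) -> int:
        if utilization > self.scale_up_above:
            target = current + 1
        elif utilization < self.scale_down_below:
            target = current - 1
        else:
            target = current
        # Never leave the configured bounds.
        return max(self.min_replicas, min(self.max_replicas, target))


def evaluate(policies, replicas, metrics):
    """One iteration of the loop: return only the replica counts that must change."""
    changes = {}
    for service, policy in policies.items():
        target = policy.desired_replicas(replicas[service], metrics[service])
        if target != replicas[service]:
            changes[service] = target
    return changes


# Assumed example values; the metric is a normalized utilization per service.
policies = {
    "segmentstore": ScalingPolicy(0.8, 0.3, min_replicas=1, max_replicas=10),
    "bookkeeper": ScalingPolicy(0.7, 0.2, min_replicas=3, max_replicas=12),
}
replicas = {"segmentstore": 3, "bookkeeper": 4}
metrics = {"segmentstore": 0.9, "bookkeeper": 0.5}

print(evaluate(policies, replicas, metrics))  # {'segmentstore': 4}
```

Keeping the policy per service means each one can be tuned independently while the loop itself stays generic.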

Perhaps tools like the Kubernetes Horizontal Pod Autoscaler could take us a long way towards this objective in Kubernetes-based scenarios.
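For reference, the core rule the Horizontal Pod Autoscaler documents is proportional: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below only reproduces that rule in isolation; the function name is hypothetical, and whether the autoscaler can target the Pravega or Bookkeeper resources directly is not something this thread establishes.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Proportional scaling rule as documented for the Horizontal Pod Autoscaler."""
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left untouched.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Assumed example: 4 replicas at 90% average CPU against a 60% target.
print(hpa_desired_replicas(4, 90, 60))  # 6
```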

Another approach could be to implement the "autoscaling functionality" in each operator separately; that is, the Bookkeeper Operator could autoscale Bookkeeper, and the Pravega Operator could autoscale Pravega. On the one hand, this could make sense, given that these systems may need to be scaled for very different reasons (Bookkeeper is often IO bound, whereas Pravega is quite often CPU bound in our performance experiments). On the other hand, this would mean duplicating the same functionality in both operators (which can be mitigated by sharing common autoscaling logic across them).

Anyway, irrespective of the approach chosen to achieve dynamic scaling, my main concern is coupling the Pravega Operator again with functionality related to Bookkeeper, which I think needs to be treated separately.

pbelgundi added the area/upgrade (Impacts upgrade feature in operator) and area/API (Issue related to the custom resource API) labels and removed the area/upgrade label on Feb 5, 2021

pbelgundi added the kind/enhancement (Enhancement of an existing feature) label on Feb 10, 2021
