Bookkeeper cluster should be managed by Pravega Operator #504

Open
pbelgundi opened this issue Feb 5, 2021 · 1 comment
Labels
area/API Issue related to the custom resource API kind/enhancement Enhancement of an existing feature

Comments

@pbelgundi
Contributor

Description

To be able to dynamically scale Pravega, it is crucial that pravega-operator manage both Segment Store and Bookkeeper clusters, as we would typically want to scale both up and down together.

Importance

must-have

Location

pravega-operator

Suggestions for an improvement

Change Pravega CustomResourceDefinition to include Bookkeeper.
If upgrades need to be supported, then this would involve CRD conversion from v1beta1 to the next version.

@pbelgundi pbelgundi added area/upgrade Impacts upgrade feature in operator area/API Issue related to the custom resource API and removed area/upgrade Impacts upgrade feature in operator labels Feb 5, 2021
@RaulGracia

@pbelgundi while I understand the need to scale both Pravega and Bookkeeper based on the workload, I'm unsure that coupling the Pravega Operator to Bookkeeper again is the way to go. As you know, the "ancient" Pravega Operator was already managing both the Bookkeeper and Pravega services. But eventually, the decision was to split them into 2 separate operators, which makes sense for many reasons. Based on the description posted, we could at least distinguish between 2 types of scaling:

  • Static scaling: We have an on-premise cluster and we added a few servers, so we want to expand the Pravega and Bookkeeper services to utilize the new resources. This simple case can be handled manually by updating the number of replicas in the appropriate services, based on the available resources. This looks like a one-time or very rare event, and seems more common in "on-premise clusters" (i.e., no need for dynamic autoscaling).
  • Dynamic scaling: Especially in "cloud-based scenarios", we may want to dynamically scale both the Pravega and Bookkeeper clusters to match the incoming workload, as there is an economic incentive to do so. I assume that this is the closest scenario to what is described in the issue. However, I wonder if there are other ways to materialize this idea. For instance, a separate service (e.g., pravega-autoscaler) could achieve this objective without adding more complexity to the existing Pravega Operator. This approach has several benefits: i) The Pravega Operator would be the same for scenarios where auto-scaling is needed and scenarios in which it is not; ii) We would keep the Pravega Operator and the Bookkeeper Operator strictly focused on their targeted services; iii) The "auto-scaling" function will likely require building a "feedback loop", which means consuming metrics of the workload/resource utilization, implementing some "cluster scaling policies", evaluating them continuously, and then reacting by scaling the cluster up or down if necessary. As you can see, all this functionality may get complex enough to deserve its own software component running as a microservice; iv) If in the future there are other alternatives to Bookkeeper as Tier 1, the Pravega Operator would be agnostic to them, as the pravega-autoscaler would be the place where other Tier 1 options get plugged in.
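
To make the "feedback loop" in point iii) more concrete, here is a minimal sketch of one pass of such a loop. Everything in it is an assumption for illustration: the names (ThresholdPolicy, reconcile_once), the metric keys and the thresholds are hypothetical, and nothing here is an existing Pravega or Bookkeeper API. The metrics source and the scaling action are passed in as callables, so a different Tier 1 could in principle be plugged in without touching the loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names exist in the Pravega or
# Bookkeeper operators. Metrics are a plain snapshot of name -> value.
Metrics = Dict[str, float]


@dataclass
class ThresholdPolicy:
    """Scale a component by one replica when a metric leaves [low, high]."""
    component: str      # e.g. "segmentstore" or "bookkeeper"
    metric: str         # key looked up in the metrics snapshot
    low: float
    high: float
    min_replicas: int
    max_replicas: int

    def desired(self, current: int, metrics: Metrics) -> int:
        value = metrics[self.metric]
        if value > self.high:
            target = current + 1
        elif value < self.low:
            target = current - 1
        else:
            target = current
        # Never leave the configured replica bounds.
        return max(self.min_replicas, min(self.max_replicas, target))


def reconcile_once(
    policies: List[ThresholdPolicy],
    replicas: Dict[str, int],
    read_metrics: Callable[[], Metrics],
    apply_scale: Callable[[str, int], None],
) -> Dict[str, int]:
    """One pass of the loop: read metrics, evaluate policies, react."""
    metrics = read_metrics()
    updated = dict(replicas)
    for policy in policies:
        current = updated[policy.component]
        target = policy.desired(current, metrics)
        if target != current:
            apply_scale(policy.component, target)
            updated[policy.component] = target
    return updated
```

A real service would call something like reconcile_once on a timer and would probably need cooldowns and smoothing to avoid flapping; those are left out to keep the sketch short.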

Perhaps tools like the Kubernetes Horizontal Pod Autoscaler could take us a long way toward this objective in Kubernetes-based scenarios.
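
For reference, the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler is roughly desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no action taken when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below only restates that rule as a function; the function name and the example numbers are made up, and whether this rule fits Segment Store or Bookkeeper workloads is an open question, not something established in this thread.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the documented HPA scaling rule."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Example with made-up numbers: 3 replicas at 90% CPU against a 60% target.
print(desired_replicas(3, 90.0, 60.0))
```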

Another approach could be to implement the "autoscaling functionality" in each operator separately; that is, the Bookkeeper Operator could autoscale Bookkeeper, and the Pravega Operator could autoscale Pravega. On the one hand, this could make sense, given that these systems may need to be scaled for very different reasons (Bookkeeper is often IO bound, whereas Pravega is quite often CPU bound in our performance experiments). On the other hand, this would mean "repeating" the same functionality in both operators (which can be mitigated by sharing common autoscaling logic across them).

Anyway, irrespective of the approach chosen to achieve dynamic scaling, my main concern is coupling the Pravega Operator again with functionality related to Bookkeeper, which I think needs to be treated separately.

@pbelgundi pbelgundi added the kind/enhancement Enhancement of an existing feature label Feb 10, 2021

2 participants