Support upgrades from Pravega Operator with BK to Operator without BK #343

Closed
pbelgundi opened this issue Mar 18, 2020 · 9 comments · Fixed by #353
Labels: area/upgrade (Impacts upgrade feature in operator), priority/P1 (Recoverable error, functionality/performance impaired but not lost, no permanent damage)

Comments

@pbelgundi
Contributor

pbelgundi commented Mar 18, 2020

Description

Up to pravega-operator version 0.4.4, Bookkeeper was part of the Pravega CRD. Starting with operator version 0.5.0, Bookkeeper is a prerequisite for Pravega and no longer part of the Pravega CRD; BK becomes a new custom resource managed by the Bookkeeper Operator.
This issue tracks the changes required to support upgrading pravega-operator from version 0.4.4 to 0.5.0.

Importance

must-have

Suggestions for an improvement

  1. Migrate Pravega CRD from current version to new version (without BK) using a conversion webhook: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definition-versioning/
  2. Migrate ownership of Bookkeeper objects (spec values, ConfigMap, STS, headless service, PDB, etc.) from the Pravega Operator to the Bookkeeper Operator.
@pbelgundi added the priority/P1 and area/upgrade labels on Mar 18, 2020
@pbelgundi
Contributor Author

pbelgundi commented Mar 18, 2020

Note that using a conversion webhook requires a K8s server version of ideally 1.16+, or at least 1.15+, so that the CustomResourceWebhookConversion feature is enabled.
https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definition-versioning/#webhook-conversion
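
A pre-upgrade script could guard on the server version before attempting anything else. The following is only a minimal sketch: the function name `supportsConversionWebhook` is hypothetical, and it assumes the version arrives as a string such as "v1.16.3" (for example, the server's reported git version).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsConversionWebhook (hypothetical helper) reports whether a server
// version string such as "v1.16.3" is at least 1.15, the minimum mentioned
// above for the CustomResourceWebhookConversion feature.
func supportsConversionWebhook(gitVersion string) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(gitVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version string %q", gitVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %v", gitVersion, err)
	}
	// Some distributions report minors such as "15+"; drop the suffix.
	minor, err := strconv.Atoi(strings.TrimSuffix(parts[1], "+"))
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %v", gitVersion, err)
	}
	return major > 1 || (major == 1 && minor >= 15), nil
}

func main() {
	for _, v := range []string{"v1.14.10", "v1.15.0", "v1.16.3"} {
		ok, err := supportsConversionWebhook(v)
		fmt.Printf("%s supported=%v err=%v\n", v, ok, err)
	}
}
```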

@pbelgundi
Contributor Author

pbelgundi commented Mar 19, 2020

Version upgrade for Custom Resource objects can be done in 2 ways:

  1. Using storage version migrator
  2. Manually upgrading the stored version

See reference for details: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definition-versioning/#upgrade-existing-objects-to-a-new-stored-version
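
To make the manual option concrete: the idea is to read every existing object and write it back unchanged, so that it is persisted at the current storage version, after which the old version no longer has any stored objects. The sketch below only models that bookkeeping in memory; the `storedObject` type and both functions are hypothetical and do not talk to a cluster.

```go
package main

import "fmt"

// storedObject is a hypothetical in-memory model of a persisted custom
// resource and the API version it is currently stored at.
type storedObject struct {
	Name     string
	StoredAt string
}

// rewriteAtStorageVersion models reading each object and writing it back
// unchanged, which persists it at the current storage version.
func rewriteAtStorageVersion(objs []storedObject, storage string) []storedObject {
	out := make([]storedObject, len(objs))
	for i, o := range objs {
		o.StoredAt = storage
		out[i] = o
	}
	return out
}

// remainingStoredVersions lists, in first-seen order, the versions that
// still have at least one persisted object.
func remainingStoredVersions(objs []storedObject) []string {
	seen := map[string]bool{}
	var versions []string
	for _, o := range objs {
		if !seen[o.StoredAt] {
			seen[o.StoredAt] = true
			versions = append(versions, o.StoredAt)
		}
	}
	return versions
}

func main() {
	objs := []storedObject{
		{Name: "pravega", StoredAt: "v1alpha1"},
		{Name: "pravega-test", StoredAt: "v1beta1"},
	}
	fmt.Println("before:", remainingStoredVersions(objs))
	objs = rewriteAtStorageVersion(objs, "v1beta1")
	fmt.Println("after: ", remainingStoredVersions(objs))
}
```

Once only the new version remains, the old version can be removed from the stored versions recorded on the CRD, as the linked reference describes.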

@pbelgundi
Contributor Author

pbelgundi commented Mar 19, 2020

The current Pravega CR version (with Bookkeeper) is "v1alpha1". Based on the API maturity levels defined here, it can now be changed to "v1beta1" (without Bookkeeper).

@pbelgundi
Contributor Author

pbelgundi commented Mar 24, 2020

Supporting upgrade from Pravega operator version 0.4 to 0.5 will require the following changes:

  1. Add the new version to the Pravega CRD and change the operator code to monitor the new version (v1beta1) instead of the old one. When the new operator is deployed, if the Pravega cluster object is at the old version, this triggers the conversion webhook to convert the Pravega cluster object from v1alpha1 to v1beta1.
  2. The conversion webhook should do the following:
    a. Move all Bookkeeper objects (STS, PDB, ConfigMap, headless service, PVCs) to the Bookkeeper custom resource object, monitored by the Bookkeeper operator.
    b. Update the BookkeeperUri field in the new v1beta1 object to point to the headless service URLs of Bookkeeper.
  3. Remove support for Pravega cluster object v1alpha1 by updating the Pravega CRD.
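
For step 2b, one possible way to derive the BookkeeperUri value is sketched below. This is not the operator's actual code: the function name, the `<cluster>-bookie-<n>` pod names and the `<cluster>-bookie-headless` service name are assumptions made for illustration. The per-pod DNS form `<pod>.<service>.<namespace>.svc.cluster.local` is the standard one for StatefulSet pods behind a headless service (assuming the default cluster domain), and 3181 is Bookkeeper's default bookie port.

```go
package main

import (
	"fmt"
	"strings"
)

// bookkeeperURI (hypothetical helper) builds a comma-separated list of
// bookie addresses behind a headless service. The pod and service naming
// scheme used here is an assumption, not taken from the operator.
func bookkeeperURI(cluster, namespace string, replicas, port int) string {
	service := cluster + "-bookie-headless"
	addrs := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		addrs = append(addrs, fmt.Sprintf("%s-bookie-%d.%s.%s.svc.cluster.local:%d",
			cluster, i, service, namespace, port))
	}
	return strings.Join(addrs, ",")
}

func main() {
	fmt.Println(bookkeeperURI("pravega", "default", 3, 3181))
}
```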

@pbelgundi
Contributor Author

Writing a conversion webhook requires using the "conversion" package in controller-runtime.
The controller-runtime library currently used by the operator is at version v0.1.8, which does not have the conversion package, so it needs to be upgraded to the latest version, v0.5.1.

How to wire up the webhook:
https://book.kubebuilder.io/multiversion-tutorial/tutorial.html
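
The tutorial follows a hub-and-spoke pattern: one version is the hub, and every other version knows how to convert to and from it. The sketch below shows only the shape of that pattern. `Hub` and `Convertible` here are local stand-ins that mirror the interfaces in controller-runtime's conversion package (the real ones also embed runtime.Object), and the cluster types, their fields, and the URI format are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Hub and Convertible are local stand-ins mirroring the shape of the
// controller-runtime conversion interfaces, so this file builds on its own.
type Hub interface{ Hub() }

type Convertible interface {
	ConvertTo(dst Hub) error
	ConvertFrom(src Hub) error
}

// V1beta1Cluster is a hypothetical hub type (no Bookkeeper spec).
type V1beta1Cluster struct {
	Name          string
	BookkeeperUri string
}

func (*V1beta1Cluster) Hub() {}

// V1alpha1Cluster is a hypothetical spoke type that still embeds Bookkeeper.
type V1alpha1Cluster struct {
	Name               string
	BookkeeperReplicas int
}

// ConvertTo copies the fields that survive and replaces the embedded
// Bookkeeper spec with a URI; the URI format is an assumption.
func (src *V1alpha1Cluster) ConvertTo(dst Hub) error {
	out, ok := dst.(*V1beta1Cluster)
	if !ok {
		return errors.New("unsupported hub type")
	}
	out.Name = src.Name
	out.BookkeeperUri = src.Name + "-bookie-headless:3181"
	return nil
}

// ConvertFrom is left unsupported in this sketch.
func (dst *V1alpha1Cluster) ConvertFrom(src Hub) error {
	return errors.New("converting v1beta1 back to v1alpha1 is not supported in this sketch")
}

// Compile-time check that the spoke satisfies the stand-in interface.
var _ Convertible = (*V1alpha1Cluster)(nil)

func main() {
	src := &V1alpha1Cluster{Name: "pravega", BookkeeperReplicas: 3}
	dst := &V1beta1Cluster{}
	if err := src.ConvertTo(dst); err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Printf("%+v\n", *dst)
}
```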

@Prabhaker24
Contributor

Prabhaker24 commented Mar 25, 2020

I tried manually migrating ownership of BK from the Pravega operator to the Bookkeeper operator, as part of the first point of the upgrade process mentioned above, and verified that the following works:

  1. The segment store is able to access the bookies while ownership and control of the bookies are with the Bookkeeper operator.
  2. The IPs of the bookie pods have not changed.
  3. The IPs the headless service points to have not changed.
  4. Deleting any of the 5 objects (STS, PVC, PDB, ConfigMap, headless service) recreates it with the correct owner references, i.e. those of the Bookkeeper operator.
  5. Deleting the Bookkeeper cluster object also deletes the above 5 objects automatically.
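
Checks 4 and 5 both come down to which owner is marked as the controller on each object. A small sketch of that check follows; the types are simplified, hypothetical stand-ins for Kubernetes object metadata, and the kind name "BookkeeperCluster" and the object names are assumptions.

```go
package main

import "fmt"

// ownerRef and object are simplified, hypothetical stand-ins for the
// owner-reference metadata carried by Kubernetes objects.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

type object struct {
	Kind   string
	Name   string
	Owners []ownerRef
}

// controllerKind returns the kind of the controlling owner, or "" if none.
func controllerKind(o object) string {
	for _, ref := range o.Owners {
		if ref.Controller {
			return ref.Kind
		}
	}
	return ""
}

// notControlledBy lists the objects whose controlling owner is not wantKind.
func notControlledBy(objs []object, wantKind string) []string {
	var bad []string
	for _, o := range objs {
		if controllerKind(o) != wantKind {
			bad = append(bad, o.Kind+"/"+o.Name)
		}
	}
	return bad
}

func main() {
	objs := []object{
		{Kind: "StatefulSet", Name: "pravega-bookie",
			Owners: []ownerRef{{Kind: "BookkeeperCluster", Name: "pravega-bk", Controller: true}}},
		{Kind: "ConfigMap", Name: "pravega-bookie",
			Owners: []ownerRef{{Kind: "PravegaCluster", Name: "pravega", Controller: true}}},
	}
	fmt.Println(notControlledBy(objs, "BookkeeperCluster"))
}
```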

@pbelgundi changed the title from "Support upgrades from Pravega 0.7 to 0.8" to "Support upgrades from Pravega Operator with BK to Operator without BK" on Mar 31, 2020
@pbelgundi
Contributor Author

It looks like we need to add a structural schema to the Pravega CRD to be able to support conversion webhooks. As per the K8s documentation:

"Structural schemas are a requirement for apiextensions.k8s.io/v1, and disables the following features for apiextensions.k8s.io/v1beta1:
Validation Schema Publishing
Webhook Conversion"

In other words, under apiextensions.k8s.io/v1beta1 these features are not available when the schema is not structural.
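
One of the structural schema rules is that every node in the schema specifies a type. A rough sketch of checking just that rule is below; it deliberately ignores the other rules and the documented exceptions (for example the x-kubernetes-* markers), and the function names and the sample schema fields are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// missingTypes walks an OpenAPI v3 schema decoded from JSON and returns the
// paths of nodes without a "type". Only "properties" and a single-schema
// "items" are followed; this is a simplification of the real rules.
func missingTypes(path string, node map[string]interface{}) []string {
	var missing []string
	if t, _ := node["type"].(string); t == "" {
		missing = append(missing, path)
	}
	if props, ok := node["properties"].(map[string]interface{}); ok {
		names := make([]string, 0, len(props))
		for name := range props {
			names = append(names, name)
		}
		sort.Strings(names) // deterministic order
		for _, name := range names {
			child, ok := props[name].(map[string]interface{})
			if !ok {
				missing = append(missing, path+"."+name)
				continue
			}
			missing = append(missing, missingTypes(path+"."+name, child)...)
		}
	}
	if items, ok := node["items"].(map[string]interface{}); ok {
		missing = append(missing, missingTypes(path+"[]", items)...)
	}
	return missing
}

// checkSchemaJSON decodes a schema and reports the untyped nodes.
func checkSchemaJSON(raw string) ([]string, error) {
	var root map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &root); err != nil {
		return nil, err
	}
	return missingTypes("$", root), nil
}

// exampleSchema is a made-up fragment; "pravega" lacks a type on purpose.
const exampleSchema = `{
  "type": "object",
  "properties": {
    "spec": {
      "type": "object",
      "properties": {
        "bookkeeperUri": {"type": "string"},
        "pravega": {"properties": {"controllerReplicas": {"type": "integer"}}}
      }
    }
  }
}`

func main() {
	missing, err := checkSchemaJSON(exampleSchema)
	fmt.Println(missing, err)
}
```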

@pbelgundi
Contributor Author

pbelgundi commented Apr 8, 2020

The upgrade from pravega-operator version 0.4.x to 0.5.x will be a 2-step process:

  1. Execute a script that makes the necessary changes for the conversion webhook to be triggered post upgrade.
  2. Trigger the operator upgrade by updating the image tag in the pravega-operator deployment.

The script should do the following:

  1. Add a new version v1beta1 to the Pravega CRD with storage: true and served: true. Also set storage to false for version v1alpha1.

  2. Create Bookkeeper operator (use same namespace as Pravega-Operator, name can be anything)

  3. Create validatingwebhookconfiguration and webhook svc using:

When the script executes to completion, all these artifacts must have been created successfully.
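
For step 1 of the script, the invariant to preserve is that exactly one version of the CRD has storage set to true. A minimal sketch of that edit on an in-memory version list follows; `crdVersion` and `addStorageVersion` are hypothetical names, and a real script would patch the CRD rather than a Go slice.

```go
package main

import "fmt"

// crdVersion is a hypothetical, trimmed-down model of one entry in a CRD's
// versions list.
type crdVersion struct {
	Name    string
	Served  bool
	Storage bool
}

// addStorageVersion returns a copy of versions in which name is served and
// is the only storage version; the entry is appended if it is missing.
func addStorageVersion(versions []crdVersion, name string) []crdVersion {
	out := make([]crdVersion, 0, len(versions)+1)
	found := false
	for _, v := range versions {
		v.Storage = v.Name == name
		if v.Name == name {
			v.Served = true
			found = true
		}
		out = append(out, v)
	}
	if !found {
		out = append(out, crdVersion{Name: name, Served: true, Storage: true})
	}
	return out
}

func main() {
	current := []crdVersion{{Name: "v1alpha1", Served: true, Storage: true}}
	for _, v := range addStorageVersion(current, "v1beta1") {
		fmt.Printf("%s served=%v storage=%v\n", v.Name, v.Served, v.Storage)
	}
}
```

The old version stays served in this sketch so that existing v1alpha1 objects can still be read and converted.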

Once the upgrade to operator version 0.5.x is complete, if the existing PravegaCluster CR is at version v1alpha1, the conversion webhook is triggered to convert this CR to version v1beta1.
The webhook will do the following:

  1. For Bookkeeper artifacts (STS, ConfigMap, PDB, HeadlessService, PVCs), change ControllerReference to false for PravegaCluster.
  2. Convert the existing Pravega custom resource object (version v1alpha1) to the new version, v1beta1.
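
Step 1 of the webhook amounts to clearing the controller flag on the PravegaCluster owner reference of each Bookkeeper artifact, so that another controller can take over. A sketch of that change is below. The `ownerRef` type is a simplified stand-in for the Kubernetes owner reference (where the controller flag is likewise an optional boolean), and the function name and sample values are illustrative assumptions.

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for a Kubernetes owner reference.
type ownerRef struct {
	APIVersion string
	Kind       string
	Name       string
	Controller *bool
}

// releaseControl returns a copy of refs in which every reference of the
// given kind has its controller flag set to false. The input is untouched.
func releaseControl(refs []ownerRef, kind string) []ownerRef {
	out := make([]ownerRef, len(refs))
	for i, ref := range refs {
		if ref.Kind == kind {
			notController := false
			ref.Controller = &notController
		}
		out[i] = ref
	}
	return out
}

func main() {
	isController := true
	refs := []ownerRef{{
		APIVersion: "pravega.pravega.io/v1alpha1", // illustrative value
		Kind:       "PravegaCluster",
		Name:       "pravega",
		Controller: &isController,
	}}
	for _, ref := range releaseControl(refs, "PravegaCluster") {
		fmt.Printf("%s/%s controller=%v\n", ref.Kind, ref.Name, *ref.Controller)
	}
}
```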
