Issue 343: Support p-operator upgrade from version 0.4.x to 0.5.0 #353

Merged 30 commits on Apr 23, 2020

@pbelgundi pbelgundi commented Apr 2, 2020

Change log description

This PR adds the following changes to Pravega Operator:

  1. Migrated controller-runtime from v0.1.8 to v0.5.1.
  2. Added a new CR version, v1beta1, to the Pravega CRD to represent a Pravega Cluster resource without Bookkeeper.
  3. Removed the existing MutatingWebhook used for CR validation and added a new validation webhook built on the new controller-runtime library.
  4. Fixed the versions ConfigMap issue by mounting the version map as a volume instead of reading it directly.
  5. Added a Conversion Webhook that does the following:
    a. Converts a CR object at version v1alpha1 (CR with Bookkeeper) to v1beta1 (CR without BK).
    b. Creates the Bookkeeper cluster object and transfers the Bookkeeper data in the v1alpha1 Pravega CR object (p.Spec.Bookkeeper) to the newly created Bookkeeper Cluster object.
    c. Migrates ownership of the Bookkeeper STS, ConfigMaps, PDB, Services, etc. from the Pravega CR to the Bookkeeper CR.
    d. Updates owner references of existing Pravega artifacts (Controller and Segment Store STS, PDB, ConfigMap, Services, etc.) to point to the new CR version, v1beta1, instead of v1alpha1.
  6. Added the new p-operator chart to be used with Operator 0.5.0.
  7. Added basic OpenAPIV3Schema validation to the Pravega CRD for both the old and new versions of the Pravega CR.
  8. Added a script for triggering the operator upgrade from version 0.4.x to 0.5.0.
  9. Renamed the Spec field "tier2" to "longtermStorage" in version v1beta1.
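
As a rough illustration of the conversion described in items 5 and 9, the sketch below splits a v1alpha1 PravegaCluster (as a plain dict) into a v1beta1 PravegaCluster and a separate BookkeeperCluster. It is not the operator's Go code. The API group strings, the assumption that "tier2" sits under spec.pravega, and the caller-supplied bookkeeper_uri value are all illustrative assumptions.

```python
import copy

# Assumed API group/version strings; check them against the installed CRDs.
PRAVEGA_V1BETA1 = "pravega.pravega.io/v1beta1"
BOOKKEEPER_V1ALPHA1 = "bookkeeper.pravega.io/v1alpha1"


def convert_v1alpha1_to_v1beta1(old_cr, bookkeeper_uri):
    """Return (pravega_cr_v1beta1, bookkeeper_cr) built from a v1alpha1 CR dict.

    bookkeeper_uri is supplied by the caller because its format is not
    specified here (hypothetical parameter).
    """
    spec = copy.deepcopy(old_cr["spec"])  # never mutate the caller's object
    meta = {
        "name": old_cr["metadata"]["name"],
        "namespace": old_cr["metadata"].get("namespace", "default"),
    }

    # Bookkeeper settings leave the Pravega CR and become their own object.
    bookkeeper_cr = {
        "apiVersion": BOOKKEEPER_V1ALPHA1,
        "kind": "BookkeeperCluster",
        "metadata": dict(meta),
        "spec": spec.pop("bookkeeper", {}),
    }

    # Rename tier2 -> longtermStorage (assumed to live under spec.pravega).
    pravega = spec.get("pravega", {})
    if "tier2" in pravega:
        pravega["longtermStorage"] = pravega.pop("tier2")

    # The new CR refers to Bookkeeper by URI instead of embedding its spec.
    spec["bookkeeperUri"] = bookkeeper_uri

    pravega_cr = {
        "apiVersion": PRAVEGA_V1BETA1,
        "kind": "PravegaCluster",
        "metadata": dict(meta),
        "spec": spec,
    }
    return pravega_cr, bookkeeper_cr
```

The input is deep-copied so the original v1alpha1 object stays available for comparison after conversion.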

Purpose of the change

Fixes #343, #17, #290, #345

How to verify it

  1. In an environment running p-operator version 0.4.3 or 0.4.4, execute the script tools/OperatorUpgrade.sh to trigger the operator upgrade to 0.5.0. The following should be observed after the upgrade is triggered:
    a. The new p-operator pod starts up and its logs show that the conversion webhook is triggered. The log should contain:
    "Converting Pravega CR version from v1alpha1 to v1beta1."
    eventually followed by:
    "Version migration completed successfully."
    b. The Operator reconcile loop starts running and is able to set defaults on the new PravegaCluster CR object. This can be checked using the log message:
    "Reconciling Pravega Cluster ..."
    NOTE
    The conversion code in the operator may take anywhere from a few seconds up to a minute to execute.
    It then takes several minutes (typically 8-10 minutes) for these changes to be reflected on the K8s server. During this period, even though the operator logs show that the conversion has completed, resource requests (kubectl get and describe) on the Pravega CR will continue to fail until the conversion is complete on the K8s server.
    Once resource requests on the Pravega cluster start succeeding, confirm the following:
    c. The Pravega CR version is migrated to v1beta1. The new version has no "bookkeeper" field; it has a bookkeeperUri field instead.
    d. The cluster status for both the Pravega cluster and the Bookkeeper cluster reflects correct values based on the pods belonging to each cluster type.
    e. The owner reference for the BK ConfigMap, PDB, all PVCs, STS and headless svc points to the BK Cluster at version v1alpha1.
    f. The owner reference for Pravega artifacts points to APIVersion v1beta1 instead of v1alpha1.
    g. The old StatefulSets for Segment Store are deleted and new ones are created (with the same name and values).
    h. The finalizer "cleanUpZookeeper" is not present in the new PravegaCluster object (v1beta1). Zookeeper cleanup is now handled by the BK Operator.
    i. The BK Operator starts managing the Bookkeeper artifacts (STS, CM, PDB, SVC, etc.), including scale, deletion and upgrade.
    j. The Pravega Operator still manages the Pravega artifacts (STS, Deployment, ConfigMaps, etc.) for Controller and Segment Store. Check this by scaling, upgrading or restarting the Pravega Controller and Segment Store artifacts.
    k. After the upgrade is complete, delete the PravegaCluster resource followed by the BookkeeperCluster resource; deletion should happen as expected. This confirms that ownership is correctly transferred and that the zkfinalizer is removed from the Pravega CR.
    l. The v1beta1 CR should have the same values for all Spec fields as the v1alpha1 CR had prior to the upgrade, except the BK Spec. The Tier2 field name should have changed to LongTermStorage.
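
Some of the checks above (CR version, absence of the "bookkeeper" field, presence of bookkeeperUri, owner reference versions, and the removed finalizer) can be scripted against objects exported as JSON, for example with kubectl get ... -o json. The following is a minimal sketch that works on already-parsed dicts; the helper names are hypothetical and nothing here is part of the operator or of tools/OperatorUpgrade.sh.

```python
def stale_owner_refs(objects, owner_kind, expected_api_version):
    """Names of objects whose owner reference of owner_kind has another apiVersion."""
    stale = []
    for obj in objects:
        refs = obj.get("metadata", {}).get("ownerReferences") or []
        for ref in refs:
            if ref.get("kind") == owner_kind and ref.get("apiVersion") != expected_api_version:
                stale.append(obj["metadata"]["name"])
    return stale


def check_converted_cr(cr):
    """Return a list of problems found in a converted PravegaCluster dict."""
    problems = []
    if not cr.get("apiVersion", "").endswith("/v1beta1"):
        problems.append("apiVersion is not v1beta1")

    spec = cr.get("spec", {})
    if "bookkeeper" in spec:
        problems.append("spec still has a bookkeeper field")
    if "bookkeeperUri" not in spec:
        problems.append("spec has no bookkeeperUri field")

    # The finalizer name is taken from the verification steps above.
    finalizers = cr.get("metadata", {}).get("finalizers") or []
    if "cleanUpZookeeper" in finalizers:
        problems.append("cleanUpZookeeper finalizer still present")
    return problems
```

An empty list from check_converted_cr and from stale_owner_refs means these particular checks passed; the behavioural checks (scale, upgrade, delete) still need to be exercised by hand.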

After p-operator has completed the CR conversion, it takes several minutes for K8s to apply those changes on the server; 8-10 minutes is what was typically observed.
During this period, the commands kubectl get and kubectl describe on pravegacluster will continue to fail. Operations such as scale, delete or upgrade should not be performed, because the version conversion has not yet taken effect on the K8s server.
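
One way to avoid acting too early is to poll until kubectl get pravegacluster succeeds before running any scale, delete or upgrade operation. The sketch below shows this under stated assumptions: the 15-minute timeout and 15-second interval are arbitrary values chosen to leave margin over the 8-10 minutes observed, and the function names are hypothetical.

```python
import subprocess
import time


def wait_until(check, timeout_s=900, interval_s=15,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout. sleep and clock are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def pravega_cr_readable():
    """True once `kubectl get pravegacluster` exits with status 0."""
    result = subprocess.run(
        ["kubectl", "get", "pravegacluster"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    if wait_until(pravega_cr_readable):
        print("Pravega CR is readable; conversion is visible on the server.")
    else:
        print("Timed out waiting for the Pravega CR to become readable.")
```

A successful kubectl get only shows that the server can serve the resource again; the remaining verification steps still apply.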

Signed-off-by: pbelgundi <[email protected]>

@anishakj anishakj left a comment

The CRD is not updated for manual installation.

@bourgeoisor bourgeoisor left a comment

Some thoughts on how charts are managed...

charts/0.5.0/pravega-operator/Chart.yaml (outdated, resolved)
charts/0.5.0/pravega-operator/Chart.yaml (outdated, resolved)
tools/charts/bookkeeper-operator/Chart.yaml (outdated, resolved)

bourgeoisor commented Apr 18, 2020

@Ranganaths8 do you know if there is a given structure for PR descriptions -- especially big multi-package ones -- on the Pravega control plane repositories?

Unrelated: I also notice that the Travis builds are failing (unit tests).

@pbelgundi (Contributor, Author) commented

> Is this PR still a draft? I notice that the PR description is missing;
> No comments, no testing evidence, no overviews, no description on how to verify it...
>
> @Ranganaths8 do you know if there is a given structure for PR descriptions -- especially big multi-package ones -- on the Pravega control plane repositories?
>
> Unrelated: I also notice that the Travis builds are failing (unit tests).

Added ChangeLog and Verification Steps.

@pbelgundi changed the title from "Issue 343: Support p-operator upgrades" to "Issue 343: Support p-operator upgrade from version 0.4.x to 0.5.0" on Apr 20, 2020

@anishakj anishakj left a comment

For manual installation, the deploy folder is not updated with these changes; only the chart is updated.

@pbelgundi (Contributor, Author) commented

> For manual installation, the deploy folder is not updated with these changes; only the chart is updated.

Added deploy files for manual deployment

@pbelgundi (Contributor, Author) commented

> > For manual installation, the deploy folder is not updated with these changes; only the chart is updated.
>
> Added deploy files for manual deployment

I had missed adding the crd folder. Added now.

@anishakj anishakj left a comment

LGTM

@pbelgundi (Contributor, Author) commented

> Some thoughts on how charts are managed...

Fixed.

Linked issue (may be closed by merging this pull request): Support upgrades from Pravega Operator with BK to Operator without BK