Issue 343: Support p-operator upgrade from version 0.4.x to 0.5.0 #353
Signed-off-by: pbelgundi <[email protected]>
The CRD is not updated for manual installation.
Some thoughts on how charts are managed...
@Ranganaths8 do you know if there is a given structure for PR descriptions -- especially big multi-package ones -- on the Pravega control plane repositories? Unrelated: I also notice that the Travis builds are failing (unit tests).
Added ChangeLog and Verification Steps.
For manual installation, the deploy folder is not updated with the changes; only the chart is updated.
Added deploy files for manual deployment.
I had missed adding the crd folder. Added now.
LGTM
Fixed.
Change log description
This PR adds the following changes to Pravega Operator:
v1beta1 added to the Pravega CRD to represent a Pravega Cluster resource without Bookkeeper.
a. Converts the CR object at version v1alpha1 (CR with Bookkeeper) to v1beta1 (CR without BK).
b. Creates the Bookkeeper cluster object and transfers the Bookkeeper data in the v1alpha1 Pravega CR object (p.Spec.Bookkeeper) to the newly created Bookkeeper Cluster object.
c. Migrates ownership of the Bookkeeper STS, ConfigMaps, PDB, Services, etc. from the Pravega CR to the Bookkeeper CR.
d. Updates owner references of existing Pravega artifacts (Controller and Segment Store STS, PDB, ConfigMap, Services, etc.) to point to the new CR version, v1beta1, instead of v1alpha1.
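The conversion in steps a and b can be pictured with a small sketch that works on plain dictionaries. This is not the operator's code (the operator is written in Go); the API group strings, the `tier2` and `longtermStorage` key names, the `BookkeeperCluster` kind string, and the `bookkeeper_uri` argument are assumptions made for illustration. Only the `bookkeeper` and `bookkeeperUri` fields and the Tier2 to LongTermStorage rename come from this PR description.

```python
import copy

# Assumed API versions; they are not spelled out in the PR description.
PRAVEGA_V1BETA1 = "pravega.pravega.io/v1beta1"
BOOKKEEPER_V1ALPHA1 = "bookkeeper.pravega.io/v1alpha1"


def convert_pravega_cr(old_cr, bookkeeper_uri):
    """Split a v1alpha1 PravegaCluster dict into (v1beta1 Pravega CR, Bookkeeper CR).

    bookkeeper_uri is supplied by the caller; its real format is not
    described in the PR, so it is treated as an opaque string here.
    """
    new_cr = copy.deepcopy(old_cr)  # never mutate the caller's object
    new_cr["apiVersion"] = PRAVEGA_V1BETA1
    spec = new_cr.setdefault("spec", {})

    # Steps a and b: move the embedded Bookkeeper spec into its own object
    # and leave only a URI behind in the Pravega CR.
    bk_spec = spec.pop("bookkeeper", {})
    spec["bookkeeperUri"] = bookkeeper_uri

    # Tier2 is renamed to LongTermStorage (key names are assumptions).
    pravega = spec.get("pravega", {})
    if "tier2" in pravega:
        pravega["longtermStorage"] = pravega.pop("tier2")

    bk_cr = {
        "apiVersion": BOOKKEEPER_V1ALPHA1,
        "kind": "BookkeeperCluster",
        "metadata": {
            "name": old_cr["metadata"]["name"],
            "namespace": old_cr["metadata"].get("namespace", "default"),
        },
        "spec": bk_spec,
    }
    return new_cr, bk_cr
```

Returning both objects from one function mirrors the way the change log couples the version conversion with the creation of the Bookkeeper cluster object; ownership migration (steps c and d) is a separate concern and is not modelled here.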
Purpose of the change
Fixes #343, #17, #290, #345
How to verify it
With p-operator version 0.4.3 or 0.4.4 installed, execute the script tools/OperatorUpgrade.sh to trigger the operator upgrade to 0.5.0. The following should be noticed after the upgrade is triggered:
a. The new p-operator pod starts up and its logs show that the conversion webhook is triggered. The log should show:
"Converting Pravega CR version from v1alpha1 to v1beta1."
eventually followed by this message:
"Version migration completed successfully."
b. The Operator reconcile loop starts running and is able to set defaults on the new PravegaCluster CR object. This can be checked using the log:
"Reconciling Pravega Cluster ..."
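The log checks in steps a and b could be scripted roughly as follows. This is a sketch that assumes the operator log has already been captured as text (for example with `kubectl logs` on the operator pod); the function name `upgrade_log_status` is hypothetical, and the three message strings are the ones quoted above.

```python
CONVERSION_STARTED = "Converting Pravega CR version from v1alpha1 to v1beta1."
MIGRATION_DONE = "Version migration completed successfully."
RECONCILING = "Reconciling Pravega Cluster"


def upgrade_log_status(log_text):
    """Report which of the expected upgrade messages appear in the log text."""
    start = log_text.find(CONVERSION_STARTED)
    # The completion message only counts if it appears after the start message.
    done = log_text.find(MIGRATION_DONE, start + 1) if start != -1 else -1
    return {
        "conversion_started": start != -1,
        "migration_completed": done != -1,
        "reconcile_running": RECONCILING in log_text,
    }
```

A run is considered healthy at this stage when all three values are true; a missing completion message after the start message suggests the conversion is still in progress or has failed.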
NOTE
The execution of the conversion code on the operator may take only a few seconds or up to a minute.
But for these changes to be reflected on the K8s server, it takes several minutes (typically 8-10 minutes). During this period, even though the operator logs show that the conversion has completed, resource requests (kubectl get and describe) on the Pravega CR will continue to fail until conversion is complete on the K8s server.
Once resource requests on the Pravega cluster start succeeding, confirm the following:
c. The Pravega CR version is migrated to v1beta1. There is no "bookkeeper" field in the new version; instead there is a bookkeeperUri field.
d. Cluster status for both the Pravega cluster and the Bookkeeper cluster reflects correct values based on the pods belonging to each cluster type.
e. The Owner Reference for the BK ConfigMap, PDB, all PVCs, STS and headless svc points to BK Cluster version v1alpha1.
f. The Owner Reference for Pravega artifacts points to the v1beta1 APIVersion instead of v1alpha1.
g. Old StatefulSets for Segment Store are deleted and new ones are created (with the same name and values).
h. The finalizer "cleanUpZookeeper" is not present in the new PravegaCluster object (v1beta1). Zookeeper cleanup will now be handled by the BK Operator.
i. The BK Operator starts managing Bookkeeper artifacts (STS/CM/PDB/SVC etc.), including scale/deletion/upgrade.
j. The Pravega Operator still manages Pravega artifacts (STS/Deployment/ConfigMaps etc.) for Controller and Segment Store. Check using scale/upgrade/restart of Pravega controller/sss artifacts.
k. After the upgrade is complete, try deleting the PravegaCluster resource followed by the BookkeeperCluster resource; deletion should happen as expected. This makes sure ownership is correctly transferred and the zkfinalizer is deleted from the Pravega CR.
l. The v1beta1 CR should have the same values for all Spec fields as the v1alpha1 CR had (prior to upgrade), except the BK Spec. The Tier2 name should have changed to LongTermStorage.
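Several of the checks above (the CR version, the bookkeeper and bookkeeperUri fields, the cleanUpZookeeper finalizer, and the owner references) can be automated against resources exported with `kubectl get ... -o json`. The sketch below assumes the JSON has already been parsed into dictionaries; the helper names are hypothetical and the `/v1beta1` suffix test stands in for the full API version string, which this description does not spell out.

```python
def owner_api_versions(resource):
    """Return the set of (kind, apiVersion) pairs in a resource's ownerReferences."""
    refs = resource.get("metadata", {}).get("ownerReferences") or []
    return {(ref.get("kind"), ref.get("apiVersion")) for ref in refs}


def check_migrated_pravega_cr(cr):
    """Return a list of problems found in a migrated PravegaCluster dict.

    An empty list means the version, field, and finalizer checks passed.
    """
    problems = []
    if not cr.get("apiVersion", "").endswith("/v1beta1"):
        problems.append("apiVersion is not v1beta1")
    spec = cr.get("spec", {})
    if "bookkeeper" in spec:
        problems.append("spec still has a bookkeeper field")
    if "bookkeeperUri" not in spec:
        problems.append("spec has no bookkeeperUri field")
    finalizers = cr.get("metadata", {}).get("finalizers") or []
    if "cleanUpZookeeper" in finalizers:
        problems.append("cleanUpZookeeper finalizer is still present")
    return problems
```

`owner_api_versions` can be applied to each StatefulSet, ConfigMap, PDB, PVC, or Service to confirm that Bookkeeper artifacts reference the Bookkeeper cluster at v1alpha1 and that Pravega artifacts reference the Pravega cluster at v1beta1. The behavioural checks (scale, upgrade, delete) still need to be exercised by hand.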
After p-operator has completed the CR conversion, it takes several minutes for K8s to apply those changes on the server; typically 8-10 minutes is what was noticed.
During this period, the commands kubectl get and kubectl describe on pravegacluster will continue to fail, and operations like scale/delete/upgrade should not be performed, as the version conversion has not yet taken effect on the K8s server.
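Because the wait is several minutes and its end is only visible through `kubectl get` starting to succeed, a simple polling loop is one way to gate further operations. The sketch below is an assumption-laden helper, not part of the PR: the names `wait_until` and `pravega_cr_readable` are hypothetical, and the 900-second timeout and 15-second interval are arbitrary margins chosen around the 8-10 minutes reported above.

```python
import subprocess
import time


def wait_until(check, timeout_s=900, interval_s=15,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False on timeout. sleep and clock
    are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def pravega_cr_readable():
    """True once `kubectl get pravegacluster` exits successfully."""
    result = subprocess.run(
        ["kubectl", "get", "pravegacluster"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling `wait_until(pravega_cr_readable)` before any scale, delete, or upgrade operation keeps those operations from running while the version conversion has not yet taken effect on the K8s server.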