Issue 312 making ss cache volumes optional #342

Merged
merged 16 commits into from
Mar 20, 2020

Conversation

Prabhaker24
Contributor

Signed-off-by: prabhaker24 [email protected]

Change log description

From Pravega 0.7, the Segment Store cache is completely in-memory, so the PVC that the operator creates is no longer necessary. The deployment of this volume needs to be optional, depending on whether the Pravega version is below 0.7 or not. With these changes, deploying or upgrading to Pravega 0.7 or above does not create PVCs, and an upgrade from a version below 0.7 removes the PVCs that the older version created.
If an upgrade from a version below 0.7 to version 0.7 or above fails and is rolled back, the operator should remove the segment store pods of the 0.7-or-above version and redeploy all segment store pods of the below-0.7 version along with their PVCs.

Purpose of the change

Fixes #312

What the code does

These changes deploy the segment store with PVCs for a version below 0.7 and without PVCs for version 0.7 or above. They also handle upgrades from a version below 0.7 to version 0.7 or above: in that case the operator removes the PVCs created by the older version while deploying the segment store for the newer version.
They also handle rollback when an upgrade from a version below 0.7 to version 0.7 or above fails. The operator displays an appropriate upgrade-failed message, and when the user changes the version back to the last version (the below-0.7 version being upgraded from), it rolls back to that version and removes the segment store pods of the 0.7-or-above version.
An appropriate user message is also displayed in case this rollback fails as well.
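
The cases described above can be summarized in a small decision function. This is only an illustrative sketch, not the operator's code: the type `segmentStorePlan`, its constants, and `planFor` are hypothetical names, and the version checks are reduced to booleans.

```go
package main

// segmentStorePlan names what would happen to the segment store for a
// given version transition. All names here are illustrative only.
type segmentStorePlan string

const (
	deployWithPVC    segmentStorePlan = "deploy with cache PVCs"
	deployWithoutPVC segmentStorePlan = "deploy without cache PVCs"
	migrateDropPVC   segmentStorePlan = "replace old segment store and delete its PVCs"
	rollbackWithPVC  segmentStorePlan = "remove 0.7+ pods and redeploy with cache PVCs"
	regularUpgrade   segmentStorePlan = "regular upgrade or rollback"
)

// planFor picks a plan. fresh is true for a first deployment, in which
// case currentBelow07 is ignored.
func planFor(fresh, currentBelow07, targetBelow07 bool) segmentStorePlan {
	switch {
	case fresh && targetBelow07:
		return deployWithPVC
	case fresh:
		return deployWithoutPVC
	case currentBelow07 && !targetBelow07:
		// Upgrade across the 0.7 boundary: PVCs are no longer needed.
		return migrateDropPVC
	case !currentBelow07 && targetBelow07:
		// Rollback across the 0.7 boundary: PVCs are needed again.
		return rollbackWithPVC
	default:
		// Both versions are on the same side of 0.7.
		return regularUpgrade
	}
}
```

For example, `planFor(false, true, false)` corresponds to an upgrade from 0.6.x to 0.7.x and selects the plan that drops the old PVCs.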

How to verify it

  1. Deploy Pravega with version 0.7 or above; it should be deployed without any PVCs.
  2. Upgrade Pravega from a version below 0.7 to a version above 0.7. It should do two things: deploy the new segment store without PVCs, and remove the old segment store as well as the PVCs attached to it.
  3. Upgrade Pravega from a version below 0.7 to a version above 0.7.
  4. Upgrade Pravega from a version above 0.7 to a version above 0.7.
  5. Upgrade from 0.6.0 to 0.6.2.
  6. I/O should keep working in all of the above cases, both while the upgrade or deployment is in progress and after it completes; verify this by checking that Benchmark can be run.
  7. Rollback in case the upgrade fails while upgrading from a version below 0.7 to a version below 0.7.
  8. Rollback in case the upgrade fails while upgrading from a version below 0.7 to a version above 0.7.
  9. Rollback in case the upgrade fails while upgrading from a version above 0.7 to a version above 0.7.
  10. An appropriate message is displayed if the rollback itself fails after a failed upgrade from a version below 0.7 to a version above 0.7.

@Prabhaker24 Prabhaker24 requested a review from pbelgundi March 16, 2020 11:41
Signed-off-by: prabhaker24 <[email protected]>
@Prabhaker24 Prabhaker24 marked this pull request as ready for review March 16, 2020 15:07
Signed-off-by: prabhaker24 <[email protected]>
pkg/controller/pravega/pravega_segmentstore.go Outdated
pkg/controller/pravegacluster/pravegacluster_controller.go Outdated
err = r.syncSegmentStoreSize(p)
if err != nil {
return err
/*this condition is to stop syncSegmentstore version from running when we are updapting the ss from version below 07
Contributor

Could we just say in the comment "We skip calling syncSegmentStoreSize() during upgrade/rollback from version 07"?

Contributor Author

done

err = r.syncStatefulSetPvc(sts)
if err != nil {
return fmt.Errorf("failed to sync pvcs of stateful-set (%s): %v", sts.Name, err)
// this check is to avoid calling syncStatefulSetPvc() over ss with version above 07
Contributor

Same as above

Contributor Author

done

stsAbove07 := &appsv1.StatefulSet{}
name := util.StatefulSetNameForSegmentstoreAbove07(p.Name)
err := r.client.Get(context.TODO(), types.NamespacedName{Name: name, Namespace: p.Namespace}, stsAbove07)
if err != nil && errors.IsNotFound(err) {
Contributor

What if client.Get() returns error, but it is not a "NotFound" error? Please add a check similar to :
https://github.com/pravega/pravega-operator/blob/master/pkg/controller/pravegacluster/pravegacluster_controller.go#L92

Contributor

All errors must be logged with log level "Error". We're not logging anything here.

Contributor Author

I have changed it.

if (r.IsClusterRollbackingFrom07(p) && newsts.Status.ReadyReplicas == *newsts.Spec.Replicas) || (oldsts.Status.ReadyReplicas+newsts.Status.ReadyReplicas == p.Spec.Pravega.SegmentStoreReplicas && newsts.Status.ReadyReplicas == *newsts.Spec.Replicas) {
//this check is run till the value of old sts replicas is greater than 0 and will increase two replicas of the new sts and delete 2 replicas of the old sts
if *oldsts.Spec.Replicas > 2 {
*newsts.Spec.Replicas = *newsts.Spec.Replicas + 2
Contributor

Encapsulate the changes to newSTS in a function and invoke that here.
Ditto for the old STS, so it's easier to follow what is happening.

Contributor Author

done

}
} else {
//here we remove the pvc's attached with the old sts and deleted it when old sts replicas have become 0
*newsts.Spec.Replicas = p.Spec.Pravega.SegmentStoreReplicas
Contributor

Again, encapsulate the changes to the old STS in a method and invoke it here.
One more thing: we need to increment the new STS first and only then delete from the old STS, so that we err on the side of caution and have more redundancy rather than less.

Contributor Author

done
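
The step-wise transition discussed in this thread can be sketched as two small functions. This is a simplified illustration based on the snippets quoted above, not the merged code: `scaleStep` and `applyStep` are hypothetical names, and the update callbacks stand in for the client calls that update each stateful set.

```go
package main

// scaleStep computes the next desired replica counts for the old and
// new stateful sets. While the old set has more than two replicas, two
// replicas move per step; otherwise the new set takes the full target
// count and the old set drops to zero.
func scaleStep(oldReplicas, newReplicas, target int32) (nextOld, nextNew int32) {
	if oldReplicas > 2 {
		return oldReplicas - 2, newReplicas + 2
	}
	return 0, target
}

// applyStep updates the new set before the old one, so that a failure
// part-way leaves extra replicas rather than too few.
func applyStep(updateNew, updateOld func(int32) error, nextOld, nextNew int32) error {
	if err := updateNew(nextNew); err != nil {
		return err // old set is left untouched
	}
	return updateOld(nextOld)
}
```

With a target of 5 replicas, repeated calls to `scaleStep` move the counts from (old 5, new 0) to (3, 2), then (1, 4), then (0, 5).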

if ver == "" {
return true
}
first3 := strings.Trim(ver, "\t \n")[0:3]
Contributor

Why can't we use CompareVersions here? Let's try to reuse existing methods wherever possible. It is not good to use one way of doing things here and another way in the webhook.

Contributor Author

changed it
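
One reason to prefer a component-wise comparison over slicing the first three characters is that `"0.10.0"[0:3]` yields `"0.1"`, and the slice panics on strings shorter than three characters. The sketch below shows a component-wise comparison; `compareDotted` is a hypothetical stand-in written for this illustration and is not the operator's `CompareVersions`, whose signature is not shown in this thread. It assumes purely numeric, dot-separated versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareDotted compares two dotted numeric versions component by
// component and returns -1, 0 or 1. Hypothetical helper.
func compareDotted(a, b string) (int, error) {
	as := strings.Split(strings.TrimSpace(a), ".")
	bs := strings.Split(strings.TrimSpace(b), ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		x, y := 0, 0 // missing components count as zero
		var err error
		if i < len(as) {
			if x, err = strconv.Atoi(as[i]); err != nil {
				return 0, fmt.Errorf("bad version %q: %v", a, err)
			}
		}
		if i < len(bs) {
			if y, err = strconv.Atoi(bs[i]); err != nil {
				return 0, fmt.Errorf("bad version %q: %v", b, err)
			}
		}
		if x != y {
			if x < y {
				return -1, nil
			}
			return 1, nil
		}
	}
	return 0, nil
}
```

A "below 0.7" check then becomes `compareDotted(ver, "0.7")` returning -1, which stays correct for versions such as 0.10.0.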

return false
}

//this function will return true only in case of upgrading from a version below 0.7 to a version above 0.7
Contributor

Fix the comment, "to pravega version 0.7 or later"

Contributor Author

fixed

pkg/util/pravegacluster.go Outdated
prabhaker24 added 3 commits March 18, 2020 17:02
Signed-off-by: prabhaker24 <[email protected]>
Signed-off-by: prabhaker24 <[email protected]>
Contributor

@pbelgundi pbelgundi left a comment

Please address all review comments. I still see some earlier comments open and unaddressed.

pkg/controller/pravegacluster/upgrade.go
}

//To detect upgrade/rollback failure
if oldsts.Status.ReadyReplicas+newsts.Status.ReadyReplicas != p.Spec.Pravega.SegmentStoreReplicas {
Contributor

!= check should be changed to < as discussed.

Contributor Author

I have changed it.
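
A minimal sketch of the failure check after switching from `!=` to `<` is shown below. The thread does not state the reason for the change; one plausible reading, given that the new stateful set is scaled up before the old one is scaled down, is that the combined ready count can briefly exceed the desired count, which `!=` would wrongly flag as a failure. `upgradeStalled` is a hypothetical name.

```go
package main

// upgradeStalled reports whether fewer segment store pods are ready
// across both stateful sets than the cluster spec asks for.
func upgradeStalled(oldReady, newReady, desired int32) bool {
	return oldReady+newReady < desired
}
```

With 3 old and 4 new pods ready against a desired count of 5, this check does not flag a failure, whereas a `!=` comparison would.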

}

//checking if sts below07 exists
stsBelow07 := &appsv1.StatefulSet{}
Contributor

How are we dealing with situations where one STS is deleted and other is present?

Contributor Author

@Prabhaker24 Prabhaker24 Mar 18, 2020

I have handled that: when the above07 sts is there and the below07 sts is not, I check the version. If the below07 sts is missing and the version is below 07, I call syncSegmentStoreVersionTo07(), which removes the above07 pods and adds the below07 sts.

If the below07 sts is there and the above07 sts is not, I return false and syncSegmentStoreVersion() gets called, which handles this situation.

}

//this function will increase two replicas of the new sts and delete 2 replicas of the old sts everytime it's called
func (r *ReconcilePravegaCluster) incrementWhenReplicasMoreThan2(p *pravegav1alpha1.PravegaCluster, newsts *appsv1.StatefulSet, oldsts *appsv1.StatefulSet) error {
Contributor

Could we call this method scaleSegmentStoreSTS()?

Contributor Author

Changed it

}

//This function will remove the pvc's attached with the old sts and deleted it when old sts replicas have become 0
func (r *ReconcilePravegaCluster) incrementWhenReplicasLessThan2(p *pravegav1alpha1.PravegaCluster, newsts *appsv1.StatefulSet, oldsts *appsv1.StatefulSet) error {
Contributor

Please use a better method name that is indicative of what the method does. Perhaps transitionToNewSTS()?

Contributor Author

Changed it

pkg/util/pravegacluster.go Outdated
Contributor

@pbelgundi pbelgundi left a comment

Some of the earlier comments are still not addressed, and I have provided new comments.

if err != nil {
return fmt.Errorf("failed to sync pvcs of stateful-set (%s): %v", sts.Name, err)
/*We skip calling syncStatefulSetPvc() during upgrade/rollback from version 07*/
if !util.IsClusterUpgradingTo07(p) && !r.isRollbackTriggered(p) {
Contributor

Same as above. Why can't we use "IsClusterRollbackingFrom07()"?

Contributor Author

I have changed it

if err != nil {
return err
/*We skip calling syncSegmentStoreSize() during upgrade/rollback from version 07*/
if !util.IsClusterUpgradingTo07(p) && !r.isRollbackTriggered(p) {
Contributor

We should be checking if rollback from 07 is triggered. Here we're checking if any rollback is triggered. Why can't we use "IsClusterRollbackingFrom07()"?

Contributor Author

I have changed it

pkg/controller/pravegacluster/upgrade.go
return true
}

//To handle upgrade/rollback from Pravega version < 0.7 to Pravega Version >= 0.7
Contributor

Wrong comment on method

Contributor Author

removed

if util.IsClusterUpgradingTo07(p) || r.IsClusterRollbackingFrom07(p) {
return r.syncSegmentStoreVersionTo07(p)
}
//for all other cases of upgrades and rollback this function is called
Contributor

Comment not removed yet.

controllerutil.SetControllerReference(p, newsts, r.scheme)
err = r.client.Get(context.TODO(), types.NamespacedName{Name: newsts.Name, Namespace: p.Namespace}, newsts)
//this check is to see if the newsts is present or not if it's not present it will be created here
if err != nil && errors.IsNotFound(err) {
Contributor

I don't see the error being logged?

prabhaker24 and others added 2 commits March 19, 2020 13:33
return false, err
}
}
if !errors.IsNotFound(err) {
Contributor

This if is not needed, please remove.

Contributor Author

Done

prabhaker24 added 2 commits March 19, 2020 18:36
Signed-off-by: prabhaker24 <[email protected]>
…into issue-312-making-ss-cache-volume-optional
Contributor

@pbelgundi pbelgundi left a comment

LGTM


//if version is above or equals to 0.7 this name will be assigned
func StatefulSetNameForSegmentstoreAbove07(name string) string {
return fmt.Sprintf("%s-pravega-segment-store", name)

This change just broke the upgrade.

0.7.0 pravega clusters historically were deployed with 0.4.x p-operator and had the stateful sets named using the pravega-segmentstore substring.

Successfully merging this pull request may close these issues.

Make Segment Store cache volume optional for Pravega +0.7
3 participants