Rollback fails in case of only 1 Segment Store Pod #449

Closed
Prabhaker24 opened this issue Sep 14, 2020 · 0 comments · Fixed by #450

Prabhaker24 commented Sep 14, 2020

Description

When the upgrade of a Pravega cluster with a single segment store replica fails (because it targets an invalid Pravega version), the readiness check on the controller pod soon starts to fail:

NAME                                          READY   STATUS             RESTARTS   AGE
bookkeeper-bookie-0                           1/1     Running            0          66m
bookkeeper-bookie-1                           1/1     Running            0          66m
bookkeeper-bookie-2                           1/1     Running            0          66m
bookkeeper-operator-cd5b76d74-frvp9           1/1     Running            0          8h
nfs-server-provisioner-1599135458-0           1/1     Running            0          8d
pravega-operator-5c5776fb84-bwhfb             1/1     Running            0          8h
pravega-pravega-controller-667b796659-ft85s   0/1     Running            0          64m
pravega-pravega-segment-store-0               0/1     ImagePullBackOff   0          60m
zookeeper-0                                   1/1     Running            0          67m
zookeeper-1                                   1/1     Running            0          67m
zookeeper-2                                   1/1     Running            0          66m
zookeeper-operator-55ccb88b87-r4l5j           1/1     Running            0          2d2h

Subsequently, even rolling the Pravega cluster back to its previous version fails.
This happens only when there is a single segment store (ss) pod and the upgrade of that pod has failed. Because not even one ss pod is available, the GET request for the system scope fails. When the rollback runs, the controller pod is not marked ready for the same reason.

Importance

should_have

Location

upgrade.go file

Suggestions for an improvement

When there is only 1 ss replica, change the order of the rollback: roll back the ss pod first, and then the controller pod.
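
The suggested ordering can be sketched as a toy model in Go. This is only an illustration under stated assumptions, not the operator's actual code: the `cluster` type and the `rollbackOrder` and `rollback` functions are hypothetical names, and the default controller-first order for multi-replica clusters is inferred from the suggestion above rather than confirmed from upgrade.go.

```go
package main

import "errors"

// Component names a Pravega component handled during rollback (hypothetical type).
type Component string

const (
	Controller   Component = "controller"
	SegmentStore Component = "segmentstore"
)

// cluster is a toy model of the state described in this issue.
type cluster struct {
	ssReplicas int32 // configured segment store replicas
	ssReady    int32 // segment store pods currently ready
}

var errControllerNotReady = errors.New(
	"controller is not marked ready: no segment store pod is available")

// rollbackOrder picks the component order for a rollback.
// Assumption: with more than one replica the controller goes first;
// with exactly one replica the segment store goes first, as suggested.
func rollbackOrder(ssReplicas int32) []Component {
	if ssReplicas == 1 {
		return []Component{SegmentStore, Controller}
	}
	return []Component{Controller, SegmentStore}
}

// rollback walks the given order. Rolling back the segment store brings its
// pods back to ready; rolling back the controller succeeds only if at least
// one segment store pod is ready, mirroring the failing readiness check.
func rollback(c *cluster, order []Component) error {
	for _, comp := range order {
		switch comp {
		case SegmentStore:
			c.ssReady = c.ssReplicas
		case Controller:
			if c.ssReady == 0 {
				return errControllerNotReady
			}
		}
	}
	return nil
}
```

In this model, a controller-first rollback on a cluster with `ssReplicas: 1` and `ssReady: 0` returns `errControllerNotReady`, while the order returned by `rollbackOrder(1)` completes without error.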

Prabhaker24 changed the title from "Controller readiness check starts failing on a single SSS pravegacluster after a failed upgrade" to "Rollback fails in case of only 1 Segment Store Pod" on Sep 14, 2020