Rollback when upgrade process fails #153
Labels
area/controller
Issue related to the operator controller
area/upgrade
Impacts upgrade feature in operator
kind/feature
New feature
priority/P1
Recoverable error, functionality/performance impaired but not lost, no permanent damage
status/blocked
Issue or PR is blocked on another item; add reference in a comment
version 0.4.0
Issue with Operator 0.4.0
When the `version` field is updated on the `PravegaCluster` resource, the operator triggers a rolling upgrade process that updates BookKeeper, Segmentstore, and Controller pods until all of them are using the desired version.
During the upgrade process, various errors can happen, from a version that does not exist to a configuration failure that prevents a pod from becoming ready.
Currently, when a failure occurs, the upgrade process halts and manual intervention is required to restore the cluster's health.
The operator should attempt to automatically restore the cluster health, rolling back to the original version. If the rollback fails, the process should halt and an alert should be sent.
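To make the requested behavior concrete, here is a minimal sketch of the control flow in Python. It is not the operator's actual code: `Cluster`, `Outcome`, `COMPONENTS`, `upgrade_with_rollback`, and the `update_component` callback are all hypothetical names, the component order is assumed from the order listed above, and the "alert" is just a message appended to a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Outcome(Enum):
    UPGRADED = "upgraded"
    ROLLED_BACK = "rolled_back"
    ROLLBACK_FAILED = "rollback_failed"


# Assumed upgrade order (hypothetical; mirrors the order used in this issue).
COMPONENTS = ["bookkeeper", "segmentstore", "controller"]


@dataclass
class Cluster:
    # Hypothetical in-memory model: component name -> version it runs.
    versions: Dict[str, str]
    # Stand-in for a real alerting mechanism.
    alerts: List[str] = field(default_factory=list)


def upgrade_with_rollback(
    cluster: Cluster,
    target: str,
    update_component: Callable[[str, str], bool],
) -> Outcome:
    """Upgrade every component to `target`; roll back on the first failure.

    `update_component(name, version)` is a hypothetical callback that
    returns True once the component's pods are ready on `version`.
    """
    original = dict(cluster.versions)  # remember where to roll back to
    touched: List[str] = []

    for name in COMPONENTS:
        touched.append(name)  # a failed attempt may still have changed pods
        if not update_component(name, target):
            break
        cluster.versions[name] = target
    else:
        return Outcome.UPGRADED  # loop finished without a failure

    # Roll back every component that was attempted, most recent first.
    for name in reversed(touched):
        if not update_component(name, original[name]):
            # Rollback failed: halt and raise an alert, as the issue asks.
            cluster.alerts.append(
                f"rollback of {name} to {original[name]} failed"
            )
            return Outcome.ROLLBACK_FAILED
        cluster.versions[name] = original[name]

    return Outcome.ROLLED_BACK
```

The sketch rolls back the failed component as well as the ones already upgraded, since a failed update may have left some of its pods on the new version. Whether the real operator should roll back in reverse order is a design choice left open here.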