
Support scaleDownDelaySeconds & fast rollbacks with canary strategy #557

Open
jessesuen opened this issue Jun 26, 2020 · 7 comments
Labels
canary (Canary related issue), enhancement (New feature or request), traffic-routing

Comments

@jessesuen
Member

jessesuen commented Jun 26, 2020

The blue-green strategy has a neat feature: when the rollout is moving to an older ReplicaSet that is still within its scaleDownDelaySeconds, we perform what is called a "fast-tracked rollback". With a fast-tracked rollback, the rollout skips all steps, analysis, etc. This allows multiple older versions of the blue-green stack to exist and keep running, so a rollout can quickly switch back to a previous stack.

However, when using the canary strategy, the only time we perform a fast-tracked rollback is when the user re-applies a manifest that is equal to the stable pod spec before the rollout has completed its upgrade. If an older pod spec is applied that is equal to a previous, scaled-down ReplicaSet, the rollout still goes through its normal cycle of steps, analysis, etc.
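
To make the difference concrete, here is a minimal Python sketch of the two rules as described above. It is a simplified model, not the controller's code: `OlderReplicaSet`, `is_fast_tracked_rollback`, and the use of Unix timestamps for the scale-down deadline are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OlderReplicaSet:
    pod_template_hash: str
    # Hypothetical: Unix time at which the controller would scale this
    # ReplicaSet down, or None if it has already been scaled down.
    scale_down_deadline: Optional[float] = None


def is_fast_tracked_rollback(
    strategy: str,
    desired_hash: str,
    stable_hash: str,
    upgrade_complete: bool,
    older: List[OlderReplicaSet],
    now: float,
) -> bool:
    """Simplified model of today's behaviour as described in this issue."""
    if strategy == "blueGreen":
        # Fast-track when moving to an older ReplicaSet that is still
        # inside its scaleDownDelaySeconds window.
        return any(
            rs.pod_template_hash == desired_hash
            and rs.scale_down_deadline is not None
            and now < rs.scale_down_deadline
            for rs in older
        )
    if strategy == "canary":
        # Fast-track only when re-applying the stable spec mid-upgrade.
        return desired_hash == stable_hash and not upgrade_complete
    raise ValueError(f"unknown strategy: {strategy}")
```

In the canary branch, re-applying the spec of any previous, scaled-down ReplicaSet returns False, which is the gap this issue is about.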

I think the canary strategy should have the same functionality as blue-green: multiple older ReplicaSets could continue to run fully scaled for a user-defined period of time, and if a Rollout spec is re-applied that is equal to one of those scaled ReplicaSets, we would also perform a fast-tracked rollback. Note that leaving older ReplicaSets scaled up would only work for the mesh- and ingress-based canary (i.e., with trafficRouting), not for the basic canary that weights traffic by replica count, because allowing older stacks to remain up with the basic canary would mean that traffic would reach the older ReplicaSets.
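
The traffic-routing restriction can be expressed as a small guard. This is a hedged sketch only: `effective_scale_down_delay_seconds` is a hypothetical helper, and the canary spec is modeled as a plain dict mirroring the YAML fields proposed in this issue.

```python
from typing import Any, Dict


def effective_scale_down_delay_seconds(canary: Dict[str, Any]) -> int:
    """Hypothetical helper: how long older ReplicaSets stay fully scaled.

    With trafficRouting, the mesh/ingress decides where traffic goes, so
    idle older ReplicaSets can stay up. Without it, traffic is weighted by
    replica count, so older stacks left running would still receive traffic;
    the delay is therefore forced to zero.
    """
    if not canary.get("trafficRouting"):
        return 0
    return int(canary.get("scaleDownDelaySeconds", 0))
```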

To support fast-tracked rollbacks for the basic, replica-count-weighted canary, we could annotate older ReplicaSets with a deadline; if we find that the rollout is moving back to an older ReplicaSet before its deadline, we would skip all steps, analysis, etc.
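
A minimal sketch of the deadline-annotation idea, assuming ReplicaSet annotations are modeled as a string-to-string dict and the deadline is stored as Unix seconds. The annotation key and both function names are hypothetical; the issue does not specify them.

```python
import time
from typing import Dict, Optional

# Hypothetical annotation key; the issue does not name one.
DEADLINE_ANNOTATION = "example.io/fast-rollback-deadline"


def mark_superseded(annotations: Dict[str, str], window_seconds: int, now: float) -> None:
    """When a ReplicaSet is replaced, record until when a rollback to it is fast-tracked."""
    annotations[DEADLINE_ANNOTATION] = str(int(now + window_seconds))


def within_rollback_deadline(annotations: Dict[str, str], now: Optional[float] = None) -> bool:
    """True if moving back to this ReplicaSet should skip steps and analysis."""
    raw = annotations.get(DEADLINE_ANNOTATION)
    if raw is None:
        return False
    if now is None:
        now = time.time()
    return now < int(raw)
```

Because the deadline lives on the ReplicaSet object itself, it keeps working after the ReplicaSet has been scaled to zero, which is what makes it usable for the basic canary.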

@jessesuen jessesuen added the enhancement (New feature or request), canary (Canary related issue), and traffic-routing labels Jun 26, 2020
@jessesuen jessesuen changed the title from "scaleDownDelay for canary strategy (with traffic routing)" to "Fast-tracked rollbacks with canary strategy" Jun 26, 2020
@jessesuen
Member Author

jessesuen commented Jun 26, 2020

Here is a proposed syntax:

spec:
  strategy:
    canary:
      # Duration after a completed upgrade during which moving back to an older ReplicaSet is fast-tracked.
      # If omitted, falls back to the value of scaleDownDelaySeconds, and then to zero.
      rollbackWindow: 24h

      # Duration in seconds that an older ReplicaSet remains scaled up after a completed upgrade.
      # Only applicable when trafficRouting is enabled.
      scaleDownDelaySeconds: 3600
      # Maximum number of older ReplicaSets kept scaled up during their scale-down delay.
      scaleDownDelayRevisionLimit: 2
      trafficRouting:
        smi: {}

Note that the above syntax would allow the rollback window and the scaleDownDelay to be controlled independently. The rollbackWindow allows a fast-tracked rollback to occur even after the older ReplicaSet has been scaled down.
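
The fallback order in the proposed comments (rollbackWindow, then scaleDownDelaySeconds, then zero) can be sketched as follows. This assumes rollbackWindow uses simple Go-style duration strings such as `24h` or `1h30m`; the parser only handles h/m/s units, and all function names are hypothetical.

```python
import re
from typing import Any, Dict

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration_seconds(value: str) -> int:
    """Parse simple durations like "24h" or "1h30m" (a small subset of Go's format)."""
    parts = re.findall(r"(\d+)([hms])", value)
    # Reject anything that is not fully made of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def rollback_window_seconds(canary: Dict[str, Any]) -> int:
    """rollbackWindow if set, else scaleDownDelaySeconds, else zero."""
    if "rollbackWindow" in canary:
        return parse_duration_seconds(canary["rollbackWindow"])
    return int(canary.get("scaleDownDelaySeconds", 0))


def eligible_for_fast_rollback(seconds_since_superseded: float, canary: Dict[str, Any]) -> bool:
    # Deliberately independent of whether the old ReplicaSet is still scaled up.
    return seconds_since_superseded < rollback_window_seconds(canary)
```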

In the above example, the rollout would keep up to two older ReplicaSets fully scaled (100%), each for 1 hour (3600 seconds). However, if the Rollout moved back to an older ReplicaSet within 24 hours, it would skip analysis and steps, but would still need some time to scale that ReplicaSet back up.
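
The example above can be worked through with a small simulation. This is a sketch under stated assumptions: each older ReplicaSet's timers start when a newer revision finished rolling out (`superseded_at`, in seconds), only the `revision_limit` most recently superseded ReplicaSets may stay up, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OldReplicaSet:
    name: str
    superseded_at: int  # seconds; when a newer revision completed its upgrade


def replica_set_states(
    old: List[OldReplicaSet],
    now: int,
    scale_down_delay: int,
    revision_limit: int,
    rollback_window: int,
) -> Dict[str, Tuple[bool, bool]]:
    """Return {name: (still_scaled_up, fast_rollback_eligible)}."""
    newest_first = sorted(old, key=lambda rs: rs.superseded_at, reverse=True)
    states = {}
    for index, rs in enumerate(newest_first):
        age = now - rs.superseded_at
        # Scaled up only if within the revision limit AND the scale-down delay.
        scaled_up = index < revision_limit and age < scale_down_delay
        # Fast-track eligibility depends only on the rollback window.
        eligible = age < rollback_window
        states[rs.name] = (scaled_up, eligible)
    return states
```

With `scale_down_delay=3600`, `revision_limit=2` and `rollback_window=86400` (the values from the proposed syntax), the two newest older ReplicaSets stay up for their first hour, while all three remain fast-track eligible for 24 hours.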

@jessesuen jessesuen changed the title from "Fast-tracked rollbacks with canary strategy" to "More controls on fast-tracked rollbacks" Jul 8, 2020
@jessesuen jessesuen changed the title from "More controls on fast-tracked rollbacks" to "Support scaleDownDelaySeconds & fast rollbacks with canary strategy" Jul 8, 2020
@jessesuen
Member Author

I spawned #574 from this issue to introduce "rollback windows", which enable more control over fast rollbacks for both blue-green and canary. This issue will remain open to track support for scaleDownDelaySeconds in the canary strategy.

@dthomson25 dthomson25 added this to the v0.10 milestone Jul 20, 2020
@jessesuen jessesuen modified the milestones: v0.10, v0.11 Nov 3, 2020
@jessesuen jessesuen modified the milestones: v1.0, v1.1 Jan 4, 2021
@jessesuen jessesuen removed this from the v1.1 milestone May 14, 2021
@lapwingcloud

It's been more than 1.5 years; any update on this? Thanks.

@awx-fuyuanchu

+1, would love to see this supported.

@github-actions
Contributor

github-actions bot commented Dec 7, 2022

This issue is stale because it has been open 60 days with no activity.

@lapwingcloud

Unstale this please, bot.

@nebojsa-prodana

Hello, is this still something that the Argo project plans on supporting?

Projects
None yet
Development

No branches or pull requests

5 participants