Health update from Nodes on Startup/Shutdown (CI/CD Deployment, Zero downtime) #190

Closed
MichaelPeter opened this issue May 22, 2020 · 6 comments
Labels
Deployment cookbook Base capability is there, but documentation on how to achieve the scenario is required.
@MichaelPeter

MichaelPeter commented May 22, 2020

Hello, I am developing a greenfield project and I am new to reverse proxies, so excuse me if I missed a feature or if there is a better way to solve this with a reverse proxy.

We have an On-Premise application and we'd like to keep it Zero Downtime especially when updating the nodes.

So when our TFS 2018 runs its CI deployment it would work like this:
  1. Shut down Node 1
  2. Update Node 1 files
  3. Start up Node 1
  4. Health check Node 1
  5. Shut down Node 2
  6. Update Node 2 files
  7. Start up Node 2

Same with Nodes 3-N... Maybe even parallelized update.
So there is always at least one active node.
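
The sequence above can be sketched as a small loop. This is only an illustration under assumptions: `rolling_update` and its four callbacks are hypothetical names, not part of TFS or any proxy, and the sketch health-checks every node (the list above only mentions it for Node 1).

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    stop: Callable[[str], None],
    update: Callable[[str], None],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update nodes one at a time and return the nodes that were updated.

    All callbacks are hypothetical hooks supplied by the deployment tool.
    """
    updated: List[str] = []
    for node in nodes:
        stop(node)
        update(node)
        start(node)
        if not is_healthy(node):
            # Abort before touching the next node, so the nodes that have
            # not been updated yet keep serving traffic.
            raise RuntimeError(f"{node} failed its health check after update")
        updated.append(node)
    return updated
```

Because only one node is down at a time and the loop stops on the first failed health check, at least one node stays active as long as there are two or more nodes.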

When shutting down, we could let the reverse proxy run into a timeout for Node 1, but I think it would be preferable if the Node 1 service told the reverse proxy that it is no longer available when it shuts down. Likewise, when Node 1 starts up, it would tell the reverse proxy that it is available again.
Similarly, when the reverse proxy itself has to restart, it should check the health of all nodes.

In this scenario, nodes would also have to inform the reverse proxy when they are added or removed. With a static configuration, the reverse proxy would need a restart whenever a node is added or removed.
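
To make the idea concrete, here is a minimal in-memory sketch of nodes announcing themselves. Everything in it is an assumption for illustration: `BackendRegistry` and `node_lifetime` are hypothetical names, and a real setup would need some network channel between node and proxy that this sketch leaves out.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, List


class BackendRegistry:
    """Hypothetical proxy-side table of known nodes and their availability."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._available: Dict[str, bool] = {}

    def announce_up(self, address: str) -> None:
        # Also registers nodes the proxy has never seen before.
        with self._lock:
            self._available[address] = True

    def announce_down(self, address: str) -> None:
        # The node stays known but receives no traffic.
        with self._lock:
            if address in self._available:
                self._available[address] = False

    def remove(self, address: str) -> None:
        with self._lock:
            self._available.pop(address, None)

    def routable(self) -> List[str]:
        with self._lock:
            return sorted(a for a, up in self._available.items() if up)


@contextmanager
def node_lifetime(registry: BackendRegistry, address: str) -> Iterator[None]:
    """Announce the node on startup and withdraw it on shutdown, even on error."""
    registry.announce_up(address)
    try:
        yield
    finally:
        registry.announce_down(address)
```

A node would wrap its serving loop in `with node_lifetime(registry, "node1:5000"):` so that the proxy stops routing to it as soon as the service exits, without waiting for a timeout.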

I have not seen an option to configure this yet. Is there a solution for it, or a buzzword I should search for?

@MichaelPeter MichaelPeter added the "Type: Idea" label (This issue is a high-level idea for discussion.) May 22, 2020
@Tratcher
Member

The scenario makes sense, though direct communication between the nodes and the proxy requires fairly tight integration. Many apps don't have a direct line of communication to their proxy.

An alternative would be for this procedure to be managed by a central orchestrator. The orchestrator in this case is the one doing the deployments and telling the nodes to shut down. It could remove those nodes from the proxy config prior to shutting them down. When the deployment is complete it could re-add the nodes to the config.

It may also be wise to use two pools of nodes in this scenario to separate the versions of the software running.

  1. Remove some nodes from pool1
  2. Shut down those nodes
  3. Upgrade and restart them
  4. Add those nodes to pool2
  5. Gradually transition a percentage of traffic from pool1 to pool2 and check for errors.
    Repeat until all nodes have been upgraded and moved.

How you add and remove nodes would depend on how you manage your configuration. The other mechanism we'd need to work on to enable this is the percentage based routing. #126 would cover that.
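
As a rough illustration of step 5, percentage-based routing between two pools could look like the following. This is a sketch under assumptions, not the design tracked in #126: `choose_destination` and `pool2_percent` are hypothetical names, and the fallback to the other pool when one is empty is a choice made for this example.

```python
import random
from typing import Optional, Sequence


def choose_destination(
    pool1: Sequence[str],
    pool2: Sequence[str],
    pool2_percent: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Send roughly pool2_percent of requests to pool2 and the rest to pool1."""
    if not 0 <= pool2_percent <= 100:
        raise ValueError("pool2_percent must be between 0 and 100")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so 0 never selects pool2 and 100 always does.
    use_pool2 = rng.random() * 100 < pool2_percent
    primary, fallback = (pool2, pool1) if use_pool2 else (pool1, pool2)
    # Assumption: if the selected pool is empty, fall back to the other one.
    candidates = primary or fallback
    if not candidates:
        raise LookupError("no destinations in either pool")
    return rng.choice(list(candidates))
```

An orchestrator would raise `pool2_percent` in small increments while watching error rates, and move more nodes from pool1 to pool2 between increments.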

@samsp-msft samsp-msft added the "Deployment cookbook" label (Base capability is there, but documentation on how to achieve the scenario is required.) and removed the "Type: Idea" label May 22, 2020
@samsp-msft samsp-msft added this to the 1.0.0 milestone May 22, 2020
@samsp-msft
Contributor

We think the extensibility for modifying the config on the fly should cover this scenario, as should feeding into the health state for the backend. This needs a write-up of the different mechanisms that could be used in this case.

@AlwaysHC

Please add the possibility to enable or disable backends by code. Then it will be easy to write code to integrate YARP with CI/CD systems

@MichaelPeter
Author

Yes, being able to change the routes at runtime from code would solve my problem :)

@Tratcher
Member

Reloadable code-based config providers are covered here: https://microsoft.github.io/reverse-proxy/articles/configproviders.html.
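
The general idea behind a reloadable config can be sketched independently of YARP. The following is not YARP's API (see the linked article for that); `ProxyConfig`, `ReloadableConfig`, and `without_destination` are hypothetical names used only to show an immutable snapshot being swapped at runtime with listeners notified.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ProxyConfig:
    """Immutable snapshot of the destinations the proxy may route to."""
    destinations: Tuple[str, ...]


class ReloadableConfig:
    """Holds the current snapshot and notifies listeners when it is replaced."""

    def __init__(self, initial: ProxyConfig) -> None:
        self._lock = threading.Lock()
        self._current = initial
        self._listeners: List[Callable[[ProxyConfig], None]] = []

    @property
    def current(self) -> ProxyConfig:
        with self._lock:
            return self._current

    def on_change(self, listener: Callable[[ProxyConfig], None]) -> None:
        with self._lock:
            self._listeners.append(listener)

    def update(self, new_config: ProxyConfig) -> None:
        with self._lock:
            self._current = new_config
            listeners = list(self._listeners)
        # Call listeners outside the lock so they can read `current` safely.
        for listener in listeners:
            listener(new_config)


def without_destination(config: ProxyConfig, address: str) -> ProxyConfig:
    """Return a new snapshot with one destination removed."""
    return ProxyConfig(tuple(d for d in config.destinations if d != address))
```

A CI/CD step could call `update(without_destination(cfg.current, "node1"))` before shutting Node 1 down and publish a snapshot that includes it again once the node is healthy, without restarting the proxy.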

@karelz
Member

karelz commented Mar 24, 2021

Triage: This is more a part of Orchestration. We do not think the code or docs belong to YARP itself. We can of course offer guidance, for example on writing advanced ActiveHealthChecks.
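
For readers unfamiliar with the term, an active health check means the proxy probes each node itself instead of waiting for requests to fail. The sketch below shows the general technique only and is not YARP's implementation: `ActiveHealthChecker`, `probe`, and `failure_threshold` are hypothetical names, and treating never-probed nodes as unhealthy is an assumption chosen to fit the proxy-restart case from the original post.

```python
from typing import Callable, Dict


class ActiveHealthChecker:
    """Tracks node health from periodic probes (generic sketch)."""

    def __init__(self, probe: Callable[[str], bool], failure_threshold: int = 2) -> None:
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self._probe = probe
        self._threshold = failure_threshold
        self._failures: Dict[str, int] = {}
        self._healthy: Dict[str, bool] = {}

    def is_healthy(self, address: str) -> bool:
        # Assumption: a node that was never probed successfully gets no traffic.
        return self._healthy.get(address, False)

    def check(self, address: str) -> bool:
        """Probe one node once and return its resulting health state."""
        try:
            ok = bool(self._probe(address))
        except Exception:
            ok = False  # a probe that throws counts as a failed probe
        if ok:
            self._failures[address] = 0
            self._healthy[address] = True
        else:
            self._failures[address] = self._failures.get(address, 0) + 1
            # Only mark unhealthy after several consecutive failures,
            # so a single slow response does not take a node out.
            if self._failures[address] >= self._threshold:
                self._healthy[address] = False
        return self.is_healthy(address)
```

A timer would call `check` for every configured node at a fixed interval; after a node is updated and restarted, the first successful probe puts it back into rotation.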

@karelz karelz closed this as completed Mar 24, 2021
5 participants