Health update from Nodes on Startup/Shutdown (CI/CD Deployment, Zero downtime) #190

MichaelPeter · 2020-05-22T12:04:09Z

Hello, I am developing a greenfield project and I am new to reverse proxies, so please excuse me if I missed a feature or if there is a better way to solve this with a reverse proxy.

We have an on-premises application and we'd like to keep it at zero downtime, especially when updating the nodes.

So when our TFS 2018 runs its CI deployment it would work like this:
Shutdown Node 1
Update Node 1 files
Startup Node 1
Health check Node 1
Shutdown Node 2
Update Node 2 files
Startup Node 2

Same with Nodes 3-N... Maybe even parallelized update.
So there is always at least one active node.

Now, when shutting down, we could let the reverse proxy run into a timeout for Node 1, but I think it would be preferable if the Node 1 service told the reverse proxy that it is no longer available when it shuts down. Likewise, when Node 1 starts up, it would tell the reverse proxy that it is available again.
Similarly, if the reverse proxy itself needs a restart, it would check the health of all nodes when it comes back up.

In this scenario, nodes would have to inform the reverse proxy when they are added or removed. With a configuration file, the reverse proxy would need a restart whenever a node is added or removed.

I have not seen any option to configure this yet. Is there a solution for that? Or a buzzword I should search for?

Tratcher · 2020-05-22T16:49:58Z

The scenario makes sense, though direct communication between the nodes and the proxy requires fairly tight integration. Many apps don't have a direct line of communication to their proxy.

An alternative would be for this procedure to be managed by a central orchestrator. The orchestrator in this case is the one doing the deployments and telling the nodes to shut down. It could remove those nodes from the proxy config prior to shutting them down. When the deployment is complete it could re-add the nodes to the config.
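As a rough illustration of the orchestrator approach, here is a minimal sketch in Python. It is not YARP code: `ProxyConfig`, `rolling_update`, `deploy`, and `is_healthy` are hypothetical names, and the proxy config is modeled as an in-memory set of active nodes. How the set is actually pushed to the proxy depends on how you manage your configuration.

```python
from typing import Callable, Iterable, List, Set


class ProxyConfig:
    """Hypothetical stand-in for the proxy's list of active destinations."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.active: Set[str] = set(nodes)

    def remove(self, node: str) -> None:
        # Keep at least one node in rotation at all times.
        if self.active == {node}:
            raise RuntimeError("refusing to remove the last active node")
        self.active.discard(node)

    def add(self, node: str) -> None:
        self.active.add(node)


def rolling_update(
    nodes: List[str],
    proxy: ProxyConfig,
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update nodes one at a time and return the nodes that were updated."""
    updated: List[str] = []
    for node in nodes:
        proxy.remove(node)  # stop routing traffic before the shutdown
        deploy(node)        # assumed to shut down, update files, and start up
        if not is_healthy(node):
            # Leave the node out of rotation and stop the rollout.
            raise RuntimeError(f"{node} failed its health check")
        proxy.add(node)     # route traffic to the node again
        updated.append(node)
    return updated
```

Because the orchestrator removes a node before shutting it down, the proxy never has to discover the outage through a timeout.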

It may also be wise to use two pools of nodes in this scenario to separate the versions of the software running.

Remove some nodes from pool1
Shut down those nodes
Upgrade and restart them
Add those nodes to pool2
Gradually transition a percentage of traffic from pool1 to pool2 and check for errors.
Repeat until all nodes have been upgraded and moved.

How you add and remove nodes would depend on how you manage your configuration. The other mechanism we'd need to work on to enable this is the percentage based routing. #126 would cover that.
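To make the gradual transition step more concrete, here is a small Python sketch of percentage-based routing between two pools. It only illustrates the idea and is not a YARP API: `pick_pool`, `shift_traffic`, the step list, and the error threshold are all assumptions made for this example.

```python
import random
from typing import Callable, List


def pick_pool(pool2_percent: int, rng: random.Random) -> str:
    """Send roughly pool2_percent of requests to pool2, the rest to pool1."""
    return "pool2" if rng.random() * 100 < pool2_percent else "pool1"


def shift_traffic(
    steps: List[int],
    observe_error_rate: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> int:
    """Raise the pool2 share step by step; roll back to 0 on errors.

    observe_error_rate is a hypothetical callback that reports the error
    rate seen while the given percentage of traffic goes to pool2.
    Returns the final percentage of traffic sent to pool2.
    """
    current = 0
    for percent in steps:
        current = percent  # apply the new traffic split
        if observe_error_rate(current) > max_error_rate:
            return 0       # roll back: all traffic goes to pool1 again
    return current
```

For example, `shift_traffic([10, 50, 100], observe_error_rate)` would move to 100% only if every intermediate step stays under the error threshold.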

samsp-msft · 2020-05-22T20:27:50Z

We think the extensibility for being able to modify the config on the fly should cover this scenario, or feeding into the health state for the backend. This needs a write up for the different mechanisms that could be used in this case.

AlwaysHC · 2020-05-24T09:36:29Z

Please add the ability to enable or disable backends from code. Then it would be easy to write code that integrates YARP with CI/CD systems.

MichaelPeter · 2020-05-24T18:00:23Z

Yes, being able to change the routes at runtime from code would solve my problem :)

Tratcher · 2021-02-23T18:23:27Z

Reloadable code-based config providers are covered here: https://microsoft.github.io/reverse-proxy/articles/configproviders.html.

karelz · 2021-03-24T19:56:19Z

Triage: This is more a part of orchestration. We do not think the code or docs belong in YARP itself. We can of course help with guidance, for example on writing advanced ActiveHealthChecks.

MichaelPeter added the "Type: Idea" label (This issue is a high-level idea for discussion.) May 22, 2020

Tratcher mentioned this issue May 22, 2020: "Proxy can divert routes to support A/B testing, service rollout etc" #126 (Open)

samsp-msft added the "Deployment cookbook" label (Base capability is there, but documentation on how to achieve the scenario is required.) and removed the "Type: Idea" label May 22, 2020

samsp-msft added this to the 1.0.0 milestone May 22, 2020

karelz closed this as completed Mar 24, 2021
