Proxy can divert routes to support A/B testing, service rollout etc #126

Open
Tratcher opened this issue Apr 28, 2020 · 15 comments
Labels
Priority:1 Used for divisional .NET planning Type: Idea This issue is a high-level idea for discussion. User Story Used for divisional .NET planning
Comments

@Tratcher
Member

A/B testing is a common scenario for proxies, letting services incrementally upgrade/experiment using live traffic.

How does a service decide which requests to route to which test group?

Initial theory:

  • This is a separate feature from load balancing. Within a load balancing set we assume all servers are equally able to handle a request aside from considerations of health and load. You'd select a test group and then load balance within that group.
  • This should take advantage of the routing layer. For something like tenant based affinity that information may already be part of the route definition.
  • Alternatively we could put it into the route config (ProxyRoute), allowing it to specify multiple backends and an assignment strategy.
  • This should be compatible with retry mechanisms to allow failing over to the other test group.
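The two-stage idea in the bullets above (select a test group first, then load balance within it) can be sketched roughly as follows. This is only an illustration under assumptions: `TestGroup`, `TwoStageSelector`, and the tenant-based strategy are hypothetical names and are not part of the proxy's API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical types for illustration only; none of these are proxy APIs.
public sealed class TestGroup
{
    private int _next = -1;

    public TestGroup(string name, IReadOnlyList<string> servers)
    {
        if (servers.Count == 0)
        {
            throw new ArgumentException("A group needs at least one server.", nameof(servers));
        }

        Name = name;
        Servers = servers;
    }

    public string Name { get; }

    public IReadOnlyList<string> Servers { get; }

    // Stage 2: plain round-robin inside the group. Every server in a group is
    // assumed to be equally able to handle the request.
    public string NextServer()
    {
        var index = (uint)Interlocked.Increment(ref _next) % (uint)Servers.Count;
        return Servers[(int)index];
    }
}

public static class TwoStageSelector
{
    // Stage 1: the assignment strategy maps a request key (here a tenant id,
    // an assumed example) to a test group. Stage 2: load balance within it.
    public static string Select(string tenantId, Func<string, TestGroup> assignmentStrategy)
    {
        var group = assignmentStrategy(tenantId);
        return group.NextServer();
    }
}
```

Keeping the assignment strategy as a separate delegate mirrors the point that group selection is a different concern from load balancing: health and load only matter once a group has been chosen.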
@Tratcher Tratcher added the Type: Idea This issue is a high-level idea for discussion. label Apr 28, 2020
@Tratcher
Member Author

@samsp-msft in your surveys have customers explained how they handle this today? And/or how they want to handle it?

@karelz karelz added this to the 1.0.0 milestone May 14, 2020
@karelz
Member

karelz commented May 14, 2020

Triage: Design should be done in 1.0 - customers will ask for it.

@Tratcher
Member Author

This could also come into play for rolling upgrades: #190 (comment)

  1. Remove some nodes from pool1
  2. Shut down those nodes
  3. Upgrade and restart them
  4. Add those nodes to pool2
  5. Gradually transition a percentage of traffic from pool1 to pool2 and check for errors. (A/B testing)

Repeat until all nodes have been upgraded and moved.
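Step 5 above (gradually moving a percentage of traffic from pool1 to pool2) could look something like the sketch below. `RolloutSplitter` and the `bucket` parameter are hypothetical; the sketch assumes each request is given a number in [0, 100), for example a random draw or a value derived from a stable key.

```csharp
using System;

// Hypothetical helper for illustration; the pool names follow the steps above.
public sealed class RolloutSplitter
{
    private int _percentToPool2;

    // Percentage (0 to 100) of traffic sent to the upgraded pool.
    public int PercentToPool2
    {
        get => _percentToPool2;
        set
        {
            if (value < 0 || value > 100)
            {
                throw new ArgumentOutOfRangeException(nameof(value));
            }

            _percentToPool2 = value;
        }
    }

    // bucket is a per-request number in [0, 100). Buckets below the current
    // percentage go to pool2; everything else stays on pool1.
    public string ChoosePool(int bucket)
    {
        if (bucket < 0 || bucket >= 100)
        {
            throw new ArgumentOutOfRangeException(nameof(bucket));
        }

        return bucket < PercentToPool2 ? "pool2" : "pool1";
    }
}
```

An operator would raise `PercentToPool2` in steps while watching error rates, and set it back to 0 to roll back.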

@Tratcher
Member Author

Tratcher commented May 26, 2020

Note: Backend.PartitioningOptions seems to be related to this scenario, but it's not actually implemented anywhere. We should either implement or remove this options section. The same goes for the quota and circuit breaker options.

@Tratcher Tratcher changed the title Design A/B testing strategies Design A/B testing and rolling upgrade strategies May 31, 2020
@Tratcher
Member Author

Notes from a partner discussion:

Routing - One Catch-All
  - Partitioning function to select cluster
Clusters, each with one destination
  - Next
  - Current
  - Previous

@Tratcher
Member Author

Tratcher commented Sep 8, 2020

@johnazariah to your question from: #405 (comment)

You could still use the headers to decide where requests get assigned, but that decision would be applied at a stage after routing and would be much more customizable. Affinity is definitely an open question; the best option is to have a stable selection algorithm, such as one based on the headers.
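One way to get a stable selection from a header value is to hash it into a fixed number of buckets, so the same value always lands in the same bucket across requests and proxy instances. The sketch below is an assumption-laden illustration: `StableBucket` is a hypothetical helper, and FNV-1a is just one simple hash that fits, not something this thread prescribes.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class StableBucket
{
    // 32-bit FNV-1a over the UTF-8 bytes of the key. string.GetHashCode is
    // avoided because it is randomized per process on .NET Core and later,
    // so it would not give the same answer on every proxy instance.
    public static int FromKey(string key, int bucketCount = 100)
    {
        if (key is null)
        {
            throw new ArgumentNullException(nameof(key));
        }

        if (bucketCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(bucketCount));
        }

        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        var hash = offsetBasis;
        foreach (var b in Encoding.UTF8.GetBytes(key))
        {
            unchecked
            {
                hash ^= b;
                hash *= prime;
            }
        }

        return (int)(hash % (uint)bucketCount);
    }
}
```

A caller would pass the value of whichever header identifies the client or tenant and compare the resulting bucket against the share of traffic assigned to each test group.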

@johnazariah
Member

Ok I like what I'm hearing... 👍

@samsp-msft samsp-msft added the User Story Used for divisional .NET planning label Oct 21, 2020
@samsp-msft samsp-msft changed the title Design A/B testing and rolling upgrade strategies Proxy can divert routes to support A/B testing, service rollout etc Oct 21, 2020
@Tratcher
Member Author

Related: partitioning and shuffle sharding, great talk from AWS: https://www.youtube.com/watch?v=swQbA4zub20

@Tratcher
Member Author

E-Core3 is using IHttpProxy directly and does not need anything from us here. There is still general community interest so I'll do some prototyping and see what works.

@karelz karelz modified the milestones: YARP 1.0.0, Backlog Mar 24, 2021
@karelz
Member

karelz commented Mar 24, 2021

Triage: Moving to post-1.0 -- we need engaged customers who can validate the design / E2E.

@Tratcher
Member Author

Tratcher commented Jan 28, 2022

Here's a partial proposal for 1.1 that should let people build their own A/B features without us having to be too opinionated on their design:

Routes would need a default cluster to get past this check:

var cluster = route.Cluster;
// TODO: Validate on load https://github.com/microsoft/reverse-proxy/issues/797
if (cluster == null)
{
    Log.NoClusterFound(_logger, route.Config.RouteId);
    context.Response.StatusCode = StatusCodes.Status503ServiceUnavailable;
    return Task.CompletedTask;
}

In middleware re-assign the cluster:

var destinationsState = cluster.DestinationsState;
context.Features.Set<IReverseProxyFeature>(new ReverseProxyFeature
{
    Route = route,
    Cluster = cluster.Model,
    AllDestinations = destinationsState.AllDestinations,
    AvailableDestinations = destinationsState.AvailableDestinations
});

We'd need a new service to get access to the clusters and routes.

  • Read-only lookup by id: TryGetCluster(id, out var cluster) and TryGetRoute(id, out var route)
  • Enumeration of all clusters and routes
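A rough sketch of what such a read-only lookup might look like, and how middleware could use it to pick a replacement cluster while falling back to the route's default, is shown below. All names here (`ClusterInfo`, `IClusterLookup`, `InMemoryClusterLookup`, `ClusterReassignment`) are hypothetical stand-ins, not the proxy's actual types; only the cluster half of the lookup is shown, and routes would follow the same pattern.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the proxy's cluster state; not a real proxy type.
public sealed class ClusterInfo
{
    public ClusterInfo(string clusterId)
    {
        ClusterId = clusterId;
    }

    public string ClusterId { get; }
}

// Hypothetical shape of the read-only lookup service described above.
public interface IClusterLookup
{
    bool TryGetCluster(string id, out ClusterInfo cluster);

    IEnumerable<ClusterInfo> GetClusters();
}

public sealed class InMemoryClusterLookup : IClusterLookup
{
    private readonly Dictionary<string, ClusterInfo> _clusters =
        new Dictionary<string, ClusterInfo>(StringComparer.OrdinalIgnoreCase);

    public InMemoryClusterLookup(IEnumerable<ClusterInfo> clusters)
    {
        foreach (var cluster in clusters)
        {
            _clusters[cluster.ClusterId] = cluster;
        }
    }

    public bool TryGetCluster(string id, out ClusterInfo cluster)
        => _clusters.TryGetValue(id, out cluster);

    public IEnumerable<ClusterInfo> GetClusters() => _clusters.Values;
}

public static class ClusterReassignment
{
    // Returns the cluster picked by the caller's own selection logic, or the
    // route's default cluster when the chosen id is unknown.
    public static ClusterInfo Resolve(IClusterLookup lookup, ClusterInfo routeDefault, string chosenId)
        => lookup.TryGetCluster(chosenId, out var cluster) ? cluster : routeDefault;
}
```

Falling back to the route's default cluster keeps the request servable when the selection logic names a cluster that has since been removed from configuration.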

@Tratcher
Member Author

Tratcher commented Feb 5, 2022

Can everyone here please take a look at #1538? It doesn't fully implement A/B testing or related systems, but it does provide some of the raw components to unblock people building their own.

@Tratcher Tratcher modified the milestones: YARP 1.1.0, Backlog Feb 11, 2022
@Tratcher Tratcher removed their assignment Feb 11, 2022
@adityamandaleeka adityamandaleeka modified the milestones: Backlog, YARP 2.0.0 May 19, 2022
@adityamandaleeka
Member

For 2.0 let's consider how to address the most common subset of use cases with extensibility points.

@samsp-msft samsp-msft moved this to 📋 Backlog in YARP 2.x Jun 9, 2022
@samsp-msft samsp-msft moved this from 📋 Backlog to 🔖 Ready in YARP 2.x Jun 9, 2022
@karelz
Member

karelz commented Jun 16, 2022

Triage: The design needs to be fleshed out as the next step. Collect requirements, make proposals, discuss them.

Projects
Status: Spec wishlist

5 participants