-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
📖 Add designs/multi-cluster.md #2746
base: main
Are you sure you want to change the base?
📖 Add designs/multi-cluster.md #2746
Conversation
Signed-off-by: Dr. Stefan Schimanski <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sttts The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
} | ||
|
||
// pkg/handler | ||
type DeepCopyableEventHandler interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The eventhandlers are stateless, why do we need the deepcopy for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the propotype. I think this is because EventHandler then would store the Cluster (it is using that info to set the ClusterName field in the request)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is gone now in #2726.
Will update the design here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the BYO request/eventhandler changes in #3019, I brought this back after remembering it was mentioned in the proposal. The previous version of my prototype had a weird second layer of event handler that was wrapping the actual event wrapper and was using the event object to communicate the cluster name in. That felt all kinds of weird.
Because we now have BYO EventHandlers, it's possible that they are not entirely stateless (as @sbueringer pointed out, some event handlers might have to store the cluster name in absence of any information on the event object itself). So I think this approach is the most clean, to be honest. It's entirely optional in #3019, existing EventHandlers don't need to be changed.
Disengage(context.Context, Cluster) error | ||
} | ||
``` | ||
In particular, controllers implement the `AwareRunnable` interface. They react |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than changing the controller type directly and requiring all its dependencies to known how to deepcopy themselves, how about having something like a controllerconstructor
(name tbd) in between that is filled with a []watchConstructor{source func(Cluster) source.Source, handler func(Cluster) handler.Handler, predicate func(cluster) []predicate.Predicate}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require more invasive changes to our public API (the Controller interface)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you can call Watch
on an existing controller. The idea is to not let the Controller
or its dependencies have any knowledge about this but instead have a thing on top of the Controller
that is configured with constructors that take a cluster.Cluster
and return a source/predicate/handler and then uses those to call Watch
when a new cluster appears.
When one disappears, it would cancel the context
on the Source.
The idea really is the opposite, I do not want the Controller
to know how to extend itself like this, IMHO this is a higher-level abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare #2726 after latest push. I have implemented @alvaroaleman's idea via a MultiClusterController
wrapper implementing cluster.AwareRunnable
and just calling Watch
on the actual controller. All the deepcopy'ing is gone 🎉 Much nicer IMO. @alvaroaleman great intuition!
// pkg/cluster | ||
type Provider interface { | ||
Get(ctx context.Context, clusterName string, opts ...Option) (Cluster, error) | ||
List(ctx context.Context) ([]string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return []string
here rather than []Cluster
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Would be good for consistency with the Get func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a misunderstanding of the interface. The getter is actually the constructor. The life-cycle of the returned clusters is owned by the manager (they are added as runnables). Hence, the List
returns names, not clusters. We should rather rename Get
to Create
or Connect
.
} | ||
``` | ||
|
||
The `ctrl.Manager` will use the provider to watch clusters coming and going, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to think about if and how this is doable, but ideally the "thing that comes and goes" wouldn't be typed to cluster.Cluster
but can be anything, so this mechanism can also be used if folks have sources that are not kube watches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be mostly about a more generic name? (can't think of much that would work, maybe something like scope)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is an interesting idea and i could see using it, i just have a question about some of the mechanics.
for context, i am investigating a cluster-api provider for karpenter and it would be nice to have the controllers discriminate between objects in the management cluster and objects in the workload clusters.
### Examples | ||
|
||
- Run a controller-runtime controller against a kubeconfig with arbitrary many contexts, all being reconciled. | ||
- Run a controller-runtime controller against cluster-managers like kind, Cluster-API, Open-Cluster-Manager or Hypershift. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the cluster-api example here, is the intention that controllers will be able to reconcile CRDs in clusters that they know about that may only exist in a subset of clusters (e.g. Machine objects in the management cluster but not in the workload cluster) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think that has to be possible. Otherwise we need all resources that we watch in all clusters
(especially good point because today a controller crashes if a resource doesn't exist)
EDIT: Further down:
For example, it can well be that every cluster has different REST mapping because installed CRDs are different. Without a context, we cannot return the right REST mapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Question is whether one would rather group them in managers such that every manager has a uniform set of clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my updated PR #2726. You can now opt into provider and/or the default cluster per controller via options:
// EngageWithDefaultCluster indicates whether the controller should engage
// with the default cluster of a manager. This defaults to false through the
// global controller options of the manager if a cluster provider is set,
// and to true otherwise. Here it can be overridden.
EngageWithDefaultCluster *bool
// EngageWithProvidedClusters indicates whether the controller should engage
// with the provided clusters of a manager. This defaults to true through the
// global controller options of the manager if a cluster provider is set,
// and to false otherwise. Here it can be overridden.
EngageWithProviderClusters *bool
There is no logic yet for a controller to decide whether to engage with a provider cluster or not. Now it's with all of them. If the setup is more diverse, we might want such a functionality, e.g. some kind of pre-check: ctrl.WantsToEngage(ctx, cluster) bool`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm still understanding the changes in #2726, but i think what you are saying here makes sense to me and would solve the issue.
some kind of pre-check: ctrl.WantsToEngage(ctx, cluster) bool`.
+1, i think we definitely need some way for the client user to specify when it should check a specific cluster for a resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow think it should be the author's and managers responsibility (for now) to group them into groups which are working with the pattern. At this point, we don't know what we don't know. Once this is released, we can gather some feedback on edge cases and take it from there. I suspect the majority of use cases will be still single cluster reconcile loops.
Maybe document this edge case and mark this feature overall as experimental? This way we not committing to full production level stability, and allow to gather more feedback?
For those reading, this is currently a little outdated. #2726 has a changed design proposed by @alvaroaleman. Will come back soon to both PRs. |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen We'd like to continue working on this, time is simply a bit scarce at the moment. |
@embik: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Hi, could you please share the future plans for this feature? Thank you! |
/remove-lifecycle rotten |
Controller-runtime today allows to write controllers against one cluster only.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.
The proposed changes are prototyped in #2726.