fix using KNative with ArgoCD (don't set ownerReferences for webhooks)
#15483
Comments
Hey @thesuperzapper I wanted to confirm something
Shouldn't ArgoCD delete the namespace (since it has no owner reference) and then this should propagate and then the webhooks should eventually be deleted?
@dprotaso there are a few issues with that:
Unsure if pkg/3095 closes this issue - since there's no end-user knob to disable the owner references
Interesting, is this documented anywhere? Or is there some guide that recommends this?
So you said before Argo wouldn't delete resources with owner references - so does it error out or skip the resource?
This is a common pattern, especially in app-of-apps deployments. Either way, ArgoCD is the most widely used GitOps tool in the world, so if we want people to use KNative, we need to be flexible with it.
@dprotaso think you might be slightly misunderstanding. What is happening is that ArgoCD will never delete the This means that because the webhooks have a The easiest way to reproduce this is simply create an ArgoCD app for KNative that has the This cascading delete will deadlock 100% of the time in the way described.
@dprotaso I have made a follow-up PR in It adds an environment variable called
Next Steps
I think we should also backport it to 1.14 and 1.15 because otherwise it's nearly impossible to deploy KNative Serving with ArgoCD. We might also want to add a basic reference to this in the docs (or hope people stumble across this issue). That means we will need to cherry-pick the two PRs into the We need to update the version of
I'm good with a toggle in eg. Long term it would be great if we could do the ownership references declaratively kubernetes/kubernetes#102810
I don't think this warrants a cherry pick and cutting new releases. Per our release schedule we'll have a new one out in a week. https://github.com/knative/community/blob/main/mechanics/RELEASE-SCHEDULE.md
@dprotaso I think that both KNative and ArgoCD are doing something unexpected here. While I agree that ArgoCD should have a workaround for apps that misuse the OwnerReferences, I think that we (KNative) are actually misusing OwnerReferences by setting the ArgoCD does this because deleting an owned resource is usually dangerous. For example, when using cert-manager, ArgoCD should never delete a secret that is managed by a Certificate resource (which is indicated by the Certificate "owning" the Secret).
I doubt that upstream Kubernetes would accept this because the point of OwnerReferences is for controllers, not end users.
I really think this is a bug (from the KNative perspective) and since we are still officially supporting 1.14 it warrants a patch release.
Thinking about this more I wonder if we really just want to set the controller to Does that appease ArgoCD?
Looks like there's already a PR out for that - argoproj/gitops-engine#503
@dprotaso sadly no, but we should still do that change anyway, because it's more semantically correct. At some point ArgoCD might change to ignoring non-controller owners (but there is no consensus in the ArgoCD community yet on whether this would introduce its own problems). So in the meantime, it would be great to get knative/pkg#3103 merged and at least made part of 1.15 so people can deploy KNative safely from ArgoCD.
@dprotaso there was a slight issue with my first PR, but I have made a quick followup knative/pkg#3107 which definitely works in my testing. That is, when the For example, you can apply the following Kustomize patch to do this:
patches:
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: REPLACED_DURING_PATCH
      spec:
        template:
          spec:
            containers:
              - name: webhook
                env:
                  - name: WEBHOOK_DISABLE_NAMESPACE_OWNERSHIP
                    value: "true"
    target:
      group: apps
      kind: Deployment
      name: ".*webhook.*"
@thesuperzapper should we close this if it is done?
@skonto it looks like this made its way into serving 1.16.0 and net-istio 1.16.0, so yes, we can probably close it.
Background
In 2021 KNative made it so that the ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources were "owned" by the Namespace which KNative is installed into (knative-serving by default).
This was intended to ensure that users did not leave the webhooks behind when uninstalling (via deleting the Namespace) and break KNative when they re-installed (because the webhooks are cluster resources, and their backend would not exist and so would fail to validate anything).
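A minimal sketch of what this ownership looks like on one of the webhook resources. The webhook name, the uid, and the `controller`/`blockOwnerDeletion` values are assumptions for illustration and were not copied from a live cluster:

```yaml
# Hypothetical sketch: a cluster-scoped webhook configuration whose owner is
# the Namespace that KNative Serving is installed into.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validation.webhook.serving.knative.dev   # assumed name
  ownerReferences:
    - apiVersion: v1
      kind: Namespace
      name: knative-serving
      # Placeholder: in a real cluster this must equal the Namespace's uid.
      uid: 00000000-0000-0000-0000-000000000000
      controller: true            # assumed value
      blockOwnerDeletion: true    # assumed value
```

With metadata like this, deleting the knative-serving Namespace lets the Kubernetes garbage collector remove the webhook configuration as a dependent.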
Kubernetes has the concept of ownerReferences to indicate the relationships between resources. When a "parent" resource is deleted, Kubernetes cleans up the "child" resources that reference it, either before the parent is removed (foreground delete) or after (background delete). If an ownerReference sets blockOwnerDeletion, a foreground delete of the parent waits until that child is gone.
For example, a Pod owned by a ReplicaSet might have the following ownerReferences:
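A sketch of such a Pod, trimmed to the relevant metadata. The names and uid are placeholders chosen for illustration:

```yaml
# Illustrative only: a Pod created by a ReplicaSet. Names and uid are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-5d4f8c7b9-abcde
  namespace: default
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: my-app-5d4f8c7b9
      # Must match the uid of the owning ReplicaSet in the same namespace.
      uid: 11111111-2222-3333-4444-555555555555
      controller: true           # this owner is the managing controller
      blockOwnerDeletion: true   # foreground deletion of the owner waits for this Pod
```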
What's the problem?
This breaks ArgoCD, a very widely used GitOps system for Kubernetes.
Specifically, ArgoCD will never remove a resource that has ownerReferences set, so the issue we were trying to prevent actually happens 100% of the time when deploying KNative with ArgoCD.
Here are some related upstream issues:
OwnerReference not deleted when removed from Helm Chart argoproj/argo-cd#4764
controller flag in owner references argoproj/argo-cd#12210
What's the solution?
I propose we make two changes:
KNATIVE_DISABLE_WEBHOOK_OWNER which can be set on all controller pods.
controller=true:
Where is the relevant code?
The code which sets the ownerReferences lives in the knative-pkg libraries:
webhook/resourcesemantics/validation/reconcile_config.go
webhook/configmaps/configmaps.go
webhook/resourcesemantics/defaulting/defaulting.go