Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validating Admission Webhook #13503

Open
agilgur5 opened this issue Aug 26, 2024 · 3 comments · May be fixed by #13879
Open

Validating Admission Webhook #13503

agilgur5 opened this issue Aug 26, 2024 · 3 comments · May be fixed by #13879
Labels
type/feature Feature request

Comments

@agilgur5
Copy link

agilgur5 commented Aug 26, 2024

Summary

We currently have various scenarios where an invalid Workflow will be allowed in the cluster, causing bugs to be found much later instead of upfront. The CLI and API handle (some) validation, but direct to k8s will miss some of this.

Use Cases

The lack of any validation when submitting directly to k8s can cause some unexpected scenarios like #6781 , #12693, #10630. In all those issues and some others, I've mentioned the concept of a Validating Admission Webhook to prevent these scenarios and give clearer error messages: #12693 (comment), #10630 (comment), #6781 (comment)

Note that even if we do have a validating admission webhook, it may be optional, it may not handle resources that were created before it existed, and it may not be able to validate all scenarios, so the Controller and Informers would still have to be "tolerant" (the word the codebase uses) of invalid resources. This is primarily a feature to improve UX

Implementation Details

There are 2 primary ways of doing Validating Admission in k8s these days:

  1. Admission Webhooks

    • This is simple in terms of logic, in that we'd just re-use the existing validation logic, either with the Controller or the Server
      • Typically admission is a Controller responsibility, but hosting this behavior in the Controller would require that it respond to API requests. I have seen other tools host as a separate component entirely as well, e.g. in a Helm Chart as argo-workflows.validatingAdmissionWebhook.enabled: true or similar.
    • Problematically, k8s webhooks require TLS certs that the k8s control plane trusts, which complicates Deployment quite a bit as it requires cert-manager etc.
    • There is also a question of failing open or failing closed. I think we can fail closed, and latency sensitive use-cases could just not deploy / not enable the webhook entirely.
  2. Admission Policies (stable as of k8s 1.30)

    • This is basically a policy engine built-in to k8s (instead of external like Kyverno, OPA Gatekeeper, etc) that supports CEL
    • This would require rewriting some validation in CEL, but is otherwise quite straightforward to deploy:
      1. Can be used directly in-line with CRDs
      2. Or can be separate resources. Given the problems with CRD size, this might be the best possible solution
    • There are some more dynamic validation scenarios that would be hard to cover via CEL, but it could probably still cover a decent bit

I would probably suggest an optional of version of 1 for consistency, and then 2.b. as something deployed with all manifests as a baseline. We could even start making simple policies already and add them to the manifests and then gradually build them up -- #6781 is a very simple length check, for instance


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

@agilgur5 agilgur5 added the type/feature Feature request label Aug 26, 2024
@Joibel
Copy link
Member

Joibel commented Aug 27, 2024

I would like this to happen, especially Admission Webhooks.

There are certain classes of error that Admission Policies couldn't be crafted to help with such as invalid fields in yaml dictionaries which are a common class of error - either just misspellings or using a field which isn't valid in this location.

@agilgur5
Copy link
Author

agilgur5 commented Aug 27, 2024

invalid fields in yaml dictionaries

That one could be detected actually as CEL Macros include has() and exists(). But detecting all invalid schemas is likely non-trivial

More dynamic things though, like detecting certain kinds of invalid expr expression, are not possible.
Technically even some simple invalid expressions (like #12899) could be detected with plain string parsing, but it would make for some convoluted/hacky code.

@williamburgson williamburgson linked a pull request Nov 8, 2024 that will close this issue
MasonM added a commit to MasonM/argo-workflows that referenced this issue Jan 2, 2025
The full CRDs at `manifests/base/crds/full` are only intended for usage
in editors
(argoproj#11266 (comment))
and are unusable otherwise. Trying to install them will cause spurious
validation errors with nearly every workflow.

Having the full CRDs would enable many useful features, such as `kubectl
explain` output (argoproj#8190),
[validating admission
policies](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
(argoproj#13503), and
integration with tools like cdk8s
(argoproj#8532).

As explained at
argoproj#13754 (comment),
there's two issues that prevent us from using the full CRDs everywhere:
1. [Kubernetes size
   limits](kubernetes/kubernetes#82292) when
   using client-side apply. This is not a problem when using
   .
2. Wrong validation information due to kubebuilder limitations.

For the first issue, I verified that this is no longer an issue when
using [server-side
apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/),
so I updated the `make install` target to use it. Since `kubectl apply --server-side`
isn't compatible with `kubectl apply --prune`, I had to switch to use
[apply
sets](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#alternative-kubectl-apply-f-directory-prune),
which is intended to replace `--prune`, and seems to work well.

For the second issue, I went through and added workarounds to
`hack/manifests/crds.go` for all the kubebuilder issues I could find.
Since the E2E test suite now uses the full CRDs, it should tell us if
there's anything I missed.

I didn't update the release manifests to use the full CRDs, since that's
a big change and we probably should wait awhile to make sure there
aren't any unexpected issues. Users can easily opt into the full
CRDs if they want.

Signed-off-by: Mason Malone <[email protected]>
MasonM added a commit to MasonM/argo-workflows that referenced this issue Jan 2, 2025
The full CRDs at `manifests/base/crds/full` are only intended for usage
in editors
(argoproj#11266 (comment))
and are unusable otherwise. Trying to install them will cause spurious
validation errors with nearly every workflow.

Having the full CRDs would enable many useful features, such as `kubectl
explain` output (argoproj#8190),
[validating admission
policies](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
(argoproj#13503), and
integration with tools like cdk8s
(argoproj#8532).

As explained at
argoproj#13754 (comment),
there's two issues that prevent us from using the full CRDs everywhere:
1. [Kubernetes size
   limits](kubernetes/kubernetes#82292) when
   using client-side apply. This is not a problem when using
   .
2. Wrong validation information due to kubebuilder limitations.

For the first issue, I verified that this is no longer an issue when
using [server-side
apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/),
so I updated the `make install` target to use it. Since `kubectl apply --server-side`
isn't compatible with `kubectl apply --prune`, I had to switch to use
[apply
sets](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#alternative-kubectl-apply-f-directory-prune),
which is intended to replace `--prune`, and seems to work well.

For the second issue, I went through and added workarounds to
`hack/manifests/crds.go` for all the kubebuilder issues I could find.
Since the E2E test suite now uses the full CRDs, it should tell us if
there's anything I missed.

I didn't update the release manifests to use the full CRDs, since that's
a big change and we probably should wait awhile to make sure there
aren't any unexpected issues. Users can easily opt into the full
CRDs if they want.

Signed-off-by: Mason Malone <[email protected]>
@MasonM
Copy link
Contributor

MasonM commented Jan 2, 2025

FYI: I verified validating admission policies will work once we fix the full CRDs. Here's a PR showing an example: #14045

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature Feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants