Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pod proliferation problem using Triggers with many webhooks #125

Closed
vtereso opened this issue Sep 18, 2019 · 18 comments
Closed

Pod proliferation problem using Triggers with many webhooks #125

vtereso opened this issue Sep 18, 2019 · 18 comments
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design.

Comments

@vtereso
Copy link

vtereso commented Sep 18, 2019

Expected Behavior

ValidationTasks allow events to be properly validated, which gates EventListener Triggers

Actual Behavior

This works "as intended" assuming low traffic, but with potentially many Triggers on an EventListener, this will start a validate Task for every single Trigger on the the EventListener, which can currently be the source of multiple event sources resulting in a lot of wasted work.

For example, with just two different GitHub repositories (they require separate validation), one may be receiving many events, while the other receives none. Now, whenever an event comes in, the validation Task for this inactive repository will be run.

This could easily result in spikes of thousands of pods being created if someone were to put all their Triggers in a single EventListener. With a forced single source approach, N listener pods are created for N event sources (compared to 1), but this allows for native IP tables to route to the underlying services to these pods (with a single validate) rather than it being resolved internally through many much slower Tasks.

Additional Info

Here are some questions to evaluate under this issue.

@dibyom dibyom added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. labels Sep 18, 2019
@dibbles
Copy link
Member

dibbles commented Sep 19, 2019

@vtereso I'm not sure I exactly follow what is it you are specifying here - but I think this in relation to conversations we were having yesterday about "pod proliferation"

I think what you are saying is "only start one instance of the validate task/pod" and use an underlying service from that validate pod to report back for each of the triggers? Is that right?

Do you know if the plan is to allow different validate tasks to be specified for different triggers in an event listener?

Do you think the event listener should have validate tasks listed at the top level and each trigger refers to one of those defined tasks?

@khrm
Copy link
Contributor

khrm commented Sep 19, 2019

Do you know if the plan is to allow different validate tasks to be specified for different triggers in an event listener?

Do you think the event listener should have validate tasks listed at the top level and each trigger refers to one of those defined tasks?

I think what @vtereso want is just one source for an event listener. So that we have just one 'pod' which communicates with EventListener during validation. Not using tasks or conditionals.

We could use a sidecar for validation?

At the same time, how is a user supposed to use triggers? If 10+ trigger bindings or template is the norm, then that's a big problem. Otherwise, it isn't.

@vtereso
Copy link
Author

vtereso commented Sep 19, 2019

@dibbles Yes, this was in relation to yesterdays call after thinking it through more. Essentially, this change enforces a kind of contract where each event source (e.g. a webhook) must have a different EventListener. The underlying EventListener reconciler currently creates a service and deployment (the listening pod), where this is a unique service/domain that each event would be sent. This way, a singular validation Task is simply authorizing the intended traffic.

apiVersion: tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: listener
spec:
  validation:
    task: validateTaskName
    serviceAccountName: saName
    params:
    - name: paramName
        value: paramValue
  serviceAccountName: tekton-triggers-example-sa
  triggers:
    - binding:
        name: pipeline-binding
      template:
        name: pipeline-template

This approach is also more microservice oriented, where busy repositories/event sources can be scaled independently. This also abides better with the namespaces paradigm (or "projects" within Openshift). Lastly, with one intended event source, race conditions for updating multiple triggers shouldn't be a concern.

Any necessary filtering to be added would still go within the Trigger field

@vtereso
Copy link
Author

vtereso commented Sep 19, 2019

One consideration is that with the case of GitHub, someone could create multiple different webhooks on a repository where we would probably have to take one of the following stances:

  • Expect one webhook with all the events instead
  • All the webhooks would need the same secret
  • Also have one EventListener for each of these distinct webhooks on the repository.

@khrm
Copy link
Contributor

khrm commented Sep 19, 2019

We really can't force user to have one webhook or secret.
Third makes sense.

@vtereso
Copy link
Author

vtereso commented Sep 19, 2019

This also enables people to provides their own intermediary adapters/services that can do authentication more efficiently (not spinning up tasks) where the listener could "validate" by a white-list ingress to this adapter/service. This is technically possible currently as well, but there is no** guaranteed enforcement.

@ncskier
Copy link
Member

ncskier commented Sep 19, 2019

It seems to me like the questions at the core of this issue are:

1. Is it "bad" to have one EventListener for each webhook (when I have many webhooks)? Why is it "bad"?

  • This could create many idle EventListener sink pods
  • Define what is too "bad" to use

2. Is it "bad" to have many triggers in one EventListener? Why is it "bad"?

  • This could spawn many Validation tasks (and/or Filter tasks when this is implemented)
  • Define what is too "bad" to use

3. Which approach (1) or (2) should users take when creating many webhooks using Triggers?

  • Do we need an additional change/solution for a user with many webhooks to be able to use Triggers

I think that we should rephrase this issue to answer these questions above. What do you think @vtereso?

@vtereso
Copy link
Author

vtereso commented Sep 19, 2019

@ncskier Well put. The way I wrote the issue was from the frame of enforcing the first perspective you listed (should this be desired). I'm cool with whoever changing the issue name?

@ncskier ncskier changed the title Force single event source for an EventListener Pod proliferation problem of using Triggers with many webhooks Sep 19, 2019
@ncskier ncskier changed the title Pod proliferation problem of using Triggers with many webhooks Pod proliferation problem using Triggers with many webhooks Sep 19, 2019
@dibyom
Copy link
Member

dibyom commented Sep 19, 2019

Thanks @ncskier I agree we should tackle 1. and 2. separately.

To me 2. is the more immediate problem. As @vtereso and @khrm suggested, we could change the interface for the validation from a task that is run per event to something more long running.

One idea -- the interface for the validation could be an objectReference to another Addressable. Something like how EventSources in knative/eventing: https://github.com/knative/docs/blob/master/docs/eventing/samples/github-source/github-source.yaml#L17

Another idea could be to have something like an EventSource (or just reuse the Knative EventSources) that process the event i.e, validate it and then send it to the Trigger Event Listener for the actual triggering.

@dibyom
Copy link
Member

dibyom commented Sep 24, 2019

Update: I compiled this into a doc here

Expanding on my comment above after playing around with knative-eventing's Github event source and chatting with @dlorenc....I'll compile these into a proper design proposal but in the meantime what do folks think about the following?

tl;dr I'm suggesting we replace the validation Task with an optional Service that does the same thing in order to not create too many pods.

apiVersion: tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: listener
spec:
  serviceAccountName: tekton-triggers-example-sa
  triggers:
    - name: foo-trig
      binding:
        name: pipeline-binding
      template:
        name: pipeline-template
      interceptor:
        objectRef:
          apiVersion: v1
          kind: Service
          name: gh-validation-service

The syntax is similar to Knative EventSource sinks i.e the objectRef is a reference to a Addressable object.

The concept is similar to AdmissionControllers/AdmissionWebhooks for k8s resources i.e the Event Listener sink will forward incoming events to the interceptor service which can validate the event payload before creating the templated resources. The interceptor should satisfy the following:

  • Be an Addressable
  • Take the event HTTP headers and Body as input
  • Return a response quickly, say within ~100s of ms
  • Return a 400 if event is not valid
  • Return 200 with a JSON payload that is used by the EL sink as the event payload
    (This supports the use case where the interceptor has to call out to an external system e.g. the Github API to fetch additional information that needs to be part of the Event body)

The basic use cases like Github/Gitlab validation should be part of the Triggers project though users can add in their own for their own deployments easily.

In the future, we should also consider making more first class support for these basic interceptors similar to the built in admission controllers i.e. instead of HTTP calls, these are in-built plugins.

Questions:

  • Should we have a top level interceptor for the event listener or one per trigger in the listener?
    (If the sole reason for us moving the validate task to the top level is to prevent creation of too many pods, then I think we can keep the interceptor at the trigger level since we are not creating pods anymore)

  • How do we deal with webhook secrets?
    With tasks, we were passing the secret reference names as params.

cc @vtereso @khrm @ncskier @iancoffey

@khrm
Copy link
Contributor

khrm commented Sep 25, 2019

* Should we have a top level interceptor for the event listener or one per trigger in the listener?
  (If the sole reason for us moving the validate task to the top level is to prevent creation of too many pods, then I think we can keep the interceptor at the trigger level since we are not creating pods anymore)

Yes, I would also like them at the trigger level. My reasoning is because I don't have too many event listeners pod. The same listener can be used for multiple triggers easily.

* How do we deal with webhook secrets?

Pass secret name and key/s Name alongwith Event Body and Headers? It would be responsibility of interceptor to have access to value of secrets from key and name but interceptor would need access to all those secrets.

@khrm
Copy link
Contributor

khrm commented Sep 25, 2019

Also, would tasks based approach would face the same problems with conditionals/filtering?
I have no idea whether after the validation step, we would same webhook being used to execute many triggers.

@vtereso
Copy link
Author

vtereso commented Sep 26, 2019

When speaking with @dibyom, I think he made good points regarding this. We wouldn't want to use pods for validation one way or the other because if bogus requests are received, this could easily DOS/overload the cluster. However, for verified payloads, these filter pods should be relatively light weight and probably ok.

@vincent-pli
Copy link
Member

@dibyom

Another idea could be to have something like an EventSource (or just reuse the Knative EventSources) that process the event i.e, validate it and then send it to the Trigger Event Listener for the actual triggering.

So, is this idea out?

@iancoffey
Copy link
Member

@vincent-pli fwiw - what you describe is how use this project. I will likely never have an EventListener exposed outside the cluster for security issues - everything gets processed and validated up front (and converted to CloudEvents) and the listener is only used locally. This pattern works really well so far. Hopefully whatever we land on doesn't exclude our ability to make use of this pattern. This "optional" part of what @dibyom mentioned regarding the service above makes me feel like this pattern wont impair users ability to have a system up front doing validation for users instead (or addition to).

@dibyom
Copy link
Member

dibyom commented Sep 27, 2019

Exactly what @iancoffey said! You can always use knative-eventing in front to do all your event processing and then forward those to triggers.

We just did not want that to be the only way to use triggers i.e. we wanted a alternative for users who did not necessarily want to take in all of knative-eventing as a dependency. So, we added this optional task/interceptor bit to do event processing for users who do no want to use knative-eventing!

@vincent-pli
Copy link
Member

Thanks for clarification, I think that's make sense.

Seems the interceptor not only do the validation, maybe do the mutating or whatever (converted to CloudEvents).
Looks like what eventsource did in knative-eventing, do we have plan to setup some buildin interceptor, for example github interceptor or whatever.

Hum, could we just let user write their eventsource? it's just my thinking 😄

@dibyom
Copy link
Member

dibyom commented Oct 16, 2019

@vincent-pli sorry missed your comment.

Seems the interceptor not only do the validation, maybe do the mutating or whatever (converted to CloudEvents).
yes! similar to event source receive adapters in knative

Looks like what eventsource did in knative-eventing, do we have plan to setup some buildin interceptor, for example github interceptor or whatever.

Yes! we include a default github one though I think

Hum, could we just let user write their eventsource? it's just my thinking smile

Since you can always use a eventsource in front of triggers with the sink set to the event listener. And I think you should be able to use the recieve_adapter part of the event source as an interceptor

Closing since we are no longer using task based validation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design.
Projects
None yet
Development

No branches or pull requests

7 participants