Pod proliferation problem using Triggers with many webhooks #125

vtereso · 2019-09-18T19:12:21Z

Expected Behavior

ValidationTasks allow events to be properly validated, which gates EventListener Triggers

Actual Behavior

This works "as intended" assuming low traffic, but with potentially many Triggers on an EventListener, this will start a validate Task for every single Trigger on the the EventListener, which can currently be the source of multiple event sources resulting in a lot of wasted work.

For example, with just two different GitHub repositories (they require separate validation), one may be receiving many events, while the other receives none. Now, whenever an event comes in, the validation Task for this inactive repository will be run.

This could easily result in spikes of thousands of pods being created if someone were to put all their Triggers in a single EventListener. With a forced single source approach, N listener pods are created for N event sources (compared to 1), but this allows for native IP tables to route to the underlying services to these pods (with a single validate) rather than it being resolved internally through many much slower Tasks.

Additional Info

Here are some questions to evaluate under this issue.

The text was updated successfully, but these errors were encountered:

dibbles · 2019-09-19T10:25:28Z

@vtereso I'm not sure I exactly follow what is it you are specifying here - but I think this in relation to conversations we were having yesterday about "pod proliferation"

I think what you are saying is "only start one instance of the validate task/pod" and use an underlying service from that validate pod to report back for each of the triggers? Is that right?

Do you know if the plan is to allow different validate tasks to be specified for different triggers in an event listener?

Do you think the event listener should have validate tasks listed at the top level and each trigger refers to one of those defined tasks?

khrm · 2019-09-19T11:03:39Z

Do you know if the plan is to allow different validate tasks to be specified for different triggers in an event listener?

Do you think the event listener should have validate tasks listed at the top level and each trigger refers to one of those defined tasks?

I think what @vtereso want is just one source for an event listener. So that we have just one 'pod' which communicates with EventListener during validation. Not using tasks or conditionals.

We could use a sidecar for validation?

At the same time, how is a user supposed to use triggers? If 10+ trigger bindings or template is the norm, then that's a big problem. Otherwise, it isn't.

vtereso · 2019-09-19T12:52:19Z

@dibbles Yes, this was in relation to yesterdays call after thinking it through more. Essentially, this change enforces a kind of contract where each event source (e.g. a webhook) must have a different EventListener. The underlying EventListener reconciler currently creates a service and deployment (the listening pod), where this is a unique service/domain that each event would be sent. This way, a singular validation Task is simply authorizing the intended traffic.

apiVersion: tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: listener
spec:
  validation:
    task: validateTaskName
    serviceAccountName: saName
    params:
    - name: paramName
        value: paramValue
  serviceAccountName: tekton-triggers-example-sa
  triggers:
    - binding:
        name: pipeline-binding
      template:
        name: pipeline-template

This approach is also more microservice oriented, where busy repositories/event sources can be scaled independently. This also abides better with the namespaces paradigm (or "projects" within Openshift). Lastly, with one intended event source, race conditions for updating multiple triggers shouldn't be a concern.

Any necessary filtering to be added would still go within the Trigger field

vtereso · 2019-09-19T12:59:35Z

One consideration is that with the case of GitHub, someone could create multiple different webhooks on a repository where we would probably have to take one of the following stances:

Expect one webhook with all the events instead
All the webhooks would need the same secret
Also have one EventListener for each of these distinct webhooks on the repository.

khrm · 2019-09-19T13:07:33Z

We really can't force user to have one webhook or secret.
Third makes sense.

vtereso · 2019-09-19T13:14:35Z

This also enables people to provides their own intermediary adapters/services that can do authentication more efficiently (not spinning up tasks) where the listener could "validate" by a white-list ingress to this adapter/service. This is technically possible currently as well, but there is no** guaranteed enforcement.

ncskier · 2019-09-19T14:59:09Z

It seems to me like the questions at the core of this issue are:

1. Is it "bad" to have one EventListener for each webhook (when I have many webhooks)? Why is it "bad"?

This could create many idle EventListener sink pods
Define what is too "bad" to use

2. Is it "bad" to have many `triggers` in one EventListener? Why is it "bad"?

This could spawn many Validation tasks (and/or Filter tasks when this is implemented)
Define what is too "bad" to use

3. Which approach (1) or (2) should users take when creating many webhooks using Triggers?

Do we need an additional change/solution for a user with many webhooks to be able to use Triggers

I think that we should rephrase this issue to answer these questions above. What do you think @vtereso?

vtereso · 2019-09-19T16:03:19Z

@ncskier Well put. The way I wrote the issue was from the frame of enforcing the first perspective you listed (should this be desired). I'm cool with whoever changing the issue name?

dibyom · 2019-09-19T18:05:42Z

Thanks @ncskier I agree we should tackle 1. and 2. separately.

To me 2. is the more immediate problem. As @vtereso and @khrm suggested, we could change the interface for the validation from a task that is run per event to something more long running.

One idea -- the interface for the validation could be an objectReference to another Addressable. Something like how EventSources in knative/eventing: https://github.com/knative/docs/blob/master/docs/eventing/samples/github-source/github-source.yaml#L17

Another idea could be to have something like an EventSource (or just reuse the Knative EventSources) that process the event i.e, validate it and then send it to the Trigger Event Listener for the actual triggering.

dibyom · 2019-09-24T20:09:59Z

Update: I compiled this into a doc here

Expanding on my comment above after playing around with knative-eventing's Github event source and chatting with @dlorenc....I'll compile these into a proper design proposal but in the meantime what do folks think about the following?

tl;dr I'm suggesting we replace the validation Task with an optional Service that does the same thing in order to not create too many pods.

apiVersion: tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: listener
spec:
  serviceAccountName: tekton-triggers-example-sa
  triggers:
    - name: foo-trig
      binding:
        name: pipeline-binding
      template:
        name: pipeline-template
      interceptor:
        objectRef:
          apiVersion: v1
          kind: Service
          name: gh-validation-service

The syntax is similar to Knative EventSource sinks i.e the objectRef is a reference to a Addressable object.

The concept is similar to AdmissionControllers/AdmissionWebhooks for k8s resources i.e the Event Listener sink will forward incoming events to the interceptor service which can validate the event payload before creating the templated resources. The interceptor should satisfy the following:

Be an Addressable
Take the event HTTP headers and Body as input
Return a response quickly, say within ~100s of ms
Return a 400 if event is not valid
Return 200 with a JSON payload that is used by the EL sink as the event payload
(This supports the use case where the interceptor has to call out to an external system e.g. the Github API to fetch additional information that needs to be part of the Event body)

The basic use cases like Github/Gitlab validation should be part of the Triggers project though users can add in their own for their own deployments easily.

In the future, we should also consider making more first class support for these basic interceptors similar to the built in admission controllers i.e. instead of HTTP calls, these are in-built plugins.

Questions:

Should we have a top level interceptor for the event listener or one per trigger in the listener?
(If the sole reason for us moving the validate task to the top level is to prevent creation of too many pods, then I think we can keep the interceptor at the trigger level since we are not creating pods anymore)
How do we deal with webhook secrets?
With tasks, we were passing the secret reference names as params.

cc @vtereso @khrm @ncskier @iancoffey

khrm · 2019-09-25T12:09:00Z

* Should we have a top level interceptor for the event listener or one per trigger in the listener?
  (If the sole reason for us moving the validate task to the top level is to prevent creation of too many pods, then I think we can keep the interceptor at the trigger level since we are not creating pods anymore)

Yes, I would also like them at the trigger level. My reasoning is because I don't have too many event listeners pod. The same listener can be used for multiple triggers easily.

* How do we deal with webhook secrets?

Pass secret name and key/s Name alongwith Event Body and Headers? It would be responsibility of interceptor to have access to value of secrets from key and name but interceptor would need access to all those secrets.

khrm · 2019-09-25T17:20:14Z

Also, would tasks based approach would face the same problems with conditionals/filtering?
I have no idea whether after the validation step, we would same webhook being used to execute many triggers.

vtereso · 2019-09-26T02:55:18Z

When speaking with @dibyom, I think he made good points regarding this. We wouldn't want to use pods for validation one way or the other because if bogus requests are received, this could easily DOS/overload the cluster. However, for verified payloads, these filter pods should be relatively light weight and probably ok.

vincent-pli · 2019-09-27T01:02:39Z

@dibyom

Another idea could be to have something like an EventSource (or just reuse the Knative EventSources) that process the event i.e, validate it and then send it to the Trigger Event Listener for the actual triggering.

So, is this idea out?

iancoffey · 2019-09-27T13:49:09Z

@vincent-pli fwiw - what you describe is how use this project. I will likely never have an EventListener exposed outside the cluster for security issues - everything gets processed and validated up front (and converted to CloudEvents) and the listener is only used locally. This pattern works really well so far. Hopefully whatever we land on doesn't exclude our ability to make use of this pattern. This "optional" part of what @dibyom mentioned regarding the service above makes me feel like this pattern wont impair users ability to have a system up front doing validation for users instead (or addition to).

dibyom · 2019-09-27T15:28:08Z

Exactly what @iancoffey said! You can always use knative-eventing in front to do all your event processing and then forward those to triggers.

We just did not want that to be the only way to use triggers i.e. we wanted a alternative for users who did not necessarily want to take in all of knative-eventing as a dependency. So, we added this optional task/interceptor bit to do event processing for users who do no want to use knative-eventing!

vincent-pli · 2019-09-29T06:03:13Z

Thanks for clarification, I think that's make sense.

Seems the interceptor not only do the validation, maybe do the mutating or whatever (converted to CloudEvents).
Looks like what eventsource did in knative-eventing, do we have plan to setup some buildin interceptor, for example github interceptor or whatever.

Hum, could we just let user write their eventsource? it's just my thinking 😄

dibyom · 2019-10-16T15:50:09Z

@vincent-pli sorry missed your comment.

Seems the interceptor not only do the validation, maybe do the mutating or whatever (converted to CloudEvents).
yes! similar to event source receive adapters in knative

Looks like what eventsource did in knative-eventing, do we have plan to setup some buildin interceptor, for example github interceptor or whatever.

Yes! we include a default github one though I think

Hum, could we just let user write their eventsource? it's just my thinking smile

Since you can always use a eventsource in front of triggers with the sink set to the event listener. And I think you should be able to use the recieve_adapter part of the event source as an interceptor

Closing since we are no longer using task based validation

dibyom added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. labels Sep 18, 2019

ncskier changed the title ~~Force single event source for an EventListener~~ Pod proliferation problem of using Triggers with many webhooks Sep 19, 2019

ncskier changed the title ~~Pod proliferation problem of using Triggers with many webhooks~~ Pod proliferation problem using Triggers with many webhooks Sep 19, 2019

dibyom mentioned this issue Oct 1, 2019

Event Interceptors for pre processing Events #140

Merged

6 tasks

dibyom closed this as completed Oct 16, 2019

ncskier mentioned this issue Oct 29, 2019

Add Conditionals to Resource Creation #49

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pod proliferation problem using Triggers with many webhooks #125

Pod proliferation problem using Triggers with many webhooks #125

vtereso commented Sep 18, 2019 •

edited by ncskier

Loading

dibbles commented Sep 19, 2019

khrm commented Sep 19, 2019 •

edited

Loading

vtereso commented Sep 19, 2019 •

edited

Loading

vtereso commented Sep 19, 2019 •

edited

Loading

khrm commented Sep 19, 2019

vtereso commented Sep 19, 2019 •

edited

Loading

ncskier commented Sep 19, 2019

vtereso commented Sep 19, 2019 •

edited

Loading

dibyom commented Sep 19, 2019

dibyom commented Sep 24, 2019 •

edited

Loading

khrm commented Sep 25, 2019

khrm commented Sep 25, 2019

vtereso commented Sep 26, 2019

vincent-pli commented Sep 27, 2019

iancoffey commented Sep 27, 2019

dibyom commented Sep 27, 2019

vincent-pli commented Sep 29, 2019

dibyom commented Oct 16, 2019 •

edited

Loading

Pod proliferation problem using Triggers with many webhooks #125

Pod proliferation problem using Triggers with many webhooks #125

Comments

vtereso commented Sep 18, 2019 • edited by ncskier Loading

Expected Behavior

Actual Behavior

Additional Info

dibbles commented Sep 19, 2019

khrm commented Sep 19, 2019 • edited Loading

vtereso commented Sep 19, 2019 • edited Loading

vtereso commented Sep 19, 2019 • edited Loading

khrm commented Sep 19, 2019

vtereso commented Sep 19, 2019 • edited Loading

ncskier commented Sep 19, 2019

1. Is it "bad" to have one EventListener for each webhook (when I have many webhooks)? Why is it "bad"?

2. Is it "bad" to have many triggers in one EventListener? Why is it "bad"?

3. Which approach (1) or (2) should users take when creating many webhooks using Triggers?

vtereso commented Sep 19, 2019 • edited Loading

dibyom commented Sep 19, 2019

dibyom commented Sep 24, 2019 • edited Loading

Questions:

khrm commented Sep 25, 2019

khrm commented Sep 25, 2019

vtereso commented Sep 26, 2019

vincent-pli commented Sep 27, 2019

iancoffey commented Sep 27, 2019

dibyom commented Sep 27, 2019

vincent-pli commented Sep 29, 2019

dibyom commented Oct 16, 2019 • edited Loading

vtereso commented Sep 18, 2019 •

edited by ncskier

Loading

khrm commented Sep 19, 2019 •

edited

Loading

vtereso commented Sep 19, 2019 •

edited

Loading

vtereso commented Sep 19, 2019 •

edited

Loading

vtereso commented Sep 19, 2019 •

edited

Loading

2. Is it "bad" to have many `triggers` in one EventListener? Why is it "bad"?

vtereso commented Sep 19, 2019 •

edited

Loading

dibyom commented Sep 24, 2019 •

edited

Loading

dibyom commented Oct 16, 2019 •

edited

Loading