Replicate webhook pods for resiliency #3391
Hi @raballew. Thanks for your PR. I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind feature

This looks good so far, let's see what the tests think. :) Can you squash your commits so this PR only contains one commit?
@raballew Thank you for your contribution!! Could you include release notes in the PR description? Thank you again!
At the moment, the webhook is a single point of failure (SPOF) in certain scenarios: under high load, or when a node fails, the webhook becomes unavailable. Defining an HPA, a PDB, and affinity rules solves this issue.
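As a rough illustration of the three pieces named above, the manifests below sketch an HPA, a PDB, and a pod anti-affinity rule for a webhook Deployment. This is not the content of this PR's files: the resource names, namespace, label selector, replica bounds, and utilization target are all assumptions chosen for the example, and the `apiVersion` values depend on the cluster version.

```yaml
# Sketch only: names, namespace, labels, and numbers are assumed values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tekton-pipelines-webhook      # assumed name
  namespace: tekton-pipelines         # assumed namespace
spec:
  minReplicas: 1
  maxReplicas: 5                      # assumed upper bound
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tekton-pipelines-webhook    # assumed Deployment name
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100     # percent of the CPU *request*
---
# Keeps at least one webhook pod running during voluntary disruptions
# such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tekton-pipelines-webhook      # assumed name
  namespace: tekton-pipelines         # assumed namespace
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: webhook # assumed pod label
---
# Fragment of the webhook Deployment's pod template (not a complete
# manifest): prefer spreading replicas across different nodes.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: webhook # assumed pod label
```

The HPA covers the high-load case, while the PDB and the anti-affinity rule cover node drains and node failures respectively. A preferred (rather than required) anti-affinity rule still lets all replicas schedule on a single-node cluster.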
@afrittoli CLA signed. Thanks for guiding me through the PR!
I've not really worked with the HPA before, so I tried this on my kind cluster. After I got the metrics server running with:
The HPA was still not happy, because of missing resource requests in the webhook pod:
After adding resources from in the deployment:
It is now happy:
According to the HPA docs:
So I think that for this to work we need to set resource requests too.
Thanks for this. I think we should clarify if we need to add resources.
TBH I don't know what a good value for CPU would be? @imjasonh wdyt?
config/webhook-hpa.yaml
Outdated
@@ -0,0 +1,65 @@
# Copyright 2019 The Tekton Authors
NIT: s/2019/2020
Sure, I will bump the year.
@afrittoli I also looked through all the resource definitions: most of the files were modified over the course of this year and still use 2019 as the copyright year. If I am not wrong, technically the copyright year should be updated whenever you contribute to a file within that year.
Uhm, the practice that I've always followed, and that I've seen used in several open source projects, is to set the year to that of the creation of the file and not bother updating the date again.
There are a lot of different opinions about this on the internet. The feeling I get after reading a few of them is that the year is not so critical in the copyright notice; however, company lawyers usually insist on having it.
We could add guidance about this in the community repo @vdemeester @bobcatfish @imjasonh @abayer
In general the guidance I've seen is to bump the year if you're touching the file. Bulk commits to update the year everywhere aren't great for history/blame tracking, but keeping it up to date is slightly better than not...
Technically, the copyright year should have a value equal to the year of the last contribution made to a file.
Two options:
I'd probably just borrow from Knative, I don't think either of us do anything very resource intensive in our webhook checks, and if it turns out to be too low we can bump it up in the future.
ok, (2) sounds reasonable to me, also 100m seems small enough for a validation service.
A resource request is required for the autoscaler to take any action on a metric.
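The reason is that a utilization target is expressed as a percentage of the container's requested amount of that resource, so without a request there is nothing to compute the percentage against. A minimal sketch of the corresponding Deployment change, assuming a container named `webhook` and using the 100m CPU figure discussed in this thread:

```yaml
# Fragment of the webhook Deployment (not a complete manifest).
spec:
  template:
    spec:
      containers:
        - name: webhook          # assumed container name
          resources:
            requests:
              cpu: 100m          # value discussed in this thread
```

With a 100m request and a 100% utilization target, the HPA would start adding replicas once average CPU usage per pod exceeds roughly 100 millicores.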
/retest
Thanks for the updates! This looks good to me now; the only thing missing is documentation, but I'm OK doing that as a separate PR if you prefer. CC'ing @qu1queee since he's been looking into HA documentation for Tekton.

/lgtm
@afrittoli I will do it in this PR, so everything related to webhooks and HPA is bundled here. Could you point me to the correct file for the documentation?
I think the entry point might be the install and configuration doc. If you decide to go for a dedicated new file, could you also file a PR to the website repo, so that it shows up on tekton.dev too?
@afrittoli I have added the HA docs to the install page as the amount of documentation needed for this topic is rather small. Once more components such as the controller implement a scaling mechanism as well, this section should be moved to a dedicated
/retest
/assign @afrittoli
Thanks for the docs on this!
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cc @vdemeester
/lgtm
Changes
Closes #3386
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd dir, please update the release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes