Make generated pods only request the maximum necessary resources #723
Conversation
ignorePrivateResourceFields = cmpopts.IgnoreUnexported(resource.Quantity{})
nopContainer = corev1.Container{
resourceQuantityCmp = cmp.Comparer(func(x, y resource.Quantity) bool {
	return x.Cmp(y) == 0
I added this comparator in 3 places. It would be nice to have a default comparator with some helpful default settings for use in tests, but I didn't see any kind of util package or anything for that kind of thing. Is there a place for something like that?
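For illustration, a shared helper could look something like this; the package and exported name are hypothetical, since per the comment above no such util package exists yet:

```go
package test

import (
	"github.com/google/go-cmp/cmp"
	"k8s.io/apimachinery/pkg/api/resource"
)

// ResourceQuantityCmp compares resource.Quantity values semantically
// (so "1Gi" equals "1024Mi") instead of comparing their unexported fields.
var ResourceQuantityCmp = cmp.Comparer(func(x, y resource.Quantity) bool {
	return x.Cmp(y) == 0
})
```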
func Memory(val string) ResourceListOp {
Alternatively we could just have func Resource(name corev1.ResourceName, val string) so we don't need a separate method for every resource, but that would be a little more verbose in the tests.
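An illustrative sketch of that alternative, assuming ResourceListOp is func(corev1.ResourceList) (its definition isn't shown in this excerpt):

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Assumed shape of the type from this PR; its definition isn't shown here.
type ResourceListOp func(corev1.ResourceList)

// Resource sets the named resource to the parsed quantity, replacing the
// need for a separate helper per resource (Memory, CPU, ...).
func Resource(name corev1.ResourceName, val string) ResourceListOp {
	return func(r corev1.ResourceList) {
		r[name] = resource.MustParse(val)
	}
}
```

Usage would then read Resource(corev1.ResourceMemory, "2Gi") instead of Memory("2Gi").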
/test pull-tekton-pipeline-integration-tests

I guess the tests are failing for the reason Nader described here?

/test pull-tekton-pipeline-integration-tests

/test pull-tekton-pipeline-integration-tests
Force-pushed from e9649ca to be66897
Design-wise I'm torn between:

- the current implementation, which leaves the Max request of each type on its original container (step)
- an implementation where we would get the Max for each resource and apply it to the first container (step)

The latter makes it easier to know where to look for those resource requests.

@dwnusbaum @bobcatfish wdyt ? 🚴♂️
@vdemeester No preference from me; it would be relatively easy to switch to applying the max requests to the first container. It might be a bit cleaner to do things that way rather than needing to track indices.
@vdemeester Actually I think moving the max requests to the first container is problematic because of resource limits. Take the following example:
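(The original snippet isn't preserved in this excerpt; the following is a hypothetical reconstruction with made-up values, using the usual corev1 and resource imports.)

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Hypothetical values: step-1's memory limit (2Gi) is smaller than
// step-2's memory request (4Gi).
var steps = []corev1.Container{{
	Name: "step-1",
	Resources: corev1.ResourceRequirements{
		Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("1Gi")},
		Limits:   corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("2Gi")},
	},
}, {
	Name: "step-2",
	Resources: corev1.ResourceRequirements{
		Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("4Gi")},
	},
}}
```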
The current approach turns this into:
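(Continuing the hypothetical values from the sketch above:)

```go
// step-2 holds the max memory request, so only step-1's request is
// zeroed; its 2Gi limit is untouched and nothing is violated.
steps[0].Resources.Requests[corev1.ResourceMemory] = resource.MustParse("0")
// steps[1] keeps its 4Gi request.
```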
But the other approach would cause the first step to request resources beyond its limit:
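(Again with the same hypothetical values:)

```go
// Moving the 4Gi max onto step-1 would push its request above its own
// 2Gi limit; Kubernetes rejects containers whose requests exceed limits.
steps[0].Resources.Requests[corev1.ResourceMemory] = resource.MustParse("4Gi") // > 2Gi limit: invalid
steps[1].Resources.Requests[corev1.ResourceMemory] = resource.MustParse("0")
```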
We could adjust the limit in these cases, but I'd prefer to avoid modifying limits if we don't have to.
Ah! Makes sense 😅 Let's go for the initial take then 👼
Force-pushed from db2eab3 to d9c4eb1
@@ -129,6 +129,12 @@ or container images that you define:
  the configuration file.
- Each container image runs until completion or until the first failure is
  detected.
- The CPU, memory, and ephemeral storage resource requests will be set to zero
Any thoughts on a better place to put this documentation or a better way to phrase it would be welcome!
/lgtm
/hold
Waiting for @abayer and/or @bobcatfish review 👼 🙏
/lgtm

/hold cancel ...forgot the syntax. =)

Looks like the unit tests are failing after other PRs were merged (maybe #748?), probably a logical conflict somewhere with the changes in this PR. I'll rebase and fix the tests.

arf sorry @dwnusbaum 😓 🙇♂️

@vdemeester No problem, probably trivial to fix 😄
Force-pushed from d9c4eb1 to a8c6493
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dwnusbaum, vdemeester

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
Beautiful commit message!!! a8c6493 ❤️ 😻 ❤️
> I did not add any e2e tests

Totally reasonable imo! We often add too many end to end tests tbh
// one at a time, so we want pods to only request the maximum resources needed
// at any single point in time. If no container has an explicit resource
// request, all requests are set to 0.
func zeroNonMaxResourceRequests(container *corev1.Container, containerIndex int, maxIndicesByResource map[corev1.ResourceName]int) {
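The function body isn't shown in this excerpt; here is a plausible sketch matching the doc comment and commit message, not necessarily the verbatim upstream code, using the usual corev1 and resource imports:

```go
// Sketch only: zero a step's request for each resource whose maximum
// request lives on a different step, so the pod's effective request is
// the per-resource max rather than the sum.
func zeroNonMaxResourceRequests(container *corev1.Container, containerIndex int, maxIndicesByResource map[corev1.ResourceName]int) {
	if container.Resources.Requests == nil {
		container.Resources.Requests = corev1.ResourceList{}
	}
	for name, maxIdx := range maxIndicesByResource {
		if maxIdx != containerIndex {
			container.Resources.Requests[name] = resource.MustParse("0")
		}
	}
}
```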
these functions are excellent! Nice focused interface and clear, short functions. My only too-late-to-the-party request would be to have unit tests covering these functions directly as well, but I'm fully expecting to be ignored since I'm so late haha :D
Good point! I'm going to file an issue for a followup I thought of the other day, and if I'm in this area again I'll add some unit tests.
awesome, that sounds great @dwnusbaum ❤️ !!
Changes
Before this change, if CPU, memory, or ephemeral storage resource requests were set in a Task's steps (which are Containers), the generated Pod would require the sum of all of the steps' requests to be scheduled on a Node. However, because Tekton overwrites Container entrypoints in Tasks to make the Containers logically execute one at a time, we want to make Pods generated by the TaskRun only request the maximum resources that will be necessary for any single Container rather than the sum of all resource requests.
To make this happen, when generating a Pod for a Task, we find the Step with largest resource requests among all Steps, and set the resource requests for all other steps to 0 for the respective resource. If no Step has an explicit resource request, all requests are set to 0. If we unset resource requests instead of setting them to 0 explicitly, then the limits would be used for the requests, which would defeat the purpose of unsetting the requested values (and could end up making the Pod request more memory than it did in the first place).
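As an illustration of the selection step described above, here is a sketch under the assumption that the per-resource maximum is tracked by step index (the function name and shape are illustrative, not the verbatim implementation):

```go
// Sketch only: for each resource name, find the index of the step with
// the largest explicit request. -1 means no step requested the resource.
func findMaxResourceRequestIndices(steps []corev1.Container, names ...corev1.ResourceName) map[corev1.ResourceName]int {
	maxIdx := make(map[corev1.ResourceName]int, len(names))
	maxReq := make(map[corev1.ResourceName]resource.Quantity, len(names))
	for _, name := range names {
		maxIdx[name] = -1
	}
	for i, step := range steps {
		for _, name := range names {
			req, ok := step.Resources.Requests[name]
			if !ok {
				continue
			}
			if cur, seen := maxReq[name]; !seen || req.Cmp(cur) > 0 {
				maxReq[name] = req
				maxIdx[name] = i
			}
		}
	}
	return maxIdx
}
```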
I did not add any e2e tests, but would be happy to do so if desired by reviewers. I think the tests would need to look up Nodes in the cluster to find the maximum allowed resources and then create some tasks dynamically that would have been unschedulable before this change but work after it.

CC @bbrowning, @abayer
Fixes #598
Submitter Checklist
These are the criteria that every PR should meet; please check them off as you review them:
See the contribution guide for more details.
Release Notes