-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Network binding plugin: Support compute container resource overhead #303
base: main
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/uncc @aburdenthehand @jobbler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, overall looks good, see my inline comments.
Regarding the the PR description and second commit message, I think it should also mention that memory overhead is necessary to avoid pod eviction due to passt VMs consume more memory then expected, and improve passt VMs scheduling results; passt VMs wont be scheduled on nodes that doesnt have enough memory.
@@ -1261,6 +1271,10 @@ metadata: | |||
k8s.v1.cni.cncf.io/networks: '[{"name":"netbindingpasst","namespace":"mynamespace","cni-args":{"logicNetworkName":"default"}}]' | |||
spec: | |||
containers: | |||
- name: compute | |||
resources: | |||
requests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think resource limits should be as well, in case the pod satisfy QoS class Guaranteed the plugin overhead will not become the reason for violating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, there is logic [1] to automatically add memory limits when needed (there is also an equivalent for CPU).
@@ -387,6 +388,13 @@ Alternatives passing the domain attachment to sync VMI: | |||
Both of the options are not perfect, but the `virt-handler.SyncVMI` | |||
has fewer cons. Therefore, it was chosen. | |||
|
|||
#### Additional resource requests for the virt-launcher compute container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rephrase this section to be vogue about the concrete plugin or dependency inside virt-launcher that requires it, saying in case a plugin requires memory overhead it should be specified in the CR, and refer to passt example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted the wording, please tell me what you think.
design-proposals/network-binding-plugin/network-binding-plugin.md
Outdated
Show resolved
Hide resolved
design-proposals/network-binding-plugin/network-binding-plugin.md
Outdated
Show resolved
Hide resolved
design-proposals/network-binding-plugin/network-binding-plugin.md
Outdated
Show resolved
Hide resolved
|
||
The passt binary is shipped and executed inside the virt-launcher pod. | ||
|
||
### Domain definition & creation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simplify this section saying the sidecar adds passt interface to the domain, similar to vDPA example, maybe mention it use libvirt user-space networking settings https://libvirt.org/formatdomain.html#userspace-slirp-or-passt-connection.
You can also refer to slirp example section saying passt sidecar works in similar way.
design-proposals/network-binding-plugin/network-binding-plugin.md
Outdated
Show resolved
Hide resolved
ed97ca9
to
8432599
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First commit review.
Although I placed comments inline on that commit, I think it is not a must example addition.
At the design level, we are not really interested in a specific binding, but on the general concept.
The general concept can be applied on other bindings, just to provide an example on how to use it.
design-proposals/network-binding-plugin/network-binding-plugin.md
Outdated
Show resolved
Hide resolved
namespace: default | ||
spec: | ||
config: '{ | ||
"cniVersion": "0.3.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use 1.0.X in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed.
Please ack so we can resolve the thread.
metadata: | ||
name: virt-launcher-123 | ||
annotations: | ||
k8s.v1.cni.cncf.io/networks: '[{"name":"netbindingpasst","namespace":"mynamespace","cni-args":{"logicNetworkName":"default"}}]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logicNetworkName is supposed to be passtnet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed.
Please ack so we can resolve the thread.
design-proposals/network-binding-plugin/network-binding-plugin.md
Outdated
Show resolved
Hide resolved
|
||
### Configure Pod network namespace | ||
|
||
Not required for passt binding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do configure networking for passt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed.
Please ack so we can resolve the thread.
|
||
### Run services in the virt-launcher pod | ||
|
||
The passt binary is shipped and executed inside the virt-launcher pod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should say: Not required for passt binding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed.
Please ack so we can resolve the thread.
<portForward proto='udp'/> | ||
<model type='virtio-non-transitional'/> | ||
<backend type='passt' logFile='/var/run/kubevirt/passt.log'/> | ||
<alias name='ua-default'/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is not in-sync with the network name used in the VM spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed.
Please ack so we can resolve the thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second commit:
The commit message hints and explains the passt binding and everything is driven from it. While this is correct in terms of why we do all this, I do not think we should start the story there.
The story should start from the need, mentioning passt as an example.
The fact that there is a binary alongside libvirt is not that important. If it was part of libvirt we would have needed to take that into account as well.
Also, it would be better to leave the details to the design and in the commit just provide the topic/subject. That way we can easily review it and adjusts it per that review.
@@ -189,6 +189,7 @@ the binding is referenced by name with optional additional arguments. | |||
- domainAttachment (a standard domain definition that exists in | |||
the core . E.g `tap`, `sriov`). | |||
- downwardAPI (E.g `device-info`) | |||
- resourceOverhead (currently only additional memory overhead request for the compute container in the virt-launcher pod is supported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be specific here and mention the compute-container in the name.
E.g. computeResources
.
This also opens up the different questions and options which should be discussed in a proposal:
- Should we consider only the compute container? What about the sidecar container?
- Ref the compute container, we can use a specific name for the field (e.g.
computeResources
) or we can make it configurable using a flag under a generalresources
. - What is the reason for not just doing a
computeMemoryRequest
needs to convince the reviewers.
Now, we can make it even more general by moving away from the specific network usage of this and look at a general sidecar hook that KubeVirt supports.
E.g.: The Kubvirt CR will have a policy for allocating resources to sidecar containers and possible the compute container. Referencing this policy from the network binding definition or from a sidecar hook. This may be an overkill, but thinking about it and examining the pors/cons of it can be useful.
Some binding plugins may require an additional binary to be shipped inside the virt-launcher compute image. | ||
This binary requires additional resources that have to be explicitly stated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we need to limit it to an additional binary. It could be just more resources from the cgroupv2 of the compute container, consumed by an existing binary (e.g. libvirt).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1189,6 +1196,8 @@ spec: | |||
passt: | |||
networkAttachmentDefinition: default/netbindingpasst | |||
sidecarImage: quay.io/kubevirt/network-passt-binding | |||
resourceOverhead: | |||
memory: 800Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will have to explain why not:
- Directly
memoryOverhead: 500Mi
(BTW, it should be 500 and not 800). - With explicit
request
and being open for addinglimit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtimeClasses have a pretty similar concept, maybe worth aligning the API idea
https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/
ie.
overhead:
podFixed:
memory: "120Mi"
cpu: "250m"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtimeClasses have a pretty similar concept, maybe worth aligning the API idea
Interesting, thanks for sharing this.
I think the API could be in sync with the general resources concept.
This allows to control the resource additions to the container, with the ability to extend it in the future to any type of resource and define both request and limits.
The oddity here, as I see it, is how we express the resources of a different container (i.e. compute) and still not lock the ability to do the same for the sidecar itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdDev I'm late to review this, but I don't understand why this API needs to add anything to the compute container and not simply advertise the resources needed to run this binding container?!
Same way as it's done for hotplug / other containers?
https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/services/renderresources.go#L611
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtimeClasses have a pretty similar concept, maybe worth aligning the API idea https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/
ie.
overhead: podFixed: memory: "120Mi" cpu: "250m"
@fabiand I doubt it aligns well. This overhead API was invented for kata and the overhead is being added to any pod created with this runtimeclass, I don't think it aligns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdDev I'm late to review this, but I don't understand why this API needs to add anything to the compute container and not simply advertise the resources needed to run this binding container?! Same way as it's done for hotplug / other containers? https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/services/renderresources.go#L611
@EdDev do we expect all network binding to affect the compute container or is it just passt?
passt specifically uses a feature in qemu there for the GetMemoryOverhead
functional can identify that passt will be used and add a passt specific overhead - this is instead of a kubevirt CR API.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladikr It is unknown at this stage how future network binding plugins will affect the compute container's resource consumption.
Network binding plugins have the ability to configure the domain, thus the virt stack might consume additional compute resources which KubeVirt cannot account for.
In the past year, the passt
binding was converted from being a core feature to a plugin, so KubeVirt will not know in advance that is is used.
Also, we should strive for treating all plugins as equal IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladikr my comment was only about the API design, not about using the POD api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladikr Hi
I see that according to the network binding plugin design document, one of the integration points that seems legit is the pod definition phase
There were no exceptions regarding resource rendering, and Kubevirt CR was advised as a potential API to extend in order to integrate into that point.
Given that the network binding design was accepted, and also the implementation of this design also got into the codebase, I am not sure how to proceed, I'm in favor of accepting this design.
Can you please advice @vladikr ?
8432599
to
416926c
Compare
Change: Removed the passt example, as it is not essential for to the proposal. Addressed sidecar and compute container resource specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First commit review.
The sidecar container can have a memory leak and may cause node's destabilization. | ||
|
||
|
||
Alternatives: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep here the decided solution and place the alternatives (with a ref from here) to a dedicated appendix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
- Coarse level of control. | ||
- Only supports CPU and memory requests and limits. | ||
|
||
2. Additional API for sidecar resource configuration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific for the binding plugin, not the sidecar. The sentence is not clear about this.
The definition is per the network binding plugin, and applied if the plugin uses a sidecar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
Cons: | ||
- Require an API change. | ||
- The API will probably evolve as additional plugins will be created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this point offline.
It was removed.
Cons: | ||
- Require an API change. | ||
- The API will probably evolve as additional plugins will be created. | ||
- May require cluster admins to adjust plugin resources during the plugin's lifecycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cluster admin is responsible to register the binding plugin in the first place, so I am unclear what this point means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this point offline, improved the wording.
- Only supports CPU and memory requests and limits. | ||
|
||
2. Additional API for sidecar resource configuration: | ||
The network binding plugin API in the KubeVirt CR could receive an additional input field to specify the sidecar resource requirements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specified resource is define for each instance of usage or per the sidecar, irrelevant of the usage?
E.g. there may be 1 interface using the plugin or there may be 3 interfaces using the plugins in the same VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed it in the text.
For each network binding plugin used, the VMI controller will add a label on the virt-launcher pod with the following format: | ||
|
||
`kubevirt.io/network-binding-plugin:<plugin-name>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will the admission webhook be able to identify the relevant container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed it in the text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the proposal.
The following general points should be considered and added:
- Effort cost estimation for each option.
For some plugins, such as passt, there is a need to execute an additional binary in the compute container. | ||
Since this binary has its own CPU and memory limits, they should be somehow accounted for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not limit it to running an additional binary, the addition in memory or other resources may come from different reasons (e.g. libvirt itself requiring more memory due to the expected configuration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
Alternatives: | ||
1. Manually setting the VM's resources: | ||
The user can override KubeVirt's algorithms and set resource requirements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic that adds overhead per internal core logic is colliding with adding manually an overhead? Specifically, if one specifies explicitly resource requests, are these being increased by Kubevirt logic or something else is happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way it works is as follows:
A VM could be defined with both guest memory and memory resource specification:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: vm-cirros
spec:
template:
spec:
domain:
memory:
guest: 128Mi
resources:
requests:
memory: 640Mi # 128Mi for the guest + 512Mi for the network binding plugin
The virt-launcher pod's compute container will have a memory request which contains the sum of:
- Guest VM's memory (128Mi in this example).
- Memory overhead for KubeVirt's components (its size is dependent on the VMI's spec)
- Arbitrary memory overhead (512Mi in this example)
The domain XML will contains the the guest's memory specification (128Mi in this example).
As a side note, it is also possible to specify a memory request with less memory than the guest requires.
|
||
Cons: | ||
- Error prune | ||
- The user does not take into account the overhead considerations KubeVirt takes when templating a virt-launcher pod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is related to my prev question. Can you please confirm this is indeed the case?
I would be surprise if this is how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true.
I removed this line.
- The API will probably evolve as additional plugins will be created. | ||
- May require cluster admins to adjust plugin resources during the plugin's lifecycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like with the sidecar previously, these points are not clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this point offline, adjusted the text.
|
||
2. Additional API for compute container resource overhead: | ||
|
||
The network binding plugin API in the KubeVirt CR could receive an additional input field to specify the resource requirements overhead for the compute container: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a need to explicitly specify if this overhead is added per plugin type or per plugin usage count (i.e. per the number of interfaces referencing it from the VM).
It is also important to specify if it is dependent on any other field/setup.
The previous sidecar resource are dependent on the existence of a sidecar, but this one may not have such a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
416926c
to
e0b43e5
Compare
Change: Moved the API alternative to be the proposed solution. |
35ddd8b
to
1bf6515
Compare
/cc @stu-gott - needs review from someone in sig-compute. Thanks! |
The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/). | ||
|
||
# Appendix H: Alternatives to compute container resource overhead specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a discussion with @enp0s3 , there is another more generic solution alternative:
Add a new resource overhead section under the KV config:
computeOverhead:
selector: ....
resources:
requests:
cpu: 200m
memory: 20Mi
The selector will determine to which pod it will apply to. The binding plugin, will create pods with a label that identifies the pod as using the relevant plugin. The code that adds the overhead, will query it and determine on which pod to add the overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdDev Hi. I think we would need to open a separate design proposal for the generic solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but we still want to record it here as it is related.
A remark can be added to express that it it will be presented as an independent proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?
FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?
I'm sure author of the proposal will handle this detail.
FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.
The alternative here is generic, not specific to a binding plugin and requires other steps for it to work.
I think we can get into the details of what it needs to validate or not as part of a different proposal that will focus on this alternative. It should be enough to mention here the general idea, so it will be visible that we discussed it.
computeResourceOverhead: | ||
cpu: 200m | ||
memory: 20Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this is missing the requests
(and limits
) level, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-added the requests
field.
e482700
to
848082d
Compare
Change: Updated the compute container's manual resource specification alternative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes
binding: | ||
mynetbindingplugin: | ||
sidecarImage: quay.io/kubevirt/mynetbindingplugin | ||
sidecarResources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a note saying this field type should be the native kubernetes resources API, with validation for cpu and memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should be explicit about the supported API.
We should not specify the cpu
but we can add a not that the structure is the native k8s one, allowing extending the support to the other fields (cpu
, custom ones).
The definition of the API should not be limited to an example.
But this is not blocker for me.
Adding the validation section can also be useful. We could specify there that we ignore any data except the memory
at this stage. We should be explicit and say that we do not fail if other data is added. Again, this is not a blocker and I am good with letting this stay gray for now.
- Only supports CPU and memory requests and limits. | ||
|
||
2. Additional API for network binding plugin sidecar resource specification: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the container names should be validated in a way they can refer to the binding sidecar and compute container only.
The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/). | ||
|
||
# Appendix H: Alternatives to compute container resource overhead specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?
FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.
If specified, KubeVirt will add the resource overhead specified in `computeResourceOverhead` to the compute container of the virt-launcher pod. | ||
The specified resource overhead will be added to the compute container per unique usage of the plugin (not per NIC using the binding). | ||
|
||
For example, assuming there is a plugin registered with a 200Mi memory overhead for the `compute` container, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does passt require mempry overhead per interface or based on whether its being executed or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is enough for the interface to be defined for the passt binary to run and consume the memory specified.
It is not clear what you mean by executed
.
sidecarImage: quay.io/kubevirt/mynetbindingplugin | ||
sidecarResources: | ||
requests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mere yaml example it can be anything 🙂
- Exposes the VM owner to unnecessary implementation details | ||
|
||
2. Mutating webhook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it another webook in addition to one that address the sidecar resources one?
It seem that it lacks description about what it"ll do, pros, cons section and how its going to integrate with kubevirt? ( I recall something about adding some label to launcher pods that use network binding plugins)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added additional details, same as for the sidecar.
binding: | ||
mynetbindingplugin: | ||
sidecarImage: quay.io/kubevirt/mynetbindingplugin | ||
sidecarResources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should be explicit about the supported API.
We should not specify the cpu
but we can add a not that the structure is the native k8s one, allowing extending the support to the other fields (cpu
, custom ones).
The definition of the API should not be limited to an example.
But this is not blocker for me.
Adding the validation section can also be useful. We could specify there that we ignore any data except the memory
at this stage. We should be explicit and say that we do not fail if other data is added. Again, this is not a blocker and I am good with letting this stay gray for now.
If specified, KubeVirt will add the resource overhead specified in `computeResourceOverhead` to the compute container of the virt-launcher pod. | ||
The specified resource overhead will be added to the compute container per unique usage of the plugin (not per NIC using the binding). | ||
|
||
For example, assuming there is a plugin registered with a 200Mi memory overhead for the `compute` container, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is enough for the interface to be defined for the passt binary to run and consume the memory specified.
It is not clear what you mean by executed
.
|
||
For example, assuming there is a plugin registered with a 200Mi memory overhead for the `compute` container, and | ||
there are two interfaces using it. | ||
`virt-controller` will only add 200Mi of memory to the `compute` container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this 200Mi
coming from? In the example above you used 20Mi
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a typo.
Fixed it in the example.
The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/). | ||
|
||
# Appendix H: Alternatives to compute container resource overhead specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?
I'm sure author of the proposal will handle this detail.
FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.
The alternative here is generic, not specific to a binding plugin and requires other steps for it to work.
I think we can get into the details of what it needs to validate or not as part of a different proposal that will focus on this alternative. It should be enough to mention here the general idea, so it will be visible that we discussed it.
848082d
to
d4eb851
Compare
d4eb851
to
c33498b
Compare
Signed-off-by: Orel Misan <[email protected]>
c33498b
to
591c3a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
##### Compute Container Resource Overhead | ||
|
||
For some plugins, an additional resource consumption can be expected from the virt-launcher pod compute container. | ||
For example, there could be need to execute an additional binary in the compute container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain what binary can be executed and how it gets there?
How can an external plugin rely on the fact that a specific binary is part of the virt-launcher image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example was derived from the passt
use case, on which the passt
binary is shipped as part of the virt-launcher compute
image.
Another example could be that the plugin configures the domain in a way that causes the virt-stack to consume additional compute resources that KubeVirt cannot account for.
|
||
For some plugins, an additional resource consumption can be expected from the virt-launcher pod compute container. | ||
For example, there could be need to execute an additional binary in the compute container. | ||
Since this binary has its own CPU and memory limits, they should be somehow accounted for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we should speak of a specific functionality and not about arbitrary binaries.
virt-launcher should be well defined.
If we know what functionality we're enabling that requires additional overhead, kubevirt needs to know the resources it requires. Similar to other aspects accounted for in GetMemoryOverhead
.
I would say that if we want an API for this it should be based on a functionality, not a binding.
Each resource consumption of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin can perform actions that affect KubeVirt in ways it cannot account for.
Thus the need to externally add resources to the compute container.
Thank you for reviewing this proposal @vladikr, sorry it took some time to respond. |
On behalf of SIG-compute, can one of @jean-edouard @stu-gott @enp0s3 @vladikr please re-review to move this forward. I can merge if I receive an approve or second lgtm from one of you. Thank you. |
Pull requests that are marked with After that period the bot marks them with the label /label needs-approver-review |
/cc |
Deferring the discussion for after the end year holidays |
What this PR does / why we need it:
Some network binding plugins require compute resource overhead, for example the passt plugin requires additional memory overhead in the virt-launcher pod's compute container.
Suggest several alternatives to address this issue.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: