Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network binding plugin: Support compute container resource overhead #303

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

orelmisan
Copy link
Member

@orelmisan orelmisan commented Jun 16, 2024

What this PR does / why we need it:
Some network binding plugins require compute resource overhead, for example the passt plugin requires additional memory overhead in the virt-launcher pod's compute container.
Suggest several alternatives to address this issue.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

NONE

@kubevirt-bot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jun 16, 2024
@kubevirt-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign alonakaplan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@orelmisan
Copy link
Member Author

/cc @EdDev @ormergi

@kubevirt-bot kubevirt-bot requested review from EdDev and ormergi June 16, 2024 20:44
@orelmisan
Copy link
Member Author

/uncc @aburdenthehand @jobbler

Copy link
Contributor

@ormergi ormergi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, overall looks good, see my inline comments.

Regarding the the PR description and second commit message, I think it should also mention that memory overhead is necessary to avoid pod eviction due to passt VMs consume more memory then expected, and improve passt VMs scheduling results; passt VMs wont be scheduled on nodes that doesnt have enough memory.

@@ -1261,6 +1271,10 @@ metadata:
k8s.v1.cni.cncf.io/networks: '[{"name":"netbindingpasst","namespace":"mynamespace","cni-args":{"logicNetworkName":"default"}}]'
spec:
containers:
- name: compute
resources:
requests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think resource limits should be as well, in case the pod satisfy QoS class Guaranteed the plugin overhead will not become the reason for violating it.

Copy link
Member Author

@orelmisan orelmisan Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, there is logic [1] to automatically add memory limits when needed (there is also an equivalent for CPU).

[1] https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/services/renderresources.go#L189

@@ -387,6 +388,13 @@ Alternatives passing the domain attachment to sync VMI:
Both of the options are not perfect, but the `virt-handler.SyncVMI`
has fewer cons. Therefore, it was chosen.

#### Additional resource requests for the virt-launcher compute container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase this section to be vogue about the concrete plugin or dependency inside virt-launcher that requires it, saying in case a plugin requires memory overhead it should be specified in the CR, and refer to passt example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted the wording, please tell me what you think.


The passt binary is shipped and executed inside the virt-launcher pod.

### Domain definition & creation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify this section saying the sidecar adds passt interface to the domain, similar to vDPA example, maybe mention it use libvirt user-space networking settings https://libvirt.org/formatdomain.html#userspace-slirp-or-passt-connection.
You can also refer to slirp example section saying passt sidecar works in similar way.

@orelmisan orelmisan marked this pull request as ready for review June 18, 2024 15:04
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2024
@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch 2 times, most recently from ed97ca9 to 8432599 Compare June 18, 2024 17:08
Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First commit review.

Although I placed comments inline on that commit, I think it is not a must example addition.

At the design level, we are not really interested in a specific binding, but on the general concept.
The general concept can be applied on other bindings, just to provide an example on how to use it.

namespace: default
spec:
config: '{
"cniVersion": "0.3.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use 1.0.X in the example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed.
Please ack so we can resolve the thread.

metadata:
name: virt-launcher-123
annotations:
k8s.v1.cni.cncf.io/networks: '[{"name":"netbindingpasst","namespace":"mynamespace","cni-args":{"logicNetworkName":"default"}}]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logicNetworkName is supposed to be passtnet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed.
Please ack so we can resolve the thread.


### Configure Pod network namespace

Not required for passt binding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do configure networking for passt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed.
Please ack so we can resolve the thread.


### Run services in the virt-launcher pod

The passt binary is shipped and executed inside the virt-launcher pod.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say: Not required for passt binding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed.
Please ack so we can resolve the thread.

<portForward proto='udp'/>
<model type='virtio-non-transitional'/>
<backend type='passt' logFile='/var/run/kubevirt/passt.log'/>
<alias name='ua-default'/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is not in-sync with the network name used in the VM spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed.
Please ack so we can resolve the thread.

Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second commit:

The commit message hints and explains the passt binding and everything is driven from it. While this is correct in terms of why we do all this, I do not think we should start the story there.

The story should start from the need, mentioning passt as an example.
The fact that there is a binary alongside libvirt is not that important. If it was part of libvirt we would have needed to take that into account as well.

Also, it would be better to leave the details to the design and in the commit just provide the topic/subject. That way we can easily review it and adjusts it per that review.

@@ -189,6 +189,7 @@ the binding is referenced by name with optional additional arguments.
- domainAttachment (a standard domain definition that exists in
the core . E.g `tap`, `sriov`).
- downwardAPI (E.g `device-info`)
- resourceOverhead (currently only additional memory overhead request for the compute container in the virt-launcher pod is supported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be specific here and mention the compute-container in the name.
E.g. computeResources.

This also opens up the different questions and options which should be discussed in a proposal:

  • Should we consider only the compute container? What about the sidecar container?
  • Ref the compute container, we can use a specific name for the field (e.g. computeResources) or we can make it configurable using a flag under a general resources.
  • What is the reason for not just doing a computeMemoryRequest needs to convince the reviewers.

Now, we can make it even more general by moving away from the specific network usage of this and look at a general sidecar hook that KubeVirt supports.
E.g.: The Kubvirt CR will have a policy for allocating resources to sidecar containers and possible the compute container. Referencing this policy from the network binding definition or from a sidecar hook. This may be an overkill, but thinking about it and examining the pors/cons of it can be useful.

Comment on lines 392 to 393
Some binding plugins may require an additional binary to be shipped inside the virt-launcher compute image.
This binary requires additional resources that have to be explicitly stated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need to limit it to an additional binary. It could be just more resources from the cgroupv2 of the compute container, consumed by an existing binary (e.g. libvirt).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1189,6 +1196,8 @@ spec:
passt:
networkAttachmentDefinition: default/netbindingpasst
sidecarImage: quay.io/kubevirt/network-passt-binding
resourceOverhead:
memory: 800Mi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will have to explain why not:

  • Directly memoryOverhead: 500Mi (BTW, it should be 500 and not 800).
  • With explicit request and being open for adding limit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtimeClasses have a pretty similar concept, maybe worth aligning the API idea
https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/

ie.

overhead:
  podFixed:
    memory: "120Mi"
    cpu: "250m"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtimeClasses have a pretty similar concept, maybe worth aligning the API idea

Interesting, thanks for sharing this.
I think the API could be in sync with the general resources concept.
This allows to control the resource additions to the container, with the ability to extend it in the future to any type of resource and define both request and limits.

The oddity here, as I see it, is how we express the resources of a different container (i.e. compute) and still not lock the ability to do the same for the sidecar itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdDev I'm late to review this, but I don't understand why this API needs to add anything to the compute container and not simply advertise the resources needed to run this binding container?!
Same way as it's done for hotplug / other containers?
https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/services/renderresources.go#L611

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtimeClasses have a pretty similar concept, maybe worth aligning the API idea https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/

ie.

overhead:
  podFixed:
    memory: "120Mi"
    cpu: "250m"

@fabiand I doubt it aligns well. This overhead API was invented for kata and the overhead is being added to any pod created with this runtimeclass, I don't think it aligns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdDev I'm late to review this, but I don't understand why this API needs to add anything to the compute container and not simply advertise the resources needed to run this binding container?! Same way as it's done for hotplug / other containers? https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/services/renderresources.go#L611

@EdDev do we expect all network binding to affect the compute container or is it just passt?
passt specifically uses a feature in qemu there for the GetMemoryOverhead functional can identify that passt will be used and add a passt specific overhead - this is instead of a kubevirt CR API.
What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladikr It is unknown at this stage how future network binding plugins will affect the compute container's resource consumption.
Network binding plugins have the ability to configure the domain, thus the virt stack might consume additional compute resources which KubeVirt cannot account for.

In the past year, the passt binding was converted from being a core feature to a plugin, so KubeVirt will not know in advance that is is used.
Also, we should strive for treating all plugins as equal IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladikr my comment was only about the API design, not about using the POD api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladikr Hi
I see that according to the network binding plugin design document, one of the integration points that seems legit is the pod definition phase
There were no exceptions regarding resource rendering, and Kubevirt CR was advised as a potential API to extend in order to integrate into that point.

Given that the network binding design was accepted, and also the implementation of this design also got into the codebase, I am not sure how to proceed, I'm in favor of accepting this design.

Can you please advice @vladikr ?

@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch from 8432599 to 416926c Compare June 23, 2024 10:50
@orelmisan
Copy link
Member Author

Change: Removed the passt example, as it is not essential for to the proposal.

Addressed sidecar and compute container resource specification.

@orelmisan
Copy link
Member Author

@EdDev @ormergi PTAL.

Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First commit review.

The sidecar container can have a memory leak and may cause node's destabilization.


Alternatives:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep here the decided solution and place the alternatives (with a ref from here) to a dedicated appendix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- Coarse level of control.
- Only supports CPU and memory requests and limits.

2. Additional API for sidecar resource configuration:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specific for the binding plugin, not the sidecar. The sentence is not clear about this.
The definition is per the network binding plugin, and applied if the plugin uses a sidecar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


Cons:
- Require an API change.
- The API will probably evolve as additional plugins will be created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this point offline.
It was removed.

Cons:
- Require an API change.
- The API will probably evolve as additional plugins will be created.
- May require cluster admins to adjust plugin resources during the plugin's lifecycle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster admin is responsible to register the binding plugin in the first place, so I am unclear what this point means.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this point offline, improved the wording.

- Only supports CPU and memory requests and limits.

2. Additional API for sidecar resource configuration:
The network binding plugin API in the KubeVirt CR could receive an additional input field to specify the sidecar resource requirements:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specified resource is define for each instance of usage or per the sidecar, irrelevant of the usage?
E.g. there may be 1 interface using the plugin or there may be 3 interfaces using the plugins in the same VM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed it in the text.

Comment on lines 288 to 290
For each network binding plugin used, the VMI controller will add a label on the virt-launcher pod with the following format:

`kubevirt.io/network-binding-plugin:<plugin-name>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the admission webhook be able to identify the relevant container?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed it in the text.

@orelmisan orelmisan changed the title Network binding plugin: Support memory overhead Network binding plugin: Support resource overhead Jun 23, 2024
Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the proposal.

The following general points should be considered and added:

  • Effort cost estimation for each option.

Comment on lines 308 to 231
For some plugins, such as passt, there is a need to execute an additional binary in the compute container.
Since this binary has its own CPU and memory limits, they should be somehow accounted for.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not limit it to running an additional binary, the addition in memory or other resources may come from different reasons (e.g. libvirt itself requiring more memory due to the expected configuration).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


Alternatives:
1. Manually setting the VM's resources:
The user can override KubeVirt's algorithms and set resource requirements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current logic that adds overhead per internal core logic is colliding with adding manually an overhead? Specifically, if one specifies explicitly resource requests, are these being increased by Kubevirt logic or something else is happening?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it works is as follows:
A VM could be defined with both guest memory and memory resource specification:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-cirros
spec:
  template:
    spec:
      domain:
        memory:
          guest: 128Mi
        resources:
          requests:
            memory: 640Mi # 128Mi for the guest + 512Mi for the network binding plugin

The virt-launcher pod's compute container will have a memory request which contains the sum of:

  • Guest VM's memory (128Mi in this example).
  • Memory overhead for KubeVirt's components (its size is dependent on the VMI's spec)
  • Arbitrary memory overhead (512Mi in this example)

The domain XML will contains the the guest's memory specification (128Mi in this example).

As a side note, it is also possible to specify a memory request with less memory than the guest requires.


Cons:
- Error prune
- The user does not take into account the overhead considerations KubeVirt takes when templating a virt-launcher pod.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is related to my prev question. Can you please confirm this is indeed the case?
I would be surprise if this is how it works.

Copy link
Member Author

@orelmisan orelmisan Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true.
I removed this line.

Comment on lines 350 to 351
- The API will probably evolve as additional plugins will be created.
- May require cluster admins to adjust plugin resources during the plugin's lifecycle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like with the sidecar previously, these points are not clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this point offline, adjusted the text.


2. Additional API for compute container resource overhead:

The network binding plugin API in the KubeVirt CR could receive an additional input field to specify the resource requirements overhead for the compute container:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a need to explicitly specify if this overhead is added per plugin type or per plugin usage count (i.e. per the number of interfaces referencing it from the VM).

It is also important to specify if it is dependent on any other field/setup.
The previous sidecar resource are dependent on the existence of a sidecar, but this one may not have such a dependency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch from 416926c to e0b43e5 Compare June 24, 2024 15:11
@orelmisan
Copy link
Member Author

Change: Moved the API alternative to be the proposed solution.

@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch from 35ddd8b to 1bf6515 Compare July 3, 2024 10:59
@aburdenthehand
Copy link
Member

aburdenthehand commented Jul 3, 2024

/cc @stu-gott - needs review from someone in sig-compute. Thanks!

@kubevirt-bot kubevirt-bot requested a review from stu-gott July 3, 2024 14:07
The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/).

# Appendix H: Alternatives to compute container resource overhead specification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion with @enp0s3 , there is another more generic solution alternative:

Add a new resource overhead section under the KV config:

          computeOverhead:
              selector: ....
              resources:
                  requests:
                      cpu: 200m
                      memory: 20Mi

The selector will determine to which pod it will apply to. The binding plugin, will create pods with a label that identifies the pod as using the relevant plugin. The code that adds the overhead, will query it and determine on which pod to add the overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdDev Hi. I think we would need to open a separate design proposal for the generic solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but we still want to record it here as it is related.
A remark can be added to express that it it will be presented as an independent proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?

FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?

I'm sure author of the proposal will handle this detail.

FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.

The alternative here is generic, not specific to a binding plugin and requires other steps for it to work.
I think we can get into the details of what it needs to validate or not as part of a different proposal that will focus on this alternative. It should be enough to mention here the general idea, so it will be visible that we discussed it.

Comment on lines 309 to 312
computeResourceOverhead:
cpu: 200m
memory: 20Mi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is missing the requests (and limits) level, why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-added the requests field.

@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch 2 times, most recently from e482700 to 848082d Compare July 5, 2024 20:00
@orelmisan
Copy link
Member Author

Change: Updated the compute container's manual resource specification alternative.

Copy link
Contributor

@ormergi ormergi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes

binding:
mynetbindingplugin:
sidecarImage: quay.io/kubevirt/mynetbindingplugin
sidecarResources:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note saying this field type should be the native kubernetes resources API, with validation for cpu and memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should be explicit about the supported API.
We should not specify the cpu but we can add a not that the structure is the native k8s one, allowing extending the support to the other fields (cpu, custom ones).

The definition of the API should not be limited to an example.
But this is not blocker for me.

Adding the validation section can also be useful. We could specify there that we ignore any data except the memory at this stage. We should be explicit and say that we do not fail if other data is added. Again, this is not a blocker and I am good with letting this stay gray for now.

- Only supports CPU and memory requests and limits.

2. Additional API for network binding plugin sidecar resource specification:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the container names should be validated in a way they can refer to the binding sidecar and compute container only.

The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/).

# Appendix H: Alternatives to compute container resource overhead specification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?

FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.

If specified, KubeVirt will add the resource overhead specified in `computeResourceOverhead` to the compute container of the virt-launcher pod.
The specified resource overhead will be added to the compute container per unique usage of the plugin (not per NIC using the binding).

For example, assuming there is a plugin registered with a 200Mi memory overhead for the `compute` container, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does passt require mempry overhead per interface or based on whether its being executed or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is enough for the interface to be defined for the passt binary to run and consume the memory specified.

It is not clear what you mean by executed.

sidecarImage: quay.io/kubevirt/mynetbindingplugin
sidecarResources:
requests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mere yaml example it can be anything 🙂

- Exposes the VM owner to unnecessary implementation details

2. Mutating webhook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it another webook in addition to one that address the sidecar resources one?

It seem that it lacks description about what it"ll do, pros, cons section and how its going to integrate with kubevirt? ( I recall something about adding some label to launcher pods that use network binding plugins)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added additional details, same as for the sidecar.

binding:
mynetbindingplugin:
sidecarImage: quay.io/kubevirt/mynetbindingplugin
sidecarResources:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should be explicit about the supported API.
We should not specify the cpu but we can add a not that the structure is the native k8s one, allowing extending the support to the other fields (cpu, custom ones).

The definition of the API should not be limited to an example.
But this is not blocker for me.

Adding the validation section can also be useful. We could specify there that we ignore any data except the memory at this stage. We should be explicit and say that we do not fail if other data is added. Again, this is not a blocker and I am good with letting this stay gray for now.

If specified, KubeVirt will add the resource overhead specified in `computeResourceOverhead` to the compute container of the virt-launcher pod.
The specified resource overhead will be added to the compute container per unique usage of the plugin (not per NIC using the binding).

For example, assuming there is a plugin registered with a 200Mi memory overhead for the `compute` container, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is enough for the interface to be defined for the passt binary to run and consume the memory specified.

It is not clear what you mean by executed.


For example, assuming there is a plugin registered with a 200Mi memory overhead for the `compute` container, and
there are two interfaces using it.
`virt-controller` will only add 200Mi of memory to the `compute` container.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this 200Mi coming from? In the example above you used 20Mi.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a typo.
Fixed it in the example.

The requirement to maintain certificates for the webhook could be mitigated using tools such as [cert-manager](https://cert-manager.io/).

# Appendix H: Alternatives to compute container resource overhead specification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the spesifed resources apply for all containers in the selected pod?
Resources are configured on container types, it seem that it"ll require finer way to set overhead on containers, did you mean selector for containers?

I'm sure author of the proposal will handle this detail.

FYI: I suggested something similar in this comment.
Note that the container name should be validated and allow refering to the binding sidecar and compute containers only.

The alternative here is generic, not specific to a binding plugin and requires other steps for it to work.
I think we can get into the details of what it needs to validate or not as part of a different proposal that will focus on this alternative. It should be enough to mention here the general idea, so it will be visible that we discussed it.

@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch from 848082d to d4eb851 Compare July 7, 2024 11:57
@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch from d4eb851 to c33498b Compare July 9, 2024 15:46
@orelmisan orelmisan changed the title Network binding plugin: Support resource overhead Network binding plugin: Support compute resource overhead Jul 9, 2024
@orelmisan orelmisan changed the title Network binding plugin: Support compute resource overhead Network binding plugin: Support compute container resource overhead Jul 9, 2024
@orelmisan
Copy link
Member Author

Change: Split the sidecar's resource specification to #309

@orelmisan orelmisan force-pushed the net-binding-plugin-mem-overhead branch from c33498b to 591c3a4 Compare July 9, 2024 15:49
Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 10, 2024
##### Compute Container Resource Overhead

For some plugins, an additional resource consumption can be expected from the virt-launcher pod compute container.
For example, there could be need to execute an additional binary in the compute container.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain what binary can be executed and how it gets there?
How can an external plugin rely on the fact that a specific binary is part of the virt-launcher image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example was derived from the passt use case, on which the passt binary is shipped as part of the virt-launcher compute image.

Another example could be that the plugin configures the domain in a way that causes the virt-stack to consume additional compute resources that KubeVirt cannot account for.


For some plugins, an additional resource consumption can be expected from the virt-launcher pod compute container.
For example, there could be need to execute an additional binary in the compute container.
Since this binary has its own CPU and memory limits, they should be somehow accounted for.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should speak of a specific functionality and not about arbitrary binaries.
virt-launcher should be well defined.
If we know what functionality we're enabling that requires additional overhead, kubevirt needs to know the resources it requires. Similar to other aspects accounted for in GetMemoryOverhead.

I would say that if we want an API for this it should be based on a functionality, not a binding.

Each resource consumption of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plugin can perform actions that affect KubeVirt in ways it cannot account for.
Thus the need to externally add resources to the compute container.

@orelmisan
Copy link
Member Author

Thank you for reviewing this proposal @vladikr, sorry it took some time to respond.

@aburdenthehand
Copy link
Member

On behalf of SIG-compute, can one of @jean-edouard @stu-gott @enp0s3 @vladikr please re-review to move this forward. I can merge if I receive an approve or second lgtm from one of you. Thank you.

@kubevirt-bot
Copy link

Pull requests that are marked with lgtm should receive a review
from an approver within 1 week.

After that period the bot marks them with the label needs-approver-review.

/label needs-approver-review

@kubevirt-bot kubevirt-bot added the needs-approver-review Indicates that a PR requires a review from an approver. label Sep 30, 2024
@enp0s3
Copy link
Contributor

enp0s3 commented Dec 11, 2024

/cc

@kubevirt-bot kubevirt-bot requested a review from enp0s3 December 11, 2024 12:42
@enp0s3
Copy link
Contributor

enp0s3 commented Dec 16, 2024

Deferring the discussion for after the end year holidays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. needs-approver-review Indicates that a PR requires a review from an approver. sig/compute size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants