Cluster Autoscaler conflict with volumeClaim and/or affinity-assistant #4699
Comments
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
/remove-lifecycle stale, as the issue still persists.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
Did you consider disabling the affinity assistant?
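For reference, turning the affinity assistant off is a cluster-wide setting on the feature-flags ConfigMap. A minimal sketch, assuming the standard Tekton Pipelines install in the tekton-pipelines namespace (verify the flag against the docs for the version you run):

```yaml
# Sketch: disable the affinity assistant so task pods sharing a workspace
# are no longer forced onto the same node as the affinity-assistant pod.
# Flag name per Tekton's feature-flags documentation; verify for your version.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"
```

In practice you would usually patch the existing ConfigMap rather than apply a new one, e.g. `kubectl patch configmap feature-flags -n tekton-pipelines -p '{"data":{"disable-affinity-assistant":"true"}}'`, so the other flags are preserved.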
Hello all, this is indeed a challenge. How does anybody use the cluster autoscaler with Tekton successfully? Is everybody just statically provisioning nodes and burning money that way? I would love to see a how-to on setting up the Cluster Autoscaler with Tekton (with some kind of volumeClaim)...
@icereed My company uses the cluster autoscaler, with an NFS server (in the k8s cluster) serving NFS mounts for PVCs. We also disable the affinity-assistant.
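A minimal sketch of the storage side of such a setup: a ReadWriteMany claim backed by an NFS StorageClass, so pods can mount the workspace from any node and the affinity assistant is not needed for co-location. The StorageClass name nfs-client is an assumption (e.g. as created by the nfs-subdir-external-provisioner chart); adjust to your cluster:

```yaml
# Hypothetical ReadWriteMany claim backed by an in-cluster NFS provisioner.
# "nfs-client" is an assumed StorageClass name; replace with your own.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tekton-shared-workspace
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany   # mountable from any node, so no co-location constraint
  resources:
    requests:
      storage: 1Gi
```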
I have the same issue. Autoscaling works for jobs that share the same node selector label, but jobs with that label do not run successfully when resources are insufficient. Update:
@grid-dev I'm not sure if this addresses your use case, but we've recently introduced some new options for the affinity assistant and would appreciate your feedback! Please feel free to weigh in on #6990. Since you're using a cluster autoscaler with a limited number of pods per node, I wonder if the "isolate-pipelinerun" option would work well for you? https://github.com/tektoncd/pipeline/blob/main/docs/affinityassistants.md#affinity-assistants
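If that mode fits, enabling it should only require setting the coschedule feature flag; a sketch, with the value taken from the linked affinityassistants.md (double-check it against the Pipelines version you run):

```yaml
# Sketch: coschedule all pods of a PipelineRun onto one node and keep that
# node exclusive to the PipelineRun while it runs ("isolate-pipelinerun" mode).
# Verify the flag name and value against your Tekton Pipelines version.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  coschedule: "isolate-pipelinerun"
```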
Not enough "slots" for pods when the affinity assistant allocates pods together with the Cluster Autoscaler
Expected Behavior
An EKS cluster exists and has the following setup:
Cluster nodes are packed and run between 12 and 17 pods, where 17 is the maximum for this instance type.
A PipelineRun is started, consisting of 2 tasks which both share a workspace, i.e. a volumeClaim (see "Pipeline YAML code"; a hypothetical sketch follows this list).
affinity-assistant-... allocates the needed pods, including itself, on a single node (or at least the same region) so the volumeClaim can be shared. If there is not enough space left for the needed pods, the Cluster Autoscaler provisions a new node.
All tasks start and can bind to the volumeClaim, one after the other.
The Pipeline finishes successfully.
If the Cluster Autoscaler created a new node, this node is terminated again after the run was successful.
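The referenced "Pipeline YAML code" is not reproduced here; the following is only a hypothetical minimal sketch of the shape described in these steps (two tasks sharing one volumeClaimTemplate-backed workspace, with made-up task and image names), not the reporter's actual Pipeline:

```yaml
# Hypothetical two-task PipelineRun sharing a volumeClaimTemplate workspace.
# A ReadWriteOnce claim forces co-location, which is what the affinity
# assistant (or, failing that, the Cluster Autoscaler) has to satisfy.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: go-lang-
spec:
  pipelineSpec:
    workspaces:
      - name: shared-data
    tasks:
      - name: git
        workspaces:
          - name: output
            workspace: shared-data
        taskSpec:
          workspaces:
            - name: output
          steps:
            - name: clone
              image: alpine/git
              script: |
                git clone https://github.com/tektoncd/pipeline "$(workspaces.output.path)/src"
      - name: build
        runAfter: ["git"]
        workspaces:
          - name: source
            workspace: shared-data
        taskSpec:
          workspaces:
            - name: source
          steps:
            - name: build
              image: golang:1.20
              script: |
                ls "$(workspaces.source.path)/src"
                go version
  workspaces:
    - name: shared-data
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]   # single-node access
          resources:
            requests:
              storage: 1Gi
```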
Actual Behavior
Steps to Reproduce the Problem
Steps 1 - 4 are the same as in "Actual Behavior".
The affinity-assistant-3a0bc57d00-0 pod is started and the persistentVolumeClaim is bound, but the pod for the first task, go-lang-8txd7-git-pod, is stuck (see "Pod stuck event log").
Additional Info