Registry pod affinity #1039

Merged: 30 commits merged into main from registry-pod-affinity on Dec 7, 2022

Conversation

@jeff-mccoy (Contributor) commented Nov 25, 2022

This PR changes the internal Zarf Registry so that it behaves correctly on multi-node clusters with storage (a change in v0.22.x made this unstable). Change list:

- Use [Pod Affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#an-example-of-a-pod-that-uses-pod-affinity) to force pods onto the same node for proper upgrades/restarts (see the manifest sketch after this list)
- Add a K8s [HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) to increase the performance of image pushes
- Add [Priority Classes](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) to the Zarf Agent and Zarf Registry deployments
- Add a [Pod Disruption Budget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) for the Zarf Registry
- Add Zarf variables to support custom registry resource requests/limits, a custom PVC size, and alternatively the use of a custom PVC
- Move the Zarf Registry chart to a local chart
- Fix a small issue with log files not being properly saved for K3s workflows
- Fix an issue from Make Zarf a Library (#934) with cross-platform commands
- Remove makefile config overrides and replace them with `zarf-config.toml`

Fixes #1004
Related to #1062
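
For readers who want to see what these pieces look like in manifest form, here is a minimal sketch of the required pod affinity rule plus a PDB and HPA along the lines described above. It is an illustration only, not the actual chart added by this PR: the names, labels, namespace, image, and numeric values are assumptions.

```yaml
# Minimal sketch only: names, labels, and values below are assumptions,
# not the exact manifests from the Zarf registry chart in this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zarf-docker-registry            # assumed name
  namespace: zarf
spec:
  selector:
    matchLabels:
      app: docker-registry              # assumed label
  template:
    metadata:
      labels:
        app: docker-registry
    spec:
      priorityClassName: zarf-registry-priority   # assumed PriorityClass name
      affinity:
        podAffinity:
          # Require new registry pods to schedule onto the node that already
          # runs a registry pod, so replacements created during an upgrade or
          # restart can reach the same node-attached volume. The scheduler
          # still places the first pod when nothing matches yet (the
          # self-affinity special case).
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: docker-registry
              topologyKey: kubernetes.io/hostname
      containers:
        - name: registry
          image: registry:2             # assumed image
          resources:
            requests:                   # illustrative defaults; the PR exposes
              cpu: 100m                 # requests/limits as Zarf variables
              memory: 256Mi
---
# A PodDisruptionBudget keeps at least one registry pod available during
# voluntary disruptions such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zarf-docker-registry
  namespace: zarf
spec:
  minAvailable: 1                       # assumed value
  selector:
    matchLabels:
      app: docker-registry
---
# An HPA lets the registry scale out under heavy image-push load.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zarf-docker-registry
  namespace: zarf
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zarf-docker-registry
  minReplicas: 1                        # assumed bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```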

@jeff-mccoy (Contributor, Author) commented:

(Two screenshots attached, dated Nov 30, 2022.)

@jeff-mccoy (Contributor, Author) commented:

Sad times: the daemonset will only be useful on clusters with at least 2 nodes per AZ, and even then it can be a bit janky, so I'm dropping that part of this. I did verify that if a node completely dies you can recover by re-running `zarf init`, but it may require deleting the volumeattachment, as it is super slow to update otherwise. I also confirmed this retained the existing images, even when the registry deployment was completely deleted.

(Five screenshots attached, dated Dec 1, 2022.)

Notes on the volumeattachment step (it can be done via `zarf tools monitor`): https://www.ibm.com/docs/en/stg-block-csi-driver/1.8.0?topic=troubleshooting-recovering-pod-volume-attachment-from-crashed-kubernetes-node

@jeff-mccoy jeff-mccoy marked this pull request as ready for review December 6, 2022 23:30
@jeff-mccoy jeff-mccoy force-pushed the registry-pod-affinity branch from ef2f0d0 to 3c9cc74 on December 7, 2022 00:04
@Racer159 (Contributor) left a comment:

Tested locally and looks good to me (despite the notes)

@jeff-mccoy (Contributor, Author) commented:

Thanks for the review @Racer159. I'm going to keep this one open for comments tomorrow since I posted in the K8s Slack asking for feedback.

@jeff-mccoy jeff-mccoy merged commit b8ee9da into main Dec 7, 2022
@jeff-mccoy jeff-mccoy deleted the registry-pod-affinity branch December 7, 2022 21:08
jeff-mccoy added a commit that referenced this pull request Dec 8, 2022
Removes a test that was previously needed to make sure a package in a
default path was read. Now that #1039 is merged, we have this covered on
init package create / deploy, plus it was trampling over that file
during this test.
@jeff-mccoy jeff-mccoy mentioned this pull request Jan 12, 2023
Noxsios pushed two commits that referenced this pull request on Mar 8, 2023 (cherry-picks of the change list and the test-removal commit above).

Successfully merging this pull request may close these issues.

zarf init multi-attach error for registry pods
3 participants