Add topologySpreadConstraints #2091
* Update docs
* Remove docs and update README
* Add link to monthly community meeting
Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>

* Add PodDisruptionBudget to chart
* PR comments
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>
@@ -17,21 +17,22 @@ tests:
  - it: Should render spark operator podDisruptionBudget if podDisruptionBudget.enable is true
    set:
      replicaCount: 2
PDB tests were failing on the master branch so these are fixes to get them to pass.
Hi @vara-bonthu @andreyvelich @ChenYi015 @yuchaoran2011, could you please review this PR? 🙇
Thanks for the PR @jbhalodia-slack
/approve
@yuchaoran2011 @ChenYi015 Please review
Thanks for the contribution!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ChenYi015, vara-bonthu.
* Update README and documentation (kubeflow#2047)
  * Update docs
  * Remove docs and update README
  * Add link to monthly community meeting
* Add PodDisruptionBudget to chart (kubeflow#2078)
  * Add PodDisruptionBudget to chart
  * PR comments
* Set topologySpreadConstraints
* Update README and increase patch version
* Revert replicaCount change
* Update README after master merger
* Update README
Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Co-authored-by: Yi Chen <[email protected]>
Co-authored-by: Carlos Sánchez Páez <[email protected]>
(cherry picked from commit 4108f54)
* Update helm docs (#2081) Signed-off-by: Carlos Sánchez Páez <[email protected]> (cherry picked from commit eca3fc8) * Update the process to build api-docs, generate CRD manifests and code (#2046) * Update .gitignore Signed-off-by: Yi Chen <[email protected]> * Update .dockerignore Signed-off-by: Yi Chen <[email protected]> * Update Makefile Signed-off-by: Yi Chen <[email protected]> * Update the process to generate api docs Signed-off-by: Yi Chen <[email protected]> * Update the workflow to generate api docs Signed-off-by: Yi Chen <[email protected]> * Use controller-gen to generate CRD and deep copy related methods Signed-off-by: Yi Chen <[email protected]> * Update helm chart CRDs Signed-off-by: Yi Chen <[email protected]> * Update workflow for building spark operator Signed-off-by: Yi Chen <[email protected]> * Update README.md Signed-off-by: Yi Chen <[email protected]> --------- Signed-off-by: Yi Chen <[email protected]> (cherry picked from commit 779ea3d) * Add topologySpreadConstraints (#2091) * Update README and documentation (#2047) * Update docs Signed-off-by: Yi Chen <[email protected]> * Remove docs and update README Signed-off-by: Yi Chen <[email protected]> * Add link to monthly community meeting Signed-off-by: Yi Chen <[email protected]> --------- Signed-off-by: Yi Chen <[email protected]> Signed-off-by: jbhalodia-slack <[email protected]> * Add PodDisruptionBudget to chart (#2078) * Add PodDisruptionBudget to chart Signed-off-by: Carlos Sánchez Páez <[email protected]> Signed-off-by: Carlos Sánchez Páez <[email protected]> Signed-off-by: Carlos Sánchez Páez <[email protected]> * PR comments Signed-off-by: Carlos Sánchez Páez <[email protected]> --------- Signed-off-by: Carlos Sánchez Páez <[email protected]> Signed-off-by: Carlos Sánchez Páez <[email protected]> Signed-off-by: jbhalodia-slack <[email protected]> * Set topologySpreadConstraints Signed-off-by: jbhalodia-slack <[email protected]> * Update README and increase patch version 
Signed-off-by: jbhalodia-slack <[email protected]> * Revert replicaCount change Signed-off-by: jbhalodia-slack <[email protected]> * Update README after master merger Signed-off-by: jbhalodia-slack <[email protected]> * Update README Signed-off-by: jbhalodia-slack <[email protected]> --------- Signed-off-by: Yi Chen <[email protected]> Signed-off-by: jbhalodia-slack <[email protected]> Signed-off-by: Carlos Sánchez Páez <[email protected]> Signed-off-by: Carlos Sánchez Páez <[email protected]> Co-authored-by: Yi Chen <[email protected]> Co-authored-by: Carlos Sánchez Páez <[email protected]> (cherry picked from commit 4108f54) * Use controller-runtime to reconsturct spark operator (#2072) * Use controller-runtime to reconstruct spark operator Signed-off-by: Yi Chen <[email protected]> * Update helm charts Signed-off-by: Yi Chen <[email protected]> * Update examples Signed-off-by: Yi Chen <[email protected]> --------- Signed-off-by: Yi Chen <[email protected]> (cherry picked from commit 0dc641b) --------- Co-authored-by: Carlos Sánchez Páez <[email protected]> Co-authored-by: jbhalodia-slack <[email protected]>
Purpose of this PR
It's good to spread the Spark Operator pods across failure domains such as regions, zones, nodes, and other user-defined topology domains. This can help achieve high availability as well as efficient resource utilization.
Proposed changes:
Change Category
Indicate the type of change by marking the applicable boxes:
Rationale
Production workloads should enable topologySpreadConstraints so that they run in HA and are resilient to node- or AZ-specific failures.
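As a rough illustration of the rationale above, a values override might look like the sketch below. It assumes the chart exposes a top-level `topologySpreadConstraints` list next to `replicaCount` and passes it through to the operator pod spec; those key names and the pass-through behavior are assumptions to verify against the chart's `values.yaml`. The fields inside each entry (`maxSkew`, `topologyKey`, `whenUnsatisfiable`) and the two node labels are standard Kubernetes.

```yaml
# Hypothetical values override; check key names against the chart's values.yaml.
# More than one replica is needed for spreading to have any effect.
replicaCount: 2

topologySpreadConstraints:
  # Prefer an even spread across availability zones, but still schedule
  # if the zones cannot be balanced.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  # Require that replicas do not pile up on a single node.
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```

Each constraint also needs a `labelSelector` to know which pods to count. Whether the chart fills this in from the deployment's selector labels is not stated here; if it does not, add a `labelSelector` to each entry explicitly.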
Checklist
Before submitting your PR, please review the following:
Additional Notes
GitHub Issue: #2086
Slack Thread: https://cloud-native.slack.com/archives/C074588U7EG/p1721240818494049