
Support specifying nodeSelector and affinity values in SmbCommonConfig to control pod scheduling #291

Merged · mergify bot merged 14 commits into samba-in-kubernetes:master on Apr 3, 2023

Conversation

phlogistonjohn
Collaborator

Fixes: #283

The new podSettings key allows for control of certain parameters the operator cannot or should not guess at. Currently, this allows one to control scheduling via nodeSelector and affinity sections under podSettings in SmbCommonConfig.

It's located in SmbCommonConfig because SmbShare is really supposed to be about the share and less about the server; it's similar to how network integration is configured through SmbCommonConfig. It's certainly not a fit for SmbSecurityConfig :-)
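For illustration, a minimal sketch of what such a resource could look like is shown below. The apiVersion and the example label key/values are assumptions; the exact field layout under podSettings should be taken from docs/resources/SmbCommonConfig.md.

```yaml
apiVersion: samba-operator.samba.org/v1alpha1
kind: SmbCommonConfig
metadata:
  name: my-common-config
spec:
  podSettings:
    # Standard Kubernetes nodeSelector: server pods only schedule on
    # nodes carrying this (placeholder) label.
    nodeSelector:
      example.com/smb-capable: "true"
    # Standard Kubernetes affinity block, passed through to the pods.
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: example.com/smb-capable
                  operator: In
                  values: ["true"]
```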

I also added some extra labels to the smb pods so that they can be quickly backtracked to the SmbCommonConfig and SmbSecurityConfig that were used to generate them. These labels helped me write the test cases.
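As a sketch of what that backtracking looks like, a server pod generated from resources like the example above might carry labels along these lines (the label keys are taken from the commit messages further down; the values here are placeholder resource names):

```yaml
metadata:
  labels:
    # Points back to the SmbCommonConfig the pod was generated from.
    samba-operator.samba.org/common-config-from: my-common-config
    # Points back to the SmbSecurityConfig, when one is used.
    samba-operator.samba.org/security-config-from: my-security-config
```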

@phlogistonjohn
Collaborator Author

Right. These tests probably don't make any sense on a single-node setup. I need to think about how to handle that case.

@phlogistonjohn
Collaborator Author

Erm.

            --- PASS: TestIntegration/groupedShares/clustered/TestPodsReady (0.02s)
    --- PASS: TestIntegration/scheduling (51.03s)
        --- PASS: TestIntegration/scheduling/NodeSelectorSuite (24.09s)
            --- PASS: TestIntegration/scheduling/NodeSelectorSuite/TestPodsRunOnLabeledNode (23.93s)
        --- PASS: TestIntegration/scheduling/AffinityBasedSelectorSuite (26.94s)
            --- PASS: TestIntegration/scheduling/AffinityBasedSelectorSuite/TestPodsRunOnLabeledNode (26.75s)
PASS
ok  	github.com/samba-in-kubernetes/samba-operator/tests/integration	1607.647s
yq not found in PATH, checking /root/samba-operator/.bin
controller-gen not found in PATH, checking /root/samba-operator/.bin
/root/samba-operator/.bin/controller-gen "crd:trivialVersions=true,crdVersions=v1" rbac:roleName=manager-role webhook \
	paths="./..." output:crd:artifacts:config=config/crd/bases
YQ=/root/samba-operator/.bin/yq /root/samba-operator/hack/yq-fixup-yamls.sh  /root/samba-operator/config
kustomize not found in PATH, checking /root/samba-operator/.bin
/root/samba-operator/.bin/kustomize build config/default | kubectl delete -f -
namespace "samba-operator-system" deleted
customresourcedefinition.apiextensions.k8s.io "smbcommonconfigs.samba-operator.samba.org" deleted
customresourcedefinition.apiextensions.k8s.io "smbsecurityconfigs.samba-operator.samba.org" deleted
customresourcedefinition.apiextensions.k8s.io "smbshares.samba-operator.samba.org" deleted
role.rbac.authorization.k8s.io "samba-operator-leader-election-role" deleted
clusterrole.rbac.authorization.k8s.io "samba-operator-manager-role" deleted
clusterrole.rbac.authorization.k8s.io "samba-operator-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "samba-operator-proxy-role" deleted
rolebinding.rbac.authorization.k8s.io "samba-operator-leader-election-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "samba-operator-manager-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "samba-operator-proxy-rolebinding" deleted
configmap "samba-operator-controller-cfg" deleted
service "samba-operator-controller-manager-metrics-service" deleted
deployment.apps "samba-operator-controller-manager" deleted
* Deleting "minikube" in kvm2 ...
* Deleting "minikube-m02" in kvm2 ...
* Deleting "minikube-m03" in kvm2 ...
* Removed all traces of the "minikube" cluster.
time="2023-03-02T22:35:46Z" level=fatal msg="Unable to delete registry-samba.apps.ocp.cloud.ci.centos.org/sink/samba-operator:ci-k8s-1.26-pr291. Image may not exist or is not stored with a v2 Schema in a v2 registry"

Our tests passed, but the minikube delete command must have failed?

I'm marking this as ready for review and kicking off another CI run, but if this keeps happening I'll ask reviewers to focus on the status of the actual Go tests rather than the overall CI state.

@phlogistonjohn phlogistonjohn marked this pull request as ready for review March 2, 2023 22:48
@phlogistonjohn
Collaborator Author

/test centos-ci/sink-clustered/mini-k8s-1.26

@anoopcs9
Collaborator

anoopcs9 commented Mar 6, 2023

Erm.

* Deleting "minikube" in kvm2 ...
* Deleting "minikube-m02" in kvm2 ...
* Deleting "minikube-m03" in kvm2 ...
* Removed all traces of the "minikube" cluster.
time="2023-03-02T22:35:46Z" level=fatal msg="Unable to delete registry-samba.apps.ocp.cloud.ci.centos.org/sink/samba-operator:ci-k8s-1.26-pr291. Image may not exist or is not stored with a v2 Schema in a v2 registry"

Our tests passed, but the minikube delete command must have failed?

I think it's the following skopeo command from the job script that failed in the above scenario:

skopeo delete "docker://${CI_IMG_OP}"

I can see that the job was triggered twice within a span of 20 minutes. Image tags are differentiated only by PR number, so the first run completed successfully and the second run then failed to find the image for deletion afterwards.

@phlogistonjohn
Collaborator Author

@Mergifyio rebase

@mergify

mergify bot commented Mar 27, 2023

rebase

✅ Branch has been successfully rebased

anoopcs9 (Collaborator) left a comment

See below for a typo, remark and clarification.

docs/resources/SmbCommonConfig.md
internal/resources/statefulsets.go
tests/integration/scheduling_test.go
anoopcs9 previously approved these changes Mar 31, 2023

anoopcs9 (Collaborator) left a comment

lgtm, cool work.

@phlogistonjohn added the priority-review (This PR deserves a look) label on Mar 31, 2023
@phlogistonjohn
Collaborator Author

@Mergifyio rebase

Add labels for "samba-operator.samba.org/common-config-from" and
"samba-operator.samba.org/security-config-from" when the server instance
is derived from a share using common config or security config CRs. The
values of the labels map back to the names of the resources.
This can be handy for quickly seeing which resources came from where, and for
writing tests that need to associate a pod with a CR beyond just the
SmbShare name.

Signed-off-by: John Mulligan <[email protected]>
A node selector value found in the common config takes priority over
that of the operator config or hard-coded default.

Signed-off-by: John Mulligan <[email protected]>
This will allow users & admins to customize pod affinity settings
for the local cluster's needs.

Signed-off-by: John Mulligan <[email protected]>
This patch includes the logic to add the anti-affinity rule that you
typically want for ctdb clustered smbds at the end of the rules
pulled from the common config.

Signed-off-by: John Mulligan <[email protected]>
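For reference, the anti-affinity rule typically wanted for ctdb-clustered smbds looks roughly like the following sketch; the label key/value in the selector are placeholders, not necessarily the operator's actual internal labels.

```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    # Keep smbd pods belonging to the same clustered server group on
    # different nodes, so losing one node does not take out the cluster.
    - labelSelector:
        matchLabels:
          samba-operator.samba.org/server-group: my-clustered-share
      topologyKey: kubernetes.io/hostname
```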
If not, every test run will always use the same pattern, possibly
defeating the purpose of using random choice in the first place.

Signed-off-by: John Mulligan <[email protected]>
Add some tests that check that node selector values and/or affinity
rules are passed on from a SmbCommonConfig to the pods that are created.

Signed-off-by: John Mulligan <[email protected]>
The new podSettings key allows for control of certain parameters the
operator cannot guess at. Currently, this allows one to control
scheduling via nodeSelector and affinity sections.

Signed-off-by: John Mulligan <[email protected]>
Add yet another environment variable to inform the test suite about the
outer world. The variable SMBOP_TEST_MIN_NODE_COUNT should be set to the
number of nodes that typical pods may be scheduled on; the tests may
then use this value to determine whether the cluster is in-spec or
out-of-spec for tests that require a certain number of nodes.

Signed-off-by: John Mulligan <[email protected]>
The tests for the node selector and affinity scheduling require at least
two nodes. Previously, the tests would simply fail if there weren't
enough nodes to run them. This patch adds a check such that the test is
skipped if there aren't enough nodes, unless SMBOP_TEST_MIN_NODE_COUNT
was specified. This variable acts as a double check on the expected
environment: if the number of available nodes is less than that given by
the env var, the test is required to fail.

Signed-off-by: John Mulligan <[email protected]>
In the centosci-based testing environment we have an expected number of
worker nodes that we can pass to the test suite. This can then be used by
the suite to determine whether the test cluster has been set up according
to the intended spec.

Signed-off-by: John Mulligan <[email protected]>
@mergify

mergify bot commented Apr 2, 2023

rebase

✅ Branch has been successfully rebased

@phlogistonjohn
Collaborator Author

WHY did it dismiss the existing review? There were no conflicts and the rebase was done via mergify.
Errgh. @anoopcs9, can you please take another look? Thanks!

mergify bot merged commit f9b9853 into samba-in-kubernetes:master on Apr 3, 2023
Labels: priority-review (This PR deserves a look)

Successfully merging this pull request may close these issues:
How to configure node selector if using a mixed K8S (#283)