Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft: add namespace scoped operator mode #1867

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

NymanRobin
Copy link
Member

@NymanRobin NymanRobin commented Aug 9, 2024

Note: This is still a very rough draft to promote some discussion on the topic, DO NOT MERGE!
This PR has currently generated namespaces for all tests, this is for demonstration purpose, this would of course not be the case when finalizing.

The current implementation in this PR gives BMO (Bare Metal Operator) access to resources within a one or more pre-defined namespace. There is a limitation that if for whatever reason the admin of BMO would like to add another namespace this is not possible without restarting the operator as the watchNamespaces cannot be dynamically changed to my knowledge.

  • Agree on scope of the implementation. (To restrict BMO to one or multiple namespaces)
  • Add a test framework config for restricted operator scope.
  • Implement the ability to provide multiple pre-defined namespaces.
  • Update and clean up the documentation once the implementation details are finalized.
  • Replace the namespace set in rbac markers with "" and generate the correct config!
  • Create the go code for setting namespaces before using kustomize
  • Do proper integration with the deploy.sh (or deploy-cli?) in the current state there is try to add the namespace to the temp_overlay but it is not working (scaling up the operator is the problem most likely config, the replicaset should show more details):

What this PR does / why we need it:

This PR adds automation and docs for restricting the operator scope from cluster wide to namespace restricted.
In case cluster wide scope is not needed it is better to restrict the permission through rbacs to one or more predefined namespaces.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1261

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign honza for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@NymanRobin
Copy link
Member Author

/hold

@metal3-io-bot metal3-io-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 9, 2024
@NymanRobin
Copy link
Member Author

I chose to ignore the Kubebuilder markers for the time being, as it was easier to accomplish this with pure Kustomize when handling either ClusterRole or Role.

If we will be doing a combination, such as restricting only secrets behind the Role, then I think it makes sense to use the Kubebuilder markers to generate both the Role and ClusterRole, and simply add the new RoleBinding in the overlay.

@tuminoid
Copy link
Member

@tuminoid
Copy link
Member

Thanks for a demo today! Please update the PR to the latest, update the description per the discussions, and also add the identified obstacles and future improvements.

It would also be helpful if you could make a test PR that would configure a single namespace (default config is fine) by default, and thus would enable testing this via dev-env and CI, it might uncover some edge cases elsewhere that break.

@metal3-io-bot metal3-io-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 21, 2024
@NymanRobin
Copy link
Member Author

Sure @tuminoid, I updated now the PR to showcase the new method of using kustomize custom plugins to generate roles and rolebindings for multiple namespaces. However, this still requires cleaning especially in documentation and test-framework side.

Good idea to open the test PR for dev-env, do we know into which namespace the BMH will go there? So I know what config to generate for the testing

@NymanRobin
Copy link
Member Author

To answer myself the namespace used in dev-env for BMH is metal3, based on BMO logs from previous tests:

{"level":"info","ts":1722317271.962581,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":{"name":"node-0","namespace":"metal3"}}

@tuminoid
Copy link
Member

To answer myself the namespace used in dev-env for BMH is metal3, based on BMO logs from previous tests:

{"level":"info","ts":1722317271.962581,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":{"name":"node-0","namespace":"metal3"}}

Yes, that is correct. metal3 namespace is used in tests, and more speicfically we have node-0 and node-1 BMHs.

@NymanRobin NymanRobin force-pushed the namespaced-rbac-mode branch 2 times, most recently from 0ee4d01 to b219193 Compare August 22, 2024 09:31
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 22, 2024
This adds automation and docs for restricting the
operator scope from cluster wide to namespace restricted.

Signed-off-by: NymanRobin <[email protected]>
@NymanRobin
Copy link
Member Author

/test metal3-dev-env-integration-test-ubuntu-main

@mquhuy
Copy link
Member

mquhuy commented Aug 23, 2024

Just comment this here, since we will need to fix e2e anyway before merging this PR: I realized that without being able to reload the namespace names in-place, we face a dilemma situation in E2E, as in E2E each testcase runs on its own namespace.

Perhaps the best solution would be to decide on the list of namespaces before the suite starts, and install BMO with that list. Otherwise we have to use fixed namespace(s) for the test (which is not nice for logging purposes), or not test the namespace scoped at all (not nice either).

@NymanRobin
Copy link
Member Author

Hi @mquhuy,

I replaced the random string with test as a first measure, meaning the amount of namespaces will still be the same but fixed. However, if your concern is that this might be problematic, could you please clarify the advantages of using random strings and generating namespaces in advance, rather than using fixed namespaces, I am interested in this 😄

Nevertheless, implementing this change should be relatively straightforward. It would involve rewriting the set_ns.py script as Go code (which I guess should be done anyway) to integrate it into the test suite. We could then use the old method of generating random strings before the BMO-kustomize step. However, it’s important to ensure that namespaces always exist when BMO is running, or it might run into issues. 😄

Currently, the biggest problem I see with e2e testing is that BMHs were expected to be deleted when their namespace was deleted. Since this isn’t happening, we’re encountering MAC address clashes between the BMHs, which is resulting in failures.

@mquhuy
Copy link
Member

mquhuy commented Aug 23, 2024

I replaced the random string with test as a first measure, meaning the amount of namespaces will still be the same but fixed. However, if your concern is that this might be problematic, could you please clarify the advantages of using random strings and generating namespaces in advance, rather than using fixed namespaces, I am interested in this 😄

There shouldn't be any needs for random names as of now, I guess, but for e.g. if we run a suite in which 1 test is ran twice, for some reason, then the logs from the second one would override the first one xD. Anyway having a random part in test ns is a common practice for CAPI-related projects, so I guess it's still nice to have.

Nevertheless, implementing this change should be relatively straightforward. It would involve rewriting the set_ns.py script as Go code (which I guess should be done anyway) to integrate it into the test suite. We could then use the old method of generating random strings before the BMO-kustomize step. However, it’s important to ensure that namespaces always exist when BMO is running, or it might run into issues. 😄

Yes, what I was mentioning is related to e2e only, currently the test decides the ns name and create it right before a testcase, at which point BMO has already been installed and running. The solution might be just to decide and create all the ns before the suite starts, like I mentioned. No big issue on this, I'm just mentioning it here so that it won't be forgotten 😃

Currently, the biggest problem I see with e2e testing is that BMHs were expected to be deleted when their namespace was deleted. Since this isn’t happening, we’re encountering MAC address clashes between the BMHs, which is resulting in failures.

Sorry I'm not following, so with scoped BMO, the BMH will stay even if the ns containing it gets deleted?

@NymanRobin
Copy link
Member Author

I mean we cannot delete any namespace under the watch config before tearing down the BMO, as otherwise the manager container will start to error out. If this is a design detail of the operator framework or if it comes from BMO I have not had time to figure out yet

@NymanRobin
Copy link
Member Author

Throwing out some ideas regarding the unanswered questions here:

I found a proposal that looks interesting in that regard that it could allow the operator to operate without all namespaces existing. Other than that it seems there is the possibility to dynamically change the watchNamespaces config at least based on this discussion in the operator framework mailing thread. The java-sdk also seems to have an API for this uncertain about the go one. I will ask also in the slack channel fro controller-runtime

@NymanRobin
Copy link
Member Author

NymanRobin commented Aug 26, 2024

There is a really good hints in slack on the dynamic limitations and how to potentially circumvent them slack thread FYI @tuminoid @Sunnatillo

@tuminoid
Copy link
Member

🧪 Using this PR for a bit of CI testing, ignore some following test triggers.

/retest

@tuminoid
Copy link
Member

🧪 Using this PR for a bit of CI testing, ignore some following test triggers.

/retest

OK, it works here too. Hmm.

@metal3-io-bot
Copy link
Contributor

@NymanRobin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
metal3-dev-env-integration-test-ubuntu-main 11dbdad link false /test metal3-dev-env-integration-test-ubuntu-main
gomod 515e502 link true /test gomod
generate 515e502 link true /test generate
test 515e502 link true /test test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@metal3-io-bot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2024
@tuminoid
Copy link
Member

/retest
Testing prow, ignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add documentation how to configure BMO for single namespace
5 participants