Update SCC and RBAC handling for DevWorkspaces #954
Codecov Report: Base 49.52% // Head 53.53% // increases project coverage by +4.01%.
Additional details and impacted files:
@@ Coverage Diff @@
## main #954 +/- ##
==========================================
+ Coverage 49.52% 53.53% +4.01%
==========================================
Files 38 43 +5
Lines 2417 2699 +282
==========================================
+ Hits 1197 1445 +248
- Misses 1095 1123 +28
- Partials 125 131 +6
I experienced some issues with the migration steps, though I may have made a mistake somewhere. Here are the steps I followed as well as the results I got:
Then I started testing the migration process:
I started the theia-next workspace.
I then tried deleting the workspace. Additionally, I tried to check that the serviceaccount was removed from the container-build SCC, which it seemed to have been:
$ oc adm policy who-can use scc container-build
resourceaccessreviewresponse.authorization.openshift.io/<unknown>
Namespace: devworkspace-controller
Verb: use
Resource: securitycontextconstraints.security.openshift.io
Users: andrew
system:admin
system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator
system:serviceaccount:openshift-apiserver:openshift-apiserver-sa
system:serviceaccount:openshift-authentication-operator:authentication-operator
system:serviceaccount:openshift-authentication:oauth-openshift
system:serviceaccount:openshift-cluster-node-tuning-operator:cluster-node-tuning-operator
system:serviceaccount:openshift-cluster-storage-operator:cluster-storage-operator
system:serviceaccount:openshift-cluster-storage-operator:csi-snapshot-controller-operator
system:serviceaccount:openshift-cluster-version:default
system:serviceaccount:openshift-config-operator:openshift-config-operator
system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator
system:serviceaccount:openshift-etcd-operator:etcd-operator
system:serviceaccount:openshift-etcd:installer-sa
system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator
system:serviceaccount:openshift-kube-apiserver:installer-sa
system:serviceaccount:openshift-kube-apiserver:localhost-recovery-client
system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator
system:serviceaccount:openshift-kube-controller-manager:installer-sa
system:serviceaccount:openshift-kube-controller-manager:localhost-recovery-client
system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator
system:serviceaccount:openshift-kube-scheduler:installer-sa
system:serviceaccount:openshift-kube-scheduler:localhost-recovery-client
system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator
system:serviceaccount:openshift-kube-storage-version-migrator:kube-storage-version-migrator-sa
system:serviceaccount:openshift-machine-api:cluster-baremetal-operator
system:serviceaccount:openshift-machine-config-operator:default
system:serviceaccount:openshift-network-operator:default
system:serviceaccount:openshift-oauth-apiserver:oauth-apiserver-sa
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount
system:serviceaccount:openshift-service-ca-operator:service-ca-operator
Groups: system:cluster-admins
system:masters
After doing all of the above, I wanted to see if I could delete the theia-next workspace and make it again. When I tried deleting the workspace, the following appeared in the controller logs:
{
"level":"info",
"ts":1666634551.359788,
"logger":"controllers.DevWorkspace",
"msg":"Reconciling Workspace",
"Request.Namespace":"devworkspace-controller",
"Request.Name":"theia-next",
"devworkspace_id":"workspace21d193cb97db4179",
"resolvedConfig":""
}{
"level":"info",
"ts":1666634551.3598063,
"logger":"controllers.DevWorkspace",
"msg":"Finalizing DevWorkspace",
"Request.Namespace":"devworkspace-controller",
"Request.Name":"theia-next",
"devworkspace_id":"workspace21d193cb97db4179"
}{
"level":"info",
"ts":1666634551.361374,
"logger":"controllers.DevWorkspace",
"msg":"failed to delete role devworkspace-default-role in namespace devworkspace-controller: roles.rbac.authorization.k8s.io \"devworkspace-default-role\" is forbidden: User \"system:serviceaccount:devworkspace-controller:devworkspace-controller-serviceaccount\" cannot delete resource \"roles\" in API group \"rbac.authorization.k8s.io\" in the namespace \"devworkspace-controller\"",
"Request.Namespace":"devworkspace-controller",
"Request.Name":"theia-next",
"devworkspace_id":"workspace21d193cb97db4179"
}
Let me know if there are other details that would be useful to know, or if I made a mistake somewhere in following the instructions for this PR.
Force-pushed from 299547d to 5bb3d37.
As this PR makes changes to the deploy folder (the role for DWO is updated), it's necessary to re-apply the deployment manifests.
Did the workspace reach a running state?
The default way to provision roles and rolebindings in DevWorkspace namespaces was by creating roles and bindings with static names. This needs to be extended and so the previous way of provisioning needs to be safely deprecated. Signed-off-by: Angel Misevski <[email protected]>
Add package pkg/provision/workspace/rbac to handle workspace roles and rolebindings more carefully. For regular workspaces, each namespace gets one role and rolebinding. The rolebinding is updated to have all workspace serviceaccounts in the namespace as subjects, and this list is updated when workspaces are deleted. If all workspaces in a namespace are deleted, then the role and rolebinding are deleted as well. If a workspace uses the 'controller.devfile.io/scc' attribute, an additional role+binding for that SCC name is created and managed in the same way. Signed-off-by: Angel Misevski <[email protected]>
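A minimal sketch of the subject-management scheme this commit message describes, assuming k8s.io/api/rbac/v1 types; the helper names are illustrative, not the PR's actual code:

package rbacsketch

import rbacv1 "k8s.io/api/rbac/v1"

// addSubject ensures a workspace ServiceAccount is a subject on the shared
// per-namespace RoleBinding; returns true if the binding was modified.
func addSubject(binding *rbacv1.RoleBinding, saName, namespace string) bool {
	for _, s := range binding.Subjects {
		if s.Kind == rbacv1.ServiceAccountKind && s.Name == saName && s.Namespace == namespace {
			return false // already present
		}
	}
	binding.Subjects = append(binding.Subjects, rbacv1.Subject{
		Kind:      rbacv1.ServiceAccountKind,
		Name:      saName,
		Namespace: namespace,
	})
	return true
}

// removeSubject drops a deleted workspace's ServiceAccount from the binding.
// If no subjects remain afterwards, the caller can delete the role and
// rolebinding entirely, matching the behavior described above.
func removeSubject(binding *rbacv1.RoleBinding, saName, namespace string) {
	kept := binding.Subjects[:0]
	for _, s := range binding.Subjects {
		if s.Kind != rbacv1.ServiceAccountKind || s.Name != saName || s.Namespace != namespace {
			kept = append(kept, s)
		}
	}
	binding.Subjects = kept
}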
Add functionality to automatically delete old roles and rolebindings when a workspace is processed. This removes the previously-used 'workspace' role and 'devworkspacedw' rolebinding, which have no normal mechanism for their removal. Signed-off-by: Angel Misevski <[email protected]>
Switch main reconcile loop to use new RBAC sync functions and remove old rbac files. Signed-off-by: Angel Misevski <[email protected]>
Since SCCs are added to serviceaccounts via roles and rolebindings, it's no longer necessary to update SCCs to add workspace serviceaccounts as users. This step is removed, as is functionality for automatically adding the SA finalizer to workspaces. However, code related to finalizing workspaces and cleaning up changes made to SCCs is kept intact in order to ensure any workspaces that previously used the finalizer can still be deleted cleanly. Signed-off-by: Angel Misevski <[email protected]>
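For context on how RBAC replaces the SCC users list: granting an SCC via RBAC boils down to a namespaced Role allowing the 'use' verb on that one SCC. A hedged Go sketch of such a Role (the devworkspace-use-<scc-name> naming follows the logs and PR description below; the helper itself is illustrative):

package rbacsketch

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sccUseRole builds a Role allowing 'use' of a single named SCC. Binding
// workspace ServiceAccounts to this Role lets their pods run under the SCC
// without adding the ServiceAccounts to the SCC's users field.
func sccUseRole(namespace, sccName string) *rbacv1.Role {
	return &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "devworkspace-use-" + sccName,
			Namespace: namespace,
		},
		Rules: []rbacv1.PolicyRule{{
			APIGroups:     []string{"security.openshift.io"},
			Resources:     []string{"securitycontextconstraints"},
			ResourceNames: []string{sccName},
			Verbs:         []string{"use"},
		}},
	}
}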
Update cluster role used by DWO to add verbs 'delete', 'deletecollection' for roles/rolebindings Signed-off-by: Angel Misevski <[email protected]>
Update the cache filter for roles and rolebindings to check for appropriate label. As a result, the old workspace role/rolebinding are invisible to the controller, so the process for cleaning up old resources needs to be updated as well. Signed-off-by: Angel Misevski <[email protected]>
Signed-off-by: Angel Misevski <[email protected]>
Signed-off-by: Angel Misevski <[email protected]>
Force-pushed from 5bb3d37 to 8c9c624.
Signed-off-by: Angel Misevski <[email protected]>
Okay, thank you for the clarification 👍 My cluster just died while I was still testing. One thing I should note:
IIRC all the workspaces I had started during my previous (incorrect) testing had reached a running state. Issues only occurred when trying to delete them.
Additional debugging:
Here are some new findings:
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:${NAMESPACE}:devworkspace-controller-serviceaccount
no
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:test-project:devworkspace-controller-serviceaccount
no
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:test-project:devworkspace-controller-serviceaccount
no
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:${NAMESPACE}:devworkspace-controller-serviceaccount
yes
{
"level":"info",
"ts":1667411040.449669,
"logger":"controllers.DevWorkspace",
"msg":"DevWorkspace failed to start: Error provisioning rbac: roles.rbac.authorization.k8s.io \"devworkspace-use-container-build\" is forbidden: user \"system:serviceaccount:devworkspace-controller:devworkspace-controller-serviceaccount\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:devworkspace-controller\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"security.openshift.io\"], Resources:[\"securitycontextconstraints\"], ResourceNames:[\"container-build\"], Verbs:[\"use\"]}",
"Request.Namespace":"test-project",
"Request.Name":"container-build",
"devworkspace_id":"workspace4dc708fd5c484f53"
}
Note: I hadn't yet added the new ClusterRole and ClusterRoleBinding that this PR mentions (I probably should have, as that seems related to the error mentioning the "use" verb).
My cluster just timed out, so I'm going to try this again, but I will apply the new ClusterRole and ClusterRoleBinding from the PR before installing the new DWO image (from the PR).
@amisevsk
EDIT: actually, I think this is expected; see devworkspace-operator/pkg/provision/workspace/serviceaccount.go, lines 104 to 107 at 776dd45.
tl;dr: I got things to work as expected.
Here are my observations:
- Initially, I overwrote the old ClusterRole and ClusterRoleBinding (from the SCC PR) with the new one that has the use verb. I had to revert to the old ClusterRole and ClusterRoleBinding since that failed the workspaces (to be expected, and mentioned in the PR description):
[aobuchow@fedora]$ oc get dw -n test-project2
NAME DEVWORKSPACE ID PHASE INFO
container-build workspacef0f6eaa7c319408e Failed operator does not have permissions to get the 'container-build' SecurityContextConstraints
container-build-2 workspacec01561da72cf40b0 Failed operator does not have permissions to get the 'container-build' SecurityContextConstraints
- After deploying the PR image, I saw that the old workspace role and devworkspacedw rolebinding were deleted:
{
"level":"info",
"ts":1667417292.5182207,
"logger":"controllers.DevWorkspace",
"msg":"deleted deprecated DevWorkspace Role",
"Request.Namespace":"test-project2",
"Request.Name":"container-build",
"devworkspace_id":"workspace127b9a98d3f64ff8"
}
{
"level":"info",
"ts":1667417292.5728934,
"logger":"controllers.DevWorkspace",
"msg":"deleted deprecated DevWorkspace RoleBinding",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}
- I then saw that the new Role was created but wasn't ready:
{
"level":"info",
"ts":1667417292.6531458,
"logger":"controllers.DevWorkspace",
"msg":"Created object",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"kind":"v1.Role",
"name":"devworkspace-use-container-build"
}{
"level":"info",
"ts":1667417292.6531677,
"logger":"controllers.DevWorkspace",
"msg":"v1.Role devworkspace-use-container-build is not ready: Created object",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}
- I then deleted both workspaces (which were initially started on a DWO image built from the main branch) and saw that the service account was successfully cleaned up, as well as the old finalizer and RBAC. The new Role and RoleBinding (devworkspace-use-<SCC>) were also deleted.
{
"level":"info",
"ts":1667417594.2655234,
"logger":"controllers.DevWorkspace",
"msg":"ServiceAccount clean up successful; clearing finalizer",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.32489,
"logger":"controllers.DevWorkspace",
"msg":"Finalizing DevWorkspace",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.3343432,
"logger":"controllers.DevWorkspace",
"msg":"deleted role devworkspace-use-container-build in namespace test-project2",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.3680623,
"logger":"controllers.DevWorkspace",
"msg":"Reconciling Workspace",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88",
"resolvedConfig":""
}{
"level":"info",
"ts":1667417594.3680837,
"logger":"controllers.DevWorkspace",
"msg":"Finalizing DevWorkspace",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.3913825,
"logger":"controllers.DevWorkspace",
"msg":"deleted rolebinding devworkspace-use-container-build in namespace test-project2",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.4298122,
"logger":"controllers.DevWorkspace",
"msg":"Finalizing DevWorkspace",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.4702287,
"logger":"controllers.DevWorkspace",
"msg":"deleted role devworkspace-default-role in namespace test-project2",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.5233288,
"logger":"controllers.DevWorkspace",
"msg":"Finalizing DevWorkspace",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.5577114,
"logger":"controllers.DevWorkspace",
"msg":"deleted rolebinding devworkspace-default-rolebinding in namespace test-project2",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.6336832,
"logger":"controllers.DevWorkspace",
"msg":"Finalizing DevWorkspace",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
"level":"info",
"ts":1667417594.6337557,
"logger":"controllers.DevWorkspace",
"msg":"RBAC cleanup successful; clearing finalizer",
"Request.Namespace":"test-project2",
"Request.Name":"container-build-2",
"devworkspace_id":"workspacebec9f4cdef3b4a88"
}
- Then I overwrote the old ClusterRole and ClusterRoleBinding with the new one from the PR, and created 2 new workspaces that used the container-build SCC through the SCC workspace attribute.
- Both workspaces started up successfully. Their service accounts were created as well.
- I saw that the devworkspace-use-container-build role and rolebinding were created.
- Deleting a single workspace resulted in the workspace's service account being removed from the rolebinding's subjects.
- Deleting both workspaces resulted in the role and rolebinding being deleted.
\o/
if err := api.Client.Delete(api.Ctx, role); err != nil {
	return err
}
return &RetryError{fmt.Errorf("deleted deprecated DevWorkspace Role")} |
I want to ask: why does this return RetryError when deletion succeeds? Is there a reason for restarting the reconciliation instead of continuing it?
It's the general flow we follow for most things. Here it's likely safe to continue the reconcile, but in general the operator works by doing one thing at a time (i.e. given a cluster state, what's the next step towards moving it towards the intended state?). In other words, we're treating "no deprecated DevWorkspace Role exists" as a precondition for continuing with creating the new DevWorkspace Role.
The benefit of doing this is that it should be possible to track what each reconcile did to the cluster (e.g. you would see "reconciling DevWorkspace (...) deleted deprecated DevWorkspace Role").
Though, you make a good point. We could probably get away with just logging that we deleted the role here. I'll update the PR.
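For illustration, the one-change-per-reconcile flow described above looks roughly like this; a sketch under assumed semantics (RetryError is modeled here as an error the reconciler treats as "requeue, not a failure"), not the operator's exact code:

package sketch

import "fmt"

// RetryError signals that the reconcile should be requeued rather than
// treated as failed (assumed semantics, mirroring the snippet quoted above).
type RetryError struct{ Err error }

func (e *RetryError) Error() string { return e.Err.Error() }

// syncDeprecatedRBAC performs at most one cluster mutation per call.
// Treating "no deprecated Role exists" as a precondition means each
// reconcile pass maps to exactly one observable change on the cluster.
func syncDeprecatedRBAC(roleExists bool, deleteRole func() error) error {
	if roleExists {
		if err := deleteRole(); err != nil {
			return err
		}
		// Requeue: the next reconcile re-reads cluster state and continues
		// on to create the new role/rolebinding.
		return &RetryError{Err: fmt.Errorf("deleted deprecated DevWorkspace Role")}
	}
	return nil // precondition met; proceed with provisioning new RBAC
}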
On further thought -- the tests use this error to verify migration is proceeding as expected, so I think it's better to leave for now. The current plan is to remove code related to cleaning up old resources around the DWO 0.19.0 or 0.20.0 timeframe.
I see, thank you @amisevsk. I was just curious; sounds good 👍
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: amisevsk, AObuchow, dkwon17.
@AObuchow a few notes:
This is expected behavior from Kubernetes -- OpenShift/Kubernetes do not track image registries looking for new images, so it's necessary to restart the controller pod to pull the updated image.
I think you're misinterpreting the oc auth can-i results.
This is expected, as a precondition for using a specific SCC with your workspace is that you let the DWO serviceaccount work with that SCC. The way Kubernetes RBAC works is that even if you are allowed to create Roles/Rolebindings, you cannot use that privilege to grant permissions you yourself do not have (e.g. I can't create a role that gives me cluster-admin unless I'm cluster-admin). What's happening in this case is DWO is attempting to give the DevWorkspace ServiceAccount permission to use the SCC (and thus run the workspace pod with that SCC instead of the default), but this is blocked because DWO itself cannot use that SCC. The ClusterRole/ClusterRoleBinding are required here.
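To make the escalation rule concrete: one way an operator can surface this problem early (a hedged sketch; not necessarily what DWO does) is to ask the API server whether it can itself use the SCC before trying to grant it, via a SelfSubjectAccessReview:

package sketch

import (
	"context"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// canUseSCC reports whether the operator's own identity holds the 'use'
// verb on the named SCC. RBAC forbids granting permissions you do not
// hold, so creating the workspace Role will fail while this is false.
func canUseSCC(ctx context.Context, client kubernetes.Interface, sccName string) (bool, error) {
	review := &authv1.SelfSubjectAccessReview{
		Spec: authv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authv1.ResourceAttributes{
				Group:    "security.openshift.io",
				Resource: "securitycontextconstraints",
				Name:     sccName,
				Verb:     "use",
			},
		},
	}
	resp, err := client.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, review, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}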
@amisevsk not sure if this is expected, but I got the following error on an already-running workspace when updating DWO via make install from main to the version from this PR:
message: >-
Error provisioning rbac: roles.rbac.authorization.k8s.io
"devworkspace-use-container-build" is forbidden: user
"system:serviceaccount:devworkspace-controller:devworkspace-controller-serviceaccount"
(groups=["system:serviceaccounts"
"system:serviceaccounts:devworkspace-controller" "system:authenticated"]) is
attempting to grant RBAC permissions not currently held:
{APIGroups:["security.openshift.io"],
Resources:["securitycontextconstraints"], ResourceNames:["container-build"],
Verbs:["use"]}
phase: Failed
Note that the DWO update happened while the workspace was still RUNNING.
After some discussion it was decided to postpone merging the PR, since it would break the existing implementation of the container build functionality. The proposed merging cadence would be:
To be specific, the breaking change in this PR is that the ClusterRole granted to the DevWorkspace Operator ServiceAccount is updated:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dwo-access-to-scc
  labels:
    test: dwo-scc-access
rules:
  - apiGroups:
      - "security.openshift.io"
    resources:
      - "securitycontextconstraints"
    resourceNames:
      - "container-build"
    verbs:
      - "get"
      - "update"
+     - "use"
Since this functionality is in use by the Che Operator, we need to update that operator to grant the additional permission first, or DWO 0.17.0 would be incompatible with Che 7.56.0.
Add 'use' permissions, in addition to 'get' and 'update', to the DevWorkspace Operator ServiceAccount when container build functionality is enabled. This is required due to changes in the DevWorkspace Operator in devfile/devworkspace-operator#954. Signed-off-by: Angel Misevski <[email protected]>
Reapply changes from PR #954 (commits ac944d5..25d533b) that had been reverted (PR #972). The original PR was reverted due to required changes in the Che Operator. As those are now applied, the original PR can be re-applied. Signed-off-by: Angel Misevski <[email protected]>
What does this PR do?
This PR changes how the controller.devfile.io/scc attribute is handled: use permissions are granted to the workspace serviceaccount via a role/rolebinding, rather than requiring the operator to hold get and update permissions on the SCC to configure workspaces to use the SCC.
Commits should have explanatory messages to detail what each one does.
This PR marks some constants/functions as deprecated: the serviceaccount.controller.devfile.io finalizer is deprecated, along with functionality for clearing that finalizer, as it is no longer used.
In both cases, DWO supports cleaning up resources created/updated prior to this commit. Removal of the deprecated code should be done for DWO 0.19.
This PR changes the requirements for using a custom SCC with new workspaces. Instead of granting the DevWorkspace Operator serviceaccount get and update permissions on the SCC to be used, DWO now only requires the use permission (see the ClusterRole diff above).
Note: if any workspaces exist that use the serviceaccount.controller.devfile.io finalizer, the get and update permissions for that SCC are still required by DWO in order to clear the finalizer when the workspace is deleted. To remove the need for get and update permissions, first delete all workspaces that have the serviceaccount finalizer. To see which workspaces use the finalizer, inspect workspace metadata and search for serviceaccount.controller.devfile.io; one way to do this is sketched below.
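Since the exact command is elided above, here is one hedged way to do that check programmatically, using controller-runtime metadata-only lists (the DevWorkspace GVK is assumed to be workspace.devfile.io/v1alpha2):

package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// workspacesWithSAFinalizer lists DevWorkspaces (metadata only) across all
// namespaces and returns those still carrying the deprecated
// serviceaccount.controller.devfile.io finalizer.
func workspacesWithSAFinalizer(ctx context.Context, c client.Client) ([]string, error) {
	list := &metav1.PartialObjectMetadataList{}
	list.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "workspace.devfile.io",
		Version: "v1alpha2",
		Kind:    "DevWorkspaceList",
	})
	if err := c.List(ctx, list); err != nil {
		return nil, err
	}
	var names []string
	for _, item := range list.Items {
		for _, f := range item.Finalizers {
			if f == "serviceaccount.controller.devfile.io" {
				names = append(names, item.Namespace+"/"+item.Name)
			}
		}
	}
	return names, nil
}

Equivalently, dumping the workspaces as YAML and grepping for the finalizer name accomplishes the same check.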
What issues does this PR fix or reference?
Part of #884 (see #884 (comment))
Is it tested? How?
As there are significant changes in this PR, there are a number of different things to test. Test cases should cover major functionality. To test migration processes, it is necessary to create and start workspaces using the current main branch commit, then upgrade to the PR changes and trigger a reconcile for those workspaces.
- The old role (workspace) and rolebinding (devworkspacedw) should be deleted when starting a workspace with changes from this PR.
- A role and rolebinding devworkspace-use-<scc-name> should be created in the namespace.
PR Checklist
(comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
- v8-devworkspace-operator-e2e: DevWorkspace e2e test
- v8-che-happy-path: Happy path for verifying integration with Che