Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent workflows code from exploiting pod patch permission to change non-workflow pods #3961

Closed
Tracked by #7964
Asaf-m opened this issue Sep 8, 2020 · 14 comments · Fixed by #8000
Closed
Tracked by #7964
Labels
area/executor type/feature Feature request type/security Security related

Comments

@Asaf-m
Copy link

Asaf-m commented Sep 8, 2020

Summary

What change needs making?

Find a way to prevent malicious code from exploiting patch permission of the minimum RBAC privileges.

Details:

The minimum RBAC privileges of workflow includes patch pods permission, this seems to be a potential security issue.
patch permission allows to do actions like kubectl patch pod valid-pod -type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]', meaning, it allows to change all the images in the namespace, in other words, bring the namespace down.

The problem is even grater since role is set on pod and not on container, so not only that argo's wait container is getting this role, but also user's main container` is getting it. This means any malicious code creeped into the pod can exploit this role.

I'm not sure how this can be done. I can say I tried the solution suggested here and it worked. But it's a big mess to make it work with kustomize, so I wish for a more elegant solution.

Use Cases

Always.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@Asaf-m Asaf-m added the type/feature Feature request label Sep 8, 2020
@simster7
Copy link
Member

simster7 commented Sep 8, 2020

I'll bring this up with the team

@Asaf-m
Copy link
Author

Asaf-m commented Sep 22, 2020

@simster7 Any update?

@appellod
Copy link

I also have some concerns with my Workflow requiring Pod patch permissions. I am developing a system that allows external users to run arbitrary workflows within my system. I have each user segregated into their own Namespace, so it wouldn't be absolutely devastating if they achieved Pod patch access, but it is still something that could potentially wreak havoc on my system.

Currently, I am setting automountServiceAccountToken: false with the executor.serviceAccountName field set. Does this do a workaround similar to the solution @Asaf-m mentioned in his original post? Is there a way the main container can then extract that token from the Argo sidecar and gain unauthorized access?

@alexec
Copy link
Contributor

alexec commented Jan 18, 2021

I think we could address this simply another way. If the pod can patch the workflow, then we could directly update the status.

To avoid conflicts, and to work with node offloading. We would need a new field to store the data in.

Not sure how this scales with many patches. So we could introduce another CRD as discussed here:

https://docs.google.com/document/d/18hg6PTejp1knp5QTaCwP4j4gUTsRu4KDeKHs-4l9shs/edit

This is not a popular issue.

@alexec alexec linked a pull request Feb 25, 2021 that will close this issue
@alexec
Copy link
Contributor

alexec commented Feb 25, 2021

@jessesuen

https://kubernetes.io/docs/tasks/debug-application-cluster/determine-reason-pod-failure/#customizing-the-termination-message

The termination message is intended to be brief final status, such as an assertion failure message. The kubelet truncates messages that are longer than 4096 bytes. The total message length across all containers will be limited to 12KiB. The default termination message path is /dev/termination-log. You cannot set the termination message path after a Pod is launched

@alexec
Copy link
Contributor

alexec commented Feb 25, 2021

Moreover, users can set the terminationMessagePolicy field of a Container for further customization. This field defaults to "File" which means the termination messages are retrieved only from the termination message file. By setting the terminationMessagePolicy to "FallbackToLogsOnError", you can tell Kubernetes to use the last chunk of container log output if the termination message file is empty and the container exited with an error. The log output is limited to 2048 bytes or 80 lines, whichever is smaller.

Ohhhh.... interesting!

@alexec
Copy link
Contributor

alexec commented Feb 25, 2021

In v3.1 you will be able run workflows without pod patch, so long as they do not have outputs.

@phelinor
Copy link

In v3.1 you will be able run workflows without pod patch, so long as they do not have outputs.

@alexec I can see that v3.1.2 is already available but this issue is still Open and I can still see the Patch verb in the installation file https://github.com/argoproj/argo-workflows/blob/master/manifests/install.yaml

Could please confirm that we are able to run workflows without pod patch?

@alexec
Copy link
Contributor

alexec commented Jul 26, 2021

With the introduction of TaskSet we now have a way to replace pod patch with taskset patch.

@alexec
Copy link
Contributor

alexec commented Feb 22, 2022

We should test some attacks to verify this is true. Much of the pod spec is immutable, is it really true that you can change the image or args?

@alexec alexec changed the title Prevent workflow RBAC patch permission exploitation Prevent workflows code from exploiting pod patch permission to change non-workflow pods Feb 22, 2022
@alexec
Copy link
Contributor

alexec commented Feb 25, 2022

Notes from PoC:

  • We can use a seperate CRD to report back results from the wait container to the controller.
  • We can get race condition between pod informer and task-result informer.
  • We can't use taskset, it is not big enough for outputs.

@SebastianGoeb
Copy link

In v3.1 you will be able run workflows without pod patch, so long as they do not have outputs.

Does this refer to global workflow outputs or step outputs? We use step outputs to communicate things between steps, and I don't see how pod/patch is required for that. What's stopping the controller from just creating the workflow pod with the required volumes for passing outputs from main to wait in the first place, and then leaving it like that? Is this really something the sidecar must do?

@alexec
Copy link
Contributor

alexec commented Feb 25, 2022

Correct. Outputs are patched onto the pod using annotations. Not outputs, no need for annotations. You could pass outputs using a volume mounted to all pods in the workflow. This volume would need to be readable from the controller. Not sure if that’s possible.

@jessesuen I think we only need patch for result and logs, exit code is found by controller. We know that we need these in the controller. Maybe we should look at pods/log in more detail.

@alexec
Copy link
Contributor

alexec commented Feb 25, 2022

Reviewing pods/log option:

The current solution the executor capturing logs scales. Consider a 1000 node workflow, each pod captures its own logs. If this was moved to the controller, it would have to do much more work that it currently does. The controller is the wrong place to do heavy lifting as it creates a single point of failure.

On top of this, we don't know where to save the main.log (or any artifact) in the controller, because it does not have get secrets permission (which it needs).

We could write the outputs to the logs, rather than as an annotation, or to the container termination-log, but these all have different problems.

Who else could do this? The agent, but it's just moving the problem.

alexec added a commit to alexec/argo-workflows that referenced this issue Mar 1, 2022
alexec added a commit that referenced this issue Mar 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment