aws-for-fluent-bit pod ignores k8 security context values like runAsUser, runAsGroup, fsGroup, and runAsNonRoot #729

Open
ashenwgt opened this issue Sep 7, 2023 · 4 comments


ashenwgt commented Sep 7, 2023

Describe the question/issue

I am trying to run the aws-for-fluent-bit container as a non-root user using the Kubernetes manifest below.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  .....
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
      containers:
      - name: fluent-bit
        image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          runAsNonRoot: true
       .....
        volumeMounts:
        - name: fluentbitstate
          mountPath: /var/fluent-bit/state
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: runlogjournal
          mountPath: /run/log/journal
          readOnly: true
        - name: dmesg
          mountPath: /var/log/dmesg
          readOnly: true
      terminationGracePeriodSeconds: 10
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      - name: fluentbitstate
        hostPath:
          path: /var/fluent-bit/state
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: runlogjournal
        hostPath:
          path: /run/log/journal
      - name: dmesg
        hostPath:
          path: /var/log/dmesg

Even though I explicitly set fsGroup to 1000 here, the /var/fluent-bit/state directory is created with root ownership on the Kubernetes host nodes.

$ ls -al /var/fluent-bit/
total 0
drwxr-xr-x  3 root root  19 Sep  7 06:02 .
drwxr-xr-x 20 root root 286 Sep  7 06:02 ..
drwxr-xr-x  2 root root   6 Sep  7 06:02 state
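
A likely reason for the root-owned directory: as far as I know, Kubernetes applies fsGroup only to volume types that support ownership management, and hostPath is not one of them, so a missing hostPath directory is created on the node as root. Below is a minimal sketch of one possible workaround, assuming a short-lived root init container is acceptable in the cluster. The init container name and the busybox image are assumptions for illustration and are not part of the original manifest. The fragment belongs under spec.template.spec of the DaemonSet.

```yaml
# Sketch only: hands the hostPath state directory to UID/GID 1000
# before the non-root fluent-bit container starts.
initContainers:
- name: fix-state-permissions        # hypothetical name
  image: busybox:stable              # assumption: any image with sh and chown works
  command: ["sh", "-c", "chown -R 1000:1000 /var/fluent-bit/state"]
  securityContext:
    # Container-level settings override the pod-level non-root settings
    # for this init container only.
    runAsUser: 0
    runAsNonRoot: false
  volumeMounts:
  - name: fluentbitstate             # same volume name as in the manifest above
    mountPath: /var/fluent-bit/state
```

This would only address the storage path error. It does not make root-owned files under /var/log readable, and it still runs one container as root, so it may not satisfy a strict non-root policy.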

Also, with the above settings, the fluent-bit pods go into CrashLoopBackOff with the following errors in the logs.

Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/09/07 07:45:28] [ info] [fluent bit] version=1.9.10, commit=1f4d09e087, pid=1
[2023/09/07 07:45:28] [error] [storage] [chunkio] cannot initialize root path /var/fluent-bit/state/flb-storage/

[2023/09/07 07:45:28] [error] [storage] error initializing storage engine
[2023/09/07 07:45:28] [error] [lib] backend failed
AWS for Fluent Bit Container Image Version 2.31.11

Based on these discussions in the aws/eks-charts repo (aws/eks-charts#928) and the fluent/fluent-bit repo (fluent/fluent-bit#872), my understanding is that this container has to run as root.

Can you please confirm my understanding?

If that is not the case, then is there a way to run the aws-for-fluent-bit container as a non-root user and with non-root-owned volumes?

@PettitWesley
Contributor

I'm not sure about this; I'm testing it out myself in an EKS cluster today.

Existing guidance I can find suggests that since the pod log files are root owned, FLB must also run as root.

However, this doesn't make sense to me... I think if we give FLB the right capabilities it should be able to read the pod log files and probably even create its storage directory.

https://man7.org/linux/man-pages/man7/capabilities.7.html

I'll post here once I'm done testing.

@PettitWesley
Contributor

Alrighty, it seems that adding extra capabilities does not work:

[2023/09/29 22:35:20] [error] [plugins/in_tail/tail_file.c:888 errno=13] Permission denied
[2023/09/29 22:35:20] [error] [input:tail:tail.4] cannot open /var/log/containers/aws-node-74sfs_kube-system_aws-vpc-cni-init-8e3f6a198939804f5a716d92d7b0fe96b984fe4efc98e1b4ec04d1ceab5fc04e.log
[2023/09/29 22:35:20] [error] [plugins/in_tail/tail_file.c:888 errno=13] Permission denied
[2023/09/29 22:35:20] [error] [input:tail:tail.4] cannot open /var/log/containers/kube-proxy-jsgfc_kube-system_kube-proxy-40c90418e671cc466cb20d9f380ae578c0db2819fb097fb2db5320b1ef253ef9.log

I got this even though I set:

    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
      containers:
      - name: fluent-bit
        image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          runAsNonRoot: true
          capabilities:
            drop:
              - ALL
            add:
              - CAP_FOWNER
              - CAP_DAC_OVERRIDE
              - CAP_DAC_READ_SEARCH
              - CAP_FSETID
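
One possible explanation, not confirmed in this thread: per capabilities(7), when a non-root process calls execve on a binary without file capabilities, its permitted and effective sets are cleared unless the capabilities are also in the ambient set, and to my knowledge a Kubernetes securityContext has no field for ambient capabilities. Separately, the Kubernetes documentation says to omit the CAP_ prefix when listing capabilities in a manifest. A minimal sketch of a throwaway pod to check what a non-root process actually ends up with; the pod name and the busybox image are assumptions for illustration.

```yaml
# Sketch only: prints the capability sets of a non-root process that was
# "given" a capability through securityContext.capabilities.add.
apiVersion: v1
kind: Pod
metadata:
  name: cap-check                    # hypothetical name
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    runAsNonRoot: true
  containers:
  - name: cap-check
    image: busybox:stable            # assumption: any image with sh and grep works
    # CapInh, CapPrm, CapEff, CapBnd and CapAmb are hex bitmasks.
    command: ["sh", "-c", "grep Cap /proc/self/status"]
    securityContext:
      capabilities:
        drop: ["ALL"]
        add: ["DAC_READ_SEARCH"]     # written without the CAP_ prefix
```

After the pod completes, `kubectl logs cap-check` shows the masks. If CapEff is all zeros while CapBnd is not, the added capability is only in the bounding set and has no effect on file permission checks for UID 1000.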

@PettitWesley
Contributor

And of course, if you use host volume mounts for the tail DB or the storage.path, then that will fail due to permissions as well:

[2023/09/29 22:33:54] [error] [sqldb] cannot open database /var/fluent-bit/state/flb_container.db
[2023/09/29 22:33:54] [error] [input:tail:tail.0] could not open/create database
[2023/09/29 22:33:54] [error] [lib] backend failed
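
If the state does not need to survive pod deletion, a non-hostPath volume is one way to sidestep the host permission problem for the tail DB and storage.path, since fsGroup is applied to emptyDir volumes. A minimal sketch, assuming the fluentbitstate volume from the manifest in the first comment is swapped out; whether losing this state is acceptable is an assumption about the deployment.

```yaml
# Sketch only: replaces the hostPath-backed state volume under
# spec.template.spec.volumes. The volumeMounts entry stays unchanged.
volumes:
- name: fluentbitstate
  # An emptyDir lives as long as the pod does, so tail offsets and
  # buffered chunks survive container restarts but not pod deletion.
  emptyDir: {}
```

This does nothing for the Permission denied errors on /var/log/containers, which come from the root-owned log files on the host.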

@PettitWesley
Contributor

Those capabilities can be used in known container breakout attacks, so even if adding them worked, this likely still wouldn't satisfy the true goal of non-root, which is to lock down containers.

I'm very surprised it does not work, though. I guess I don't understand those Linux capabilities.
