Can not create a cluster when running on BTRFS + LUKS encryption #2411

Closed
bergmannf opened this issue Aug 11, 2021 · 15 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@bergmannf

What happened:

When starting a kind cluster on an encrypted btrfs root partition the control-plane won't start up, because of an error in the kubelet:

Aug 11 07:33:59 kind-control-plane kubelet[833]: W0811 07:33:59.653820     833 fs.go:588] stat failed on /dev/mapper/luks-a389c146-db36-4c96-bcbc-0fa3f5f3fcd1 with error: no such file or directory
Aug 11 07:33:59 kind-control-plane kubelet[833]: E0811 07:33:59.653846     833 kubelet.go:1423] "Failed to start ContainerManager" err="failed to get rootfs info: failed to get device for dir \"/var/lib/kubelet\": could not find device with major: 0, minor: 40 in cached partitions map"

On the host the luks path is a symlink:

ls -la /dev/mapper
total 0
drwxr-xr-x.  2 root root      80 Aug 11 08:43 .
drwxr-xr-x. 21 root root    4600 Aug 11 08:44 ..
crw-------.  1 root root 10, 236 Aug 11 08:43 control
lrwxrwxrwx.  1 root root       7 Aug 11 08:43 luks-a389c146-db36-4c96-bcbc-0fa3f5f3fcd1 -> ../dm-0

As this path is not available in the container it fails.

What you expected to happen:

All paths required inside kind should be mapped into the node.

How to reproduce it (as minimally and precisely as possible):

Attempt to create a cluster on an encrypted root partition - in my case I simply installed Fedora and chose to encrypt the system in the installer.

Anything else we need to know?:

The issue is quite simple to fix, by just also mounting the missing path into the container.

With the following configuration it will work:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /dev/dm-0
      containerPath: /dev/dm-0
      propagation: HostToContainer
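
When more than one device node is missing, the same configuration can be generated instead of written by hand. This is a minimal Python sketch, assuming the device paths are already known; `render_kind_config` is a hypothetical helper for illustration and is not part of kind.

```python
def render_kind_config(device_paths):
    """Render a kind cluster config that bind-mounts each device node into
    the control-plane node at the same path, mirroring the workaround above."""
    lines = [
        "kind: Cluster",
        "apiVersion: kind.x-k8s.io/v1alpha4",
        "nodes:",
        "- role: control-plane",
    ]
    if device_paths:
        lines.append("  extraMounts:")
    for path in device_paths:
        lines.append(f"    - hostPath: {path}")
        lines.append(f"      containerPath: {path}")
        lines.append("      propagation: HostToContainer")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /dev/dm-0 is the device from this report; other hosts may need another one.
    print(render_kind_config(["/dev/dm-0"]), end="")
```

The output can be saved to a file and passed to kind create cluster with the --config flag.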

Environment:

  • kind version: (use kind version):
    kind v0.11.1 go1.16.4 linux/amd64

  • Kubernetes version: (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-07-12T20:40:20Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info): not running docker, but rootless podman
  • OS (e.g. from /etc/os-release):
NAME=Fedora
VERSION="34 (Workstation Edition)"
ID=fedora
VERSION_ID=34
VERSION_CODENAME=""
PLATFORM_ID="platform:f34"
PRETTY_NAME="Fedora 34 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:34"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/34/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=34
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=34
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
@bergmannf bergmannf added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2021
@BenTheElder
Member

Can you share docker info -f '{{json .}}' (er, the podman equivalent)?

I'm wondering if we have an opportunity to detect that LUKS is in use from podman/docker. If we can, we could mount /dev/dm-0 in the same way we detect btrfs and mount /dev/mapper.

If not, we could alternatively finish detection of the podman version, detect whether it is remote, and, if it is not remote, inspect the host filesystem from the kind binary (this would not solve remote + btrfs + luks though). #2233

@BenTheElder
Member

Er, actually it seems these are LVM devices (and not LUKS-specific?), in which case we have a somewhat worse problem. For the remote case in particular I'm not sure we can enumerate these cleanly, but we'd need to mount all /dev/dm-* and podman/docker can't express that. We can try to enumerate from kind directly, but that presumes the kind binary has the same access as the podman invocation (which in particular won't be true for remote).

@BenTheElder
Member

At the very least this warrants a https://kind.sigs.k8s.io/docs/user/known-issues/ entry to start, with the workaround.

@bergmannf
Author

bergmannf commented Aug 13, 2021

So here is the podman info -f '{{ json . }}':

{
  "host": {
    "arch": "amd64",
    "buildahVersion": "1.21.3",
    "cgroupManager": "systemd",
    "cgroupVersion": "v2",
    "cgroupControllers": [],
    "conmon": {
      "package": "conmon-2.0.29-2.fc34.x86_64",
      "path": "/usr/bin/conmon",
      "version": "conmon version 2.0.29, commit: "
    },
    "cpus": 8,
    "distribution": {
      "distribution": "fedora",
      "version": "34"
    },
    "eventLogger": "journald",
    "hostname": "fedora",
    "idMappings": {
      "gidmap": [
        {
          "container_id": 0,
          "host_id": 1000,
          "size": 1
        },
        {
          "container_id": 1,
          "host_id": 100000,
          "size": 65536
        }
      ],
      "uidmap": [
        {
          "container_id": 0,
          "host_id": 1000,
          "size": 1
        },
        {
          "container_id": 1,
          "host_id": 100000,
          "size": 65536
        }
      ]
    },
    "kernel": "5.13.8-200.fc34.x86_64",
    "memFree": 246652928,
    "memTotal": 16473628672,
    "ociRuntime": {
      "name": "crun",
      "package": "crun-0.20.1-1.fc34.x86_64",
      "path": "/usr/bin/crun",
      "version": "crun version 0.20.1\ncommit: 0d42f1109fd73548f44b01b3e84d04a279e99d2e\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL"
    },
    "os": "linux",
    "remoteSocket": {
      "path": "/run/user/1000/podman/podman.sock"
    },
    "serviceIsRemote": false,
    "security": {
      "apparmorEnabled": false,
      "capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
      "rootless": true,
      "seccompEnabled": true,
      "seccompProfilePath": "/usr/share/containers/seccomp.json",
      "selinuxEnabled": true
    },
    "slirp4netns": {
      "executable": "/usr/bin/slirp4netns",
      "package": "slirp4netns-1.1.9-1.fc34.x86_64",
      "version": "slirp4netns version 1.1.8+dev\ncommit: 6dc0186e020232ae1a6fcc1f7afbc3ea02fd3876\nlibslirp: 4.4.0\nSLIRP_CONFIG_VERSION_MAX: 3\nlibseccomp: 2.5.0"
    },
    "swapFree": 6640889856,
    "swapTotal": 8589930496,
    "uptime": "2h 44m 10.18s (Approximately 0.08 days)",
    "linkmode": "dynamic"
  },
  "store": {
    "configFile": "/home/florian/.config/containers/storage.conf",
    "containerStore": {
      "number": 3,
      "paused": 0,
      "running": 0,
      "stopped": 3
    },
    "graphDriverName": "overlay",
    "graphOptions": {
      
    },
    "graphRoot": "/home/florian/.local/share/containers/storage",
    "graphStatus": {
      "Backing Filesystem": "btrfs",
      "Native Overlay Diff": "false",
      "Supports d_type": "true",
      "Using metacopy": "false"
    },
    "imageStore": {
      "number": 17
    },
    "runRoot": "/run/user/1000/containers",
    "volumePath": "/home/florian/.local/share/containers/storage/volumes"
  },
  "registries": {
    "search": [
  "registry.fedoraproject.org",
  "registry.access.redhat.com",
  "docker.io",
  "quay.io"
]
  },
  "version": {
    "APIVersion": "3.2.3",
    "Version": "3.2.3",
    "GoVersion": "go1.16.6",
    "GitCommit": "",
    "BuiltTime": "Mon Aug  2 21:39:21 2021",
    "Built": 1627933161,
    "OsArch": "linux/amd64"
  }
}

I don't think there should be any LVM devices (to be honest I only used the installation defaults and selected encryption AFAIR, but it has been a while) - I think those should just be the btrfs subvolumes:

sudo btrfs subvolume list /
ID 256 gen 2032098 top level 5 path home
ID 257 gen 2032110 top level 5 path root
ID 262 gen 1994774 top level 257 path var/lib/machines

(Neither lvscan nor lvdisplay shows anything. After double checking: yes, it should be only btrfs volumes - https://fedoramagazine.org/choose-between-btrfs-and-lvm-ext4/)

I was thinking that it might be enough to check if there is a symlink inside /dev/mapper and if so, follow it and mount the target as well?
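
The symlink-following idea can be sketched as follows. This is only an illustration in Python, assuming it runs on the same machine as the container runtime; `resolve_mapper_links` is a hypothetical helper and not something kind provides.

```python
import os


def resolve_mapper_links(mapper_dir="/dev/mapper"):
    """Return {symlink_path: resolved_target} for every symlink in mapper_dir.

    Non-symlink entries (such as the `control` character device) are skipped.
    """
    resolved = {}
    if not os.path.isdir(mapper_dir):
        return resolved
    for name in sorted(os.listdir(mapper_dir)):
        path = os.path.join(mapper_dir, name)
        if os.path.islink(path):
            # realpath follows the link, e.g. ../dm-0 becomes /dev/dm-0
            resolved[path] = os.path.realpath(path)
    return resolved


if __name__ == "__main__":
    for link, target in resolve_mapper_links().items():
        print(f"{link} -> {target}")
```

Each resolved target would then be a candidate for an extraMounts entry alongside the symlink itself.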

@dahrens

dahrens commented Aug 13, 2021

I hit the same issue on btrfs without LUKS. Using the workaround described by @bergmannf worked for me as well.

@BenTheElder
Member

I was thinking that it might be enough to check if there is a symlink inside /dev/mapper and if so, follow it and mount the target as well?

We can only do this if we take care to ensure that podman/docker is not running on another host (which people unfortunately do depend on, e.g. for CI). Otherwise we would be inspecting the wrong machine / filesystem, which would break things if they differ (we'd try to mount the wrong things). (The discussion about mounting lv devices above applies to following symlinks as well.)
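
A local-versus-remote guard of that kind could look roughly like the Python sketch below. The field names `serviceIsRemote` and `Backing Filesystem` are taken from the podman 3.2.3 output posted earlier in this thread; whether they are present and reliable across podman versions, and what the docker equivalent looks like, are assumptions. Both function names are hypothetical.

```python
import json


def runtime_facts(info_json):
    """Extract two facts from `podman info -f '{{ json . }}'` output:
    whether the service is remote and which filesystem backs the store."""
    info = json.loads(info_json)
    host = info.get("host", {})
    graph_status = info.get("store", {}).get("graphStatus", {})
    return {
        # A missing field stays None ("unknown") instead of being read as local.
        "remote": host.get("serviceIsRemote"),
        "backing_fs": graph_status.get("Backing Filesystem"),
    }


def may_inspect_local_dev(facts):
    """Allow inspecting the local /dev only when the runtime is known to be local."""
    return facts["remote"] is False


if __name__ == "__main__":
    sample = (
        '{"host": {"serviceIsRemote": false},'
        ' "store": {"graphStatus": {"Backing Filesystem": "btrfs"}}}'
    )
    facts = runtime_facts(sample)
    print(facts, may_inspect_local_dev(facts))
```

Treating an unknown value as "do not inspect" is deliberate here, since guessing wrong would add mounts based on the wrong machine's filesystem.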

@dahrens

dahrens commented Aug 15, 2021

I dug a little deeper since in my case there is no symlink missing, but my btrfs device is just not mounted automatically into the node. I use btrfs without LUKS, therefore there are no /dev/mapper devices. Instead I just have a partitioned disk that looks like this (from the fstab perspective).

# /dev/nvme0n1p2
UUID=3e04c83b-1d81-4159-9411-b4ad5bdef790	/         	btrfs     	rw,relatime,discard=async,ssd,space_cache,subvolid=256,subvol=/@,subvol=@	0 0

# /dev/nvme0n1p2
UUID=3e04c83b-1d81-4159-9411-b4ad5bdef790	/home     	btrfs     	rw,relatime,discard=async,ssd,space_cache,subvolid=257,subvol=/@home,subvol=@home	0 0

Therefore the solution worked out in #1416 does not work in that setup. I'm using btrfs as storageDriver as well. Providing /dev/nvme0n1p2 as an extraMount successfully works around that glitch.

@BenTheElder BenTheElder added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Sep 7, 2021
@BenTheElder
Member

I don't think there's a good way to discover these paths, and docker already is responsible for mounting /dev/... to the node, it just doesn't mount everything.

If we had very high confidence that the cluster was running against a local runtime and not a remote node we could have the kind binary attempt to inspect /dev for this, but right now we do not have that confidence and we'd risk breaking remote users by trying to add mounts to the nodes based on inspecting the wrong filesystem.

It's also worth noting that Kubernetes only tests on ext4/overlayfs, and kubernetes itself has had bugs with other filesystems.

@simon-geard
Contributor

Seeing the same thing as @dahrens ... a stock Fedora installation with BTRFS everywhere. Using the following config file seems to have worked.

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /dev/nvme0n1p3
      containerPath: /dev/nvme0n1p3
      propagation: HostToContainer

I appreciate that this may be hard to resolve automatically, but it would be good to document it. What would it take to get this added to the "known issues" page? And can someone perhaps explain the nature of the problem? I get that it's failing because something inside the control plane wants access to the host filesystem, but I don't understand why it cares what's happening at the device layer?

@aojea
Contributor

aojea commented Jan 8, 2022

What would it take to get this added to the "known issues" page?

Just a pr to this file https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/known-issues.md , contributions are welcome 😁

@BenTheElder
Member

And can someone perhaps explain the nature of the problem? I get that it's failing because something inside the control plane wants access to the host filesystem, but I don't understand why it cares what's happening at the device layer?

It fails because kubelet (Kubernetes' node agent) is trying to determine filesystem stats (free space) and can't find the underlying disk.
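
The "major: 0, minor: 40" in the kubelet error refers to device numbers of the kind that stat reports for a directory. The Python sketch below shows how such a pair can be read for any path; it does not reproduce the kubelet's actual lookup. The reading that btrfs subvolumes commonly report an anonymous device with major 0, which would fit the log line, is an assumption and was not confirmed in this thread.

```python
import os
import sys


def device_numbers(path):
    """Return the (major, minor) device numbers of the filesystem holding path."""
    st_dev = os.stat(path).st_dev
    return os.major(st_dev), os.minor(st_dev)


if __name__ == "__main__":
    # Pass a directory such as /var/lib/kubelet; defaults to the root directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    major, minor = device_numbers(target)
    print(f"{target}: major: {major}, minor: {minor}")
```

If the pair printed for a directory does not correspond to any device node visible inside the node container, a lookup by device number has nothing to match against.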

Since I last looked at this, someone brought up that it appears to be possible to disable the entire disk isolation system with a feature gate. I'm not sure this is a great answer either though ...

@simon-geard
Contributor

What would it take to get this added to the "known issues" page?

Just a pr to this file https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/known-issues.md , contributions are welcome 😁

Ok, so the essential points seem to be:

  1. Kubernetes needs access to storage device nodes in order to do some stuff, e.g. tracking free disk space. Therefore, Kind needs to mount the necessary device nodes from the host into the control-plane container.
  2. Kubernetes knows with certainty which device it's looking for, but Kind doesn't have that same information, and cannot always determine which devices need to be mounted. In particular, it knows how to work with LVM, but doesn't know how to deal with BTRFS filesystems.
  3. The workaround is to manually configure the device mount using a config file like the examples elsewhere in this issue.
  4. The necessary device is reported in the error message, but for rootless Docker/Podman, is probably the device containing $HOME.

If someone can confirm that those basic facts are correct, I'd be happy to put something together.
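
For point 4, one way to find the device behind a directory such as $HOME is to look it up in the mount table. This is a rough Python sketch, assuming /proc/mounts-style input (source, mount point, filesystem type as the first three fields); both helpers are hypothetical, and mount points containing escaped characters such as \040 are not decoded.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (source, mount_point, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def backing_device(path, entries):
    """Return (source, fstype) of the mount whose mount point is the longest
    one containing path, or None if nothing matches."""
    path = os.path.abspath(path)
    best = None
    for source, mount_point, fstype in entries:
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # >= lets a later entry for the same mount point win
            if best is None or len(mount_point) >= len(best[1]):
                best = (source, mount_point, fstype)
    return None if best is None else (best[0], best[2])


if __name__ == "__main__":
    home = os.path.expanduser("~")
    if os.path.exists("/proc/self/mounts"):
        with open("/proc/self/mounts") as f:
            print(home, backing_device(home, parse_mounts(f.read())))
```

The source it reports for the home directory is a candidate hostPath for the extraMounts workaround, to be checked against the device named in the kubelet error.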

simon-geard added a commit to simon-geard/kind that referenced this issue Jan 11, 2022
Following discussions under issue kubernetes-sigs#2411, documenting problem with finding rootfs
device with BTRFS (and maybe other unrecognised filesystems), along with the
workaround of adding devices as extra mounts.

Also threw in a quick reminder at the top of the page about how to obtain logs
if cluster creation fails.
@BenTheElder
Member

I think #2584 is the best we can do for now

@bergmannf
Author

Happy to close it - as I just retested this on my Fedora 37 (with kind 0.17.0) and even with LUKS encrypted volumes I can't reproduce it.
So I guess for anyone still running into this issue, the documentation should prove good enough to fix the problem.

@jiridanek

jiridanek commented Apr 18, 2023

@bergmannf I can confirm that. kind create cluster worked without any further configuration or anything special on my Fedora 38, with either Docker version 23.0.1, build a5ee5b1, or rootless podman version 4.4.4 hosting the kind cluster, and with kind version 0.18.0, using BTRFS on LUKS.

Something somewhere done by somebody fixed this.

I did

KIND_EXPERIMENTAL_PROVIDER=podman ./kind-linux-amd64 create cluster
kubectl run my-shell --rm -i --tty --image ubuntu -- bash


6 participants