Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Podman support on Linux #966

Merged
merged 35 commits into from
Jul 5, 2022
Merged

Podman support on Linux #966

merged 35 commits into from
Jul 5, 2022

Conversation

Erotemic
Copy link
Contributor

I've been using an old fork of cibuildwheel for awhile (discussion on this in #676) and I've taken an hour to port the changes to the latest version of cibuildwheel.

The basic idea is that docker is now abstracted via a CIBW_OCI_EXE variable. By setting CIBW_OCI_EXE="podman". And using that in the appropriate places, it should "just work".

However, podman does have some edge cases that make it different than docker. It can require special additional arguments to make it work correctly. In previous versions of my code I hard coded those options, but in this version, if the user is not using docker, I make them specify exactly what they need. In my use cases the following environment variables work:

export CIBW_OCI_EXE="podman"
export CIBW_OCI_EXTRA_ARGS_CREATE="--events-backend=file --privileged"
export CIBW_OCI_EXTRA_ARGS_COMMON="--cgroup-manager=cgroupfs --storage-driver=vfs --root=$HOME/.local/share/containers/vfs-storage/"
export CIBW_OCI_EXTRA_ARGS_START="--events-backend=file --cgroup-manager=cgroupfs --storage-driver=vfs"

Lastly, there are a few instances where podman worked very differently than docker. The biggest one is copy_out, which I had to completely redo.

Other minor issues are:

  • for podman there seems to be some race condition in __exit__, but adding a few sleeps fixes it
  • podman doesnt have the same cwd semantics as docker, so I couldn't use the -w argument, and I just explicitly change working dirs (shouldn't be a big issue).
  • There is an exclude hack in copy_into. Not sure if that is needed anymore.

I've done some minor testing of this port locally and it seems to work (the general logic I've been using in production for months now). I imagine this PR probably needs some discussion and modifications, but the general idea is here, and ported to the latest version.

At the very least, I'll try to keep this PR up to date so I can continue using this feature, as I do need it.

@Erotemic
Copy link
Contributor Author

Erotemic commented Dec 17, 2021

So it looks like most of the tests are passing, but for some reason unit_test/docker_container_test.py seems to fail. Any insights on why this is would be helpful. EDIT: figured it out, and it was my fault. Should be fixed now.

Also, I removed the aforementioned output hack and did some minor cleanup. Code needs review and more testing. Not sure what the right way to add podman tests to the test suite is.

@jshowacre
Copy link

jshowacre commented Dec 19, 2021

Hi @Erotemic, what are your thoughts on using the docker-py package? Podman strives to maintain compatibility with this and I have a slightly modified test suite that seems to work with podman's Compat API.

docker_container_remote.py

import io
import json
import os
import shlex
from pathlib import PurePath
import subprocess
import sys
import uuid
from tarfile import TarFile
from docker import DockerClient, from_env
import docker
from docker.models.containers import Container
from pathlib import Path, PurePath
from types import TracebackType
from typing import IO, Dict, List, Optional, Sequence, Type, cast

from .typing import PathOrStr, PopenBytes

class RemoteDockerContainer:
  """
  An object that represents a remote running Docker container.

  Intended for use as a context manager e.g.
  `with DockerContainer(docker_image = 'ubuntu') as docker:`

  A bash shell is running in the remote container. When `call()` is invoked,
  the command is relayed to the remote shell, and the results are streamed
  back to cibuildwheel.
  """

  UTILITY_PYTHON = "/opt/python/cp38-cp38/bin/python"
  client: DockerClient
  cont: Container

  def __init__(
      self, *, docker_image: str, simulate_32_bit: bool = False, cwd: Optional[PathOrStr] = None
  ):
      if not docker_image:
          raise ValueError("Must have a non-empty docker image to run.")

      self.docker_image = docker_image
      self.simulate_32_bit = simulate_32_bit
      self.cwd = cwd
      self.name: Optional[str] = None
      self.client = from_env()

  def __enter__(self) -> "RemoteDockerContainer":
      self.name = f"cibuildwheel-{uuid.uuid4()}"
      cwd_args = ["-w", str(self.cwd)] if self.cwd else []
      shell_args = ["linux32", "/bin/bash"] if self.simulate_32_bit else ["/bin/bash"]
      image = self.client.images.pull(self.docker_image)
      self.cont = self.client.containers.create(image, name=self.name, command=shell_args, working_dir=self.cwd, environment=['CIBUILDWHEEL'], network_mode="host", auto_remove=True, stdin_open=True)
      self.cont.start()
      return self

  def __exit__(
      self,
      exc_type: Optional[Type[BaseException]],
      exc_val: Optional[BaseException],
      exc_tb: Optional[TracebackType],
  ) -> None:
      self.cont.stop()
      # if (self.cont is not None):
          # self.cont.remove()

  def copy_into(self, from_path: Path, to_path: PurePath) -> None:
      with io.BytesIO() as mem, TarFile.open(fileobj=mem, mode='w|gz') as tar:
          tar.add(from_path, arcname=from_path.name)
          # tar.list()
          tar.close()
          mem.seek(0)
          self.cont.put_archive(to_path, mem.getvalue())

  def copy_out(self, from_path: PurePath, to_path: Path) -> None:
      data, stat = self.cont.get_archive(from_path, encode_stream=True)
      with io.BytesIO() as mem:
          for chk in data:
              mem.write(chk)
          mem.seek(0)
          with TarFile.open(fileobj=mem) as tar:
              # tar.list()
              tar.extractall(path=to_path, numeric_owner=True)

  def glob(self, path: PurePath, pattern: str) -> List[PurePath]:
      glob_pattern = os.path.join(str(path), pattern)

      path_strs = json.loads(
          self.call(
              [
                  self.UTILITY_PYTHON,
                  "-c",
                  f"import sys, json, glob; json.dump(glob.glob({glob_pattern!r}), sys.stdout)",
              ],
              capture_output=True,
          )
      )

      return [PurePath(p) for p in path_strs]

  def call(
      self,
      args: Sequence[PathOrStr],
      env: Optional[Dict[str, str]] = None,
      capture_output: bool = False,
      cwd: Optional[PathOrStr] = None,
  ) -> str:
      env = dict() if env is None else env
      env = dict([(shlex.quote(k), shlex.quote(v)) for k, v in env.items()])
      args = shlex.join([(str(p) if isinstance(p, PurePath) else p) for p in args])
      return self.cont.exec_run(args, workdir=cwd, environment=env, demux=False, stream=False).output

  def get_environment(self) -> Dict[str, str]:
      env = json.loads(
          self.call(
              [
                  self.UTILITY_PYTHON,
                  "-c",
                  "import sys, json, os; json.dump(os.environ.copy(), sys.stdout)",
              ],
              capture_output=True,
          )
      )
      return cast(Dict[str, str], env)

  def environment_executor(self, command: List[str], environment: Dict[str, str]) -> str:
      # used as an EnvironmentExecutor to evaluate commands and capture output
      return str(self.call(command, env=environment, capture_output=True), errors="surrogateescape")


def shell_quote(path: PurePath) -> str:
  return shlex.quote(str(path))

docker_container_remote_test.py

import platform
import random
import shlex
import shutil
import subprocess
import textwrap
from pathlib import Path, PurePath

import pytest

from cibuildwheel.docker_container_remote import RemoteDockerContainer
from cibuildwheel.environment import EnvironmentAssignmentBash

# for these tests we use manylinux2014 images, because they're available on
# multi architectures and include python3.8
DEFAULT_IMAGE = "pypa/manylinux2014_%s:2020-05-17-2f8ac3b" % platform.machine()

@pytest.mark.docker
def test_simple():
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      assert container.call(["echo", "hello"], capture_output=True) == str.encode("hello\n")

@pytest.mark.docker
def test_no_lf():
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      assert container.call(["printf", "hello"], capture_output=True) == str.encode("hello")

@pytest.mark.docker
def test_environment():
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      assert (
          container.call(
              ["sh", "-c", "echo $TEST_VAR"], env={"TEST_VAR": "1"}, capture_output=True
          )
          == str.encode("1\n")
      )

@pytest.mark.docker
def test_cwd():
  with RemoteDockerContainer(
      docker_image=DEFAULT_IMAGE, cwd="/cibuildwheel/working_directory"
  ) as container:
      assert container.call(["pwd"], capture_output=True) == str.encode("/cibuildwheel/working_directory\n")
      assert container.call(["pwd"], capture_output=True, cwd="/opt") == str.encode("/opt\n")

@pytest.mark.docker
def test_container_removed():
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      docker_containers_listing = subprocess.run(
          "docker container ls",
          shell=True,
          check=True,
          stdout=subprocess.PIPE,
          universal_newlines=True,
      ).stdout
      assert container.name is not None
      assert container.name in docker_containers_listing
      old_container_name = container.name

  docker_containers_listing = subprocess.run(
      "docker container ls",
      shell=True,
      check=True,
      stdout=subprocess.PIPE,
      universal_newlines=True,
  ).stdout
  assert old_container_name not in docker_containers_listing

@pytest.mark.docker
def test_large_environment():
  # max environment variable size is 128kB
  long_env_var_length = 127 * 1024
  large_environment = {
      "a": "0" * long_env_var_length,
      "b": "0" * long_env_var_length,
      "c": "0" * long_env_var_length,
      "d": "0" * long_env_var_length,
  }

  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      # check the length of d
      assert (
          container.call(["sh", "-c", "echo ${#d}"], env=large_environment, capture_output=True)
          == str.encode(f"{long_env_var_length}\n")
      )

@pytest.mark.docker
def test_binary_output():
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      # note: the below embedded snippets are in python2

      # check that we can pass though arbitrary binary data without erroring
      container.call(
          [
              "/usr/bin/python2",
              "-c",
              textwrap.dedent(
                  """
                  import sys
                  sys.stdout.write(''.join(chr(n) for n in range(0, 256)))
                  """
              ),
          ]
      )

      # check that we can capture arbitrary binary data
      output = container.call(
          [
              "/usr/bin/python2",
              "-c",
             textwrap.dedent(
                  """
                  import sys
                  sys.stdout.write(''.join(chr(n % 256) for n in range(0, 512)))
                  """
              ),
          ],
          capture_output=True,
      )

      for i in range(512):
          assert output[i] == i % 256

      # check that environment variables can carry binary data, except null characters
      # (https://www.gnu.org/software/libc/manual/html_node/Environment-Variables.html)
      binary_data = bytes(n for n in range(1, 256))
      binary_data_string = str(binary_data, encoding="raw_unicode_escape", errors="surrogateescape")
      output = container.call(
          ["python2", "-c", 'import os, sys; sys.stdout.write(os.environ["TEST_VAR"])'],
          env={"TEST_VAR": binary_data_string},
          capture_output=True,
      )
      output_string = ''.join(shlex.split(str(output, errors="surrogateescape")))
      assert output_string == binary_data_string
      assert output_string.encode(encoding="raw_unicode_escape", errors="surrogateescape") == binary_data

@pytest.mark.docker
def test_file_operations(tmp_path: Path):
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      # test copying a file in
      binary_data = bytes(random.randrange(256) for _ in range(1000))
      test_file = tmp_path / "test.dat"
      test_file.write_bytes(binary_data)

      dst_folder = PurePath("/tmp/test.dat/")
      container.copy_into(test_file, dst_folder)

      output = container.call(["cat", dst_folder / "test.dat"], capture_output=True)
      assert binary_data == output

@pytest.mark.docker
def test_dir_operations(tmp_path: Path):
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      binary_data = bytes(random.randrange(256) for _ in range(1000))
      original_test_file = tmp_path / "test.dat"
      original_test_file.write_bytes(binary_data)

      # test copying a dir in
      test_dir = tmp_path / "test_dir"
      test_dir.mkdir()
      test_file = test_dir / "test.dat"
      shutil.copyfile(original_test_file, test_file)

      tmp = PurePath("/tmp")
      dst_dir = tmp / "test_dir"
      dst_file = dst_dir / "test.dat"
      container.copy_into(test_dir, tmp)

      output = container.call(["cat", dst_file], capture_output=True)
      assert binary_data == output

      # test glob
      assert container.glob(dst_dir, "*.dat") == [dst_file]

      # test copy dir out
      new_test_dir = tmp_path / "test_dir_new"
      new_test_dir.mkdir()
      container.copy_out(dst_dir, new_test_dir)

      assert binary_data == (new_test_dir / "test_dir" / "test.dat").read_bytes()

@pytest.mark.docker
def test_environment_executor():
  with RemoteDockerContainer(docker_image=DEFAULT_IMAGE) as container:
      assignment = EnvironmentAssignmentBash("TEST=$(echo 42)")
      assert assignment.evaluated_value({}, container.environment_executor) == "42"

Dockerfile.tests.alpine

FROM alpine:3.15
# podman-docker is at 3.4.2 which currently which causes the 'cwd' test to break. Fixed in 3.4.3.
# https://github.com/containers/podman/issues/11842
# Privileged atm, but rootless!
RUN apk add python3 py3-pip py3-setuptools podman shadow shadow-uidmap
RUN adduser worker; usermod -v 10000-42767 -w 10000-42767 worker;
RUN chmod 644 /etc/containers/containers.conf; sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' -e 's|^mountopt[[:space:]]*=.*$|mountopt = "nodev,fsync=0"|g' /etc/containers/storage.conf; \
  sed -i -e 's|^unqualified-search-registries[[:space:]]*=.*$|unqualified-search-registries = ["quay.io", "docker.io"]|g' /etc/containers/registries.conf; \
  # Should we be remote by default?
  sed -i -e 's|^#[[:space:]]remote[[:space:]]*=.*$|remote = true|g' /usr/share/containers/containers.conf /etc/containers/containers.conf; touch /etc/containers/nodocker;
RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers /var/lib/shared/vfs-images /var/lib/shared/vfs-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock; touch /var/lib/shared/vfs-images/images.lock; touch /var/lib/shared/vfs-layers/layers.lock; ln -sv /usr/bin/podman /usr/bin/docker
WORKDIR /home/worker/project
RUN mkdir -p /home/worker/.local/share/containers; chown worker:worker -R /home/worker; ln -sv /home/worker/project /project
USER worker
COPY --chown=worker ./setup* ./
RUN pip install -e '.[test]';
COPY --chown=worker . ./
ENV CONTAINER_HOST="unix:///tmp/podman-run-1000/podman/podman.sock"
ENV DOCKER_HOST="unix:///tmp/podman-run-1000/podman/podman.sock"

I also have a fedora Dockerfile that seems to work ok.

Not sure what the right way to add podman tests to the test suite is.

Someone's guidance would be greatly appreciated here as well.

If you are interested in my direction I could make a PR to your fork, or make a draft PR instead.

@Erotemic
Copy link
Contributor Author

@jshowacre I think docker-py might make sense, but it is another dependency. However, it would make maintaining this package easier if that logic was outsourced and maintained elsewhere.

I suppose it is up to the maintainers as to which direction to go, but IMO it is a good idea to incorporate docker-py (haven't looked at it in detail, I'm just assuming its well maintained and provides a nice generalized OCI API). But input from the maintainers would be useful.

@joerick
Copy link
Contributor

joerick commented Dec 20, 2021

On docker-py, I think that an implementation based on that would be quite a bit slower than our current implementation. A lot of the stream-based work in the current DockerContainer is because docker exec was too slow for cibuildwheel's use. Especially in the test suite we fire commands back-and-forth between Docker and Python very often, so this needs to be pretty fast. For discussion on this see #386 .

But I appreciate that you guys have been putting good work into this! So I've had a think this morning, as for the integration of podman into the codebase, here's my take on a design for this feature:

  • a CIBW_CONTAINER_RUNTIME_LINUX option, choices are docker,podman, default docker.
  • That switch controls if a DockerContainer or a PodmanContainer are used for the docker variable in linux.py (while we're at it, we should probably rename that variable to container).
  • Then the PodmanContainer class can be written to fulfil the same interface as DockerContainer - __enter__, __exit__, copy_into, call etc. Performance of this might not be quite so critical, so we might be able to get away with podman exec, which would probably be nicer as that keeps the implementation simple.
  • As for testing, we already have quite good unit tests of the DockerContainer implementation (unit_test/docker_container_test.py), so parameterizing this with PodmanContainer would provide pretty good assurance that the implementations are compatible. An option test, to ensure the option is parsed correctly. Then just a simple integration test build using CIBW_CONTAINER_RUNTIME_LINUX=podman would ensure that the new object fits into linux.py would be good.
  • As a side note, if we can figure out a way to get rid of that pesky sleep, that would be good. Those are the fixes that come back to haunt you in my experience!

@Erotemic
Copy link
Contributor Author

Erotemic commented Dec 20, 2021

@joerick Most of that sounds good.

Right now I'm calling the CIBW_CONTAINER_RUNTIME_LINUX option CIBW_OCI_EXE. Where OCI stands for open-container-initiative, as the idea would be that eventually more than docker and podman could be supported. I'm not a big fan of putting _LINUX as a suffix, nothing else seems to follow that pattern. Perhaps CIBW_CONTAINER_ENGINE would be more a more appropriate and clear name?

I'm also not sure if there should be a separate class for DockerContainer and PodmanContainer. The number of instances where there is need to differentiate between them is small, and I imagine updates to podman will reduce those further, as their goal is to eventually have podman become a drop-in replacement for docker. The main issue is that there needs to be a way for the user to specify extra-arguments to whatever the container engine is. This would handle the most cases in the most general way, but it might also make sense for cibuildwheel to supply sensible defaults depending on if the container engine is docker or podman.

I'm not sure how to get rid of the sleep. I agree it is a bad hack, but I'm not sure where to start debugging it. Perhaps it would be a good idea to expose a parameter that lets the user adjust the sleep time? That would provide the user with a workaround if it ever wasn't sufficient for a use-case, and it would allow for easier experimentation with disabling it / finding the root cause.

My question about testing podman on CI was less about how to write the unit tests (I did see docker_container_test), it was more about how do I create a CI environment where podman exists ensure podman exists on the CI: which CI do I add that to? github actions? CircleCI? TravisCI? All of them?

So, to summarize, the actionable questions I have are:

  • What name do we give what I'm currently calling CIBW_OCI_EXE? @joerick proposed CIBW_CONTAINER_RUNTIME_LINUX, but I'm thinking CIBW_CONTAINER_ENGINE might be better.
  • What are better names for the other parameters I've added? CIBW_OCI_EXTRA_ARGS_CREATE, CIBW_OCI_EXTRA_ARGS_COMMON, and
    CIBW_OCI_EXTRA_ARGS_START, and where do these get documented?
  • If CIBW_OCI_EXE is podman, should the above options default to the vfs / cgroup settings I've found to work well? Or if the user is using podman, should they be responsible for setting those?
  • What should the DockerContainer class be renamed to? Container? OCIContainer? Should there be a separate DockerContainer and PodmanContainer subclass (I personally think inheritance is not the right design in this case). Likewise, I imagine the docker_container.py module should be renamed similarly?
  • Should the sleep time be an option while it is an unresolved issue? Does acceptance of this PR hinge on discovering a proper solution to it?
  • Which CI providers should I add podman tests to? One or all of them?

EDIT:

@Erotemic
Copy link
Contributor Author

I added podman parameterization to docker_container_test.py, which adds a pytest.mark.parametrize for the kwargs used to create the DockerContainer instance. This first runs all the tests with docker, then with vanilla podman, and then with podman with cgroupfs/vfs container args (which is what I need to run cibuildwheel inside of docker on my gitlab CI).

Note: I'm going to hold off on making name changes on any existing code (I'll fixup names for new variables introduced) until things are finalized. This will help keep the diff smaller and easier to review. I might make a second PR on top of this that actually does the name changes, as that should be fairly mechanical.

Comment on lines 201 to 155
self.bash_stdin.close()

if self._sleep_time:
time.sleep(self._sleep_time)

self.process.terminate()
self.process.wait()

# When using podman there seems to be some race condition. Give it a
# bit of extra time.
if self._sleep_time:
time.sleep(self._sleep_time)

assert isinstance(self.name, str)

subprocess.run(["docker", "rm", "--force", "-v", self.name], stdout=subprocess.DEVNULL)
subprocess.run(
[self.oci_exe, "rm"] + self._common_args + ["--force", "-v", self.name],
stdout=subprocess.DEVNULL,
)
self.name = None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Erotemic The race condition was happening here you said? I tried:

self.bash_stdin.close()

assert isinstance(self.name, str)
subprocess.run(["docker", "rm", "--force", "-v", self.name], stdout=subprocess.DEVNULL)
self.process.terminate()
self.process.wait()
self.name = None

I copied your cwd changes and along with this, the unit tests and sample project seemed to work! Could you try as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems to break for me without the sleeps. The error I get is:

Error: read unixpacket @->/proc/self/fd/15/attach: read: connection reset by peer

                                                              ✕ 1.09s
Error: Command podman exec --cgroup-manager=cgroupfs --storage-driver=vfs --root=/root/.local/share/containers/vfs-storage/ -i cibuildwheel-19e7f1f5-ddd7-48f5-a130-7dc62c9776d1 tar -cC /output -f /tmp/output-cibuildwheel-19e7f1f5-ddd7-48f5-a130-7dc62c9776d1.tar . failed with code 255. None

Note, this is with the vfs and cgroupfs options:

          export CIBW_SKIP="pp* cp310-* cp39-* cp37-* cp36-* *musllinux*"
          export CIBW_OCI_EXE="podman"
          export CIBW_OCI_EXTRA_ARGS_CREATE="--events-backend=file --privileged"
          export CIBW_OCI_EXTRA_ARGS_COMMON="--cgroup-manager=cgroupfs --storage-driver=vfs --root=$HOME/.local/share/containers/vfs-storage/"
          export CIBW_OCI_EXTRA_ARGS_START="--events-backend=file --cgroup-manager=cgroupfs --storage-driver=vfs"
          cibuildwheel --output-dir wheelhouse --platform linux --archs x86_64

When I add them back in it goes back to working.

I further verified this by adding an oci-sleep-time option, so within the same test session I set the variable to export CIBW_OCI_SLEEP_TIME="0.0" and I again got an error:

ERRO[0000] container not running                        
ERRO[0022] Error forwarding signal 15 to container 56aa55d43bfc75cf39d65ca4a86037ddeb129f3fe7d780c6b3de8604b160d9f1: error sending signal to container 56aa55d43bfc75cf39d65ca4a86037ddeb129f3fe7d780c6b3de8604b160d9f1: `/usr/sbin/runc kill 56aa55d43bfc75cf39d65ca4a86037ddeb129f3fe7d780c6b3de8604b160d9f1 15` failed: exit status 1 

and then export CIBW_OCI_SLEEP_TIME="0.1", and again it worked.

Comment on lines 50 to 52
"oci_extra_args_common": f"--cgroup-manager=cgroupfs --storage-driver=vfs --root={home}/.local/share/containers/vfs-storage",
"oci_extra_args_create": "--events-backend=file --privileged",
"oci_extra_args_start": "--events-backend=file --cgroup-manager=cgroupfs --storage-driver=vfs",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these be set in a containers.conf in an appropriate location for you? The CLI arguments take the most precedence and I am a little uncomfortable with vfs being the near default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with containers.conf, it looks like it's a container-level thing and not a cibuildwheel level thing?

What is important to me is that cibuildwheel has an easy way to allow the user to modify how it uses containers to build your wheels. If that involves writing a file to my cwd with options instead of specifying then as environs, that's fine. But I don't want to make any system-level modifications. Everything should work locally.

How would one tell cibuildwheel to use a specific container configuration? Is it something you can pass to each Popen invocation of docker/podman commands? If it was something like:

export CIBW_CONTAINER_ENGINE=podman
export CIBW_CONTAINER_CONFIG=path-to-containers.conf

that would be fine with me, as long as all of the places where I'm using "_common_args" have an equivalent invocation that respects the given conf file.

Copy link

@jshowacre jshowacre Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you forward CIBW_CONTAINER_CONFIG as CONTAINERS_CONF for the podman invocation it should do just that, I believe. If not the precedence(so I don't forget) is: built-in -> /usr/share/containers/ -> /etc/containers/ -> ~/.config/containers/ (rootless) -> invocation

Right now I'm putting my 'soft' defaults in /etc/containers of the container since in rootless podman the user config has higher precedence. If you want to override the storage driver to be vfs, that's done inside the storage.conf, and the configuration file, if you don't wish to place inside your user config, may be overridden as CONTAINERS_STORAGE_CONF.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a better way to handle those special podman options. I'll have to learn about the syntax and options... I've spent a bit of time looking at this and this might take me some time. If anyone feels inclined to create a containers.json that corresponds to how I've setup a extra arguments I pass to docker/podman create, start, and other "common" commands, that would help a lot when I next have time to work on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is important to me is that cibuildwheel has an easy way to allow the user to modify how it uses containers to build your wheels.

Let's talk about this, because this isn't how cibuildwheel has approached problems like this before. The docker 'backend' has no options exposed to customise (beyond the existing env vars that docker itself reads). Why would users need to customise these options for podman?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in another reply, running podman inside docker, which is running on a host with stricter permission settings (I'll have to ask to figure out what these are, as I don't manage the gitlab-ci servers I use), need to run podman in "rootless" mode and I think there is some other issue with the overlay file driver, which is why the vfs driver is needed. But again, my understanding of containers is limited,

If I can determine how to get the CI working with only using existing environment variables that docker reads themselves, I'm perfectly happy to use those instead of relying on cibuildwheel (probably a better design decision anyway, although it would be nice if cibuildwheel had a note about how to configuring containers and then point at the docs to CONTAINERS_CONF and CONTAINERS_STORAGE_CONF). What I'm currently trying to figure out is what those config settings are. They unfortunately, do not correspond exactly to command line options, which is a design decision I always find to be frustrating.

@joerick
Copy link
Contributor

joerick commented Dec 21, 2021

  • What name do we give what I'm currently calling CIBW_OCI_EXE? @joerick proposed CIBW_CONTAINER_RUNTIME_LINUX, but I'm thinking CIBW_CONTAINER_ENGINE might be better.

The convention we've used in the past is that options that only affect the linux build are suffixed with _LINUX, at least in the documentation. But that's only the env var, in the pyproject.toml, the option would just be container-runtime = "docker". I don't have a strong preference between container-runtime and container-engine, I note that podman describes itself as a 'container engine' so maybe that's a better word, happy to use that instead.

  • What are better names for the other parameters I've added? CIBW_OCI_EXTRA_ARGS_CREATE, CIBW_OCI_EXTRA_ARGS_COMMON, and
    CIBW_OCI_EXTRA_ARGS_START, and where do these get documented?
  • If CIBW_OCI_EXE is podman, should the above options default to the vfs / cgroup settings I've found to work well? Or if the user is using podman, should they be responsible for setting those?

Ah, sorry, I wasn't clear. I was proposing removing these options, and using CIBW_CONTAINER_ENGINE as the switch for all of them. I had thought that your particular value of these options is required for podman to work in this scenario? I'd prefer to expose the minimum API surface area that we can get away with.

  • What should the DockerContainer class be renamed to? Container? OCIContainer? Should there be a separate DockerContainer and PodmanContainer subclass (I personally think inheritance is not the right design in this case). Likewise, I imagine the docker_container.py module should be renamed similarly?

This is the kind of thing that might be easier to work through in code. But I don't mind a bit of duplication if the code reads clearer.

  • Should the sleep time be an option while it is an unresolved issue? Does acceptance of this PR hinge on discovering a proper solution to it?

It probably doesn't hinge on it, but at least I'd like to understand why it's there a bit better.

  • Which CI providers should I add podman tests to? One or all of them?

Perhaps another job on GitHub Actions would do the trick.

@henryiii
Copy link
Contributor

Just to note, the problem with #906 would be fixable by users if we exposed more control over the API. Also, haven't looked into this lately, but I don't know if these should be normal options - they are backend only, maybe they would be environment only options. As just a single backend (podman/docker) selector, that would probably be okay to allow in toml, though I'd probably usually recommend it be CI specific unless you really only can build in podman. It's pretty easy to override a single option the other direction, too, so that's not that bad. That's why the "advanced" controls would be really bad in TOML, since you'd have to override all of them. Maybe that's a good argument against the "advanced" controls (assuming they are fixed).

@Erotemic
Copy link
Contributor Author

Erotemic commented Dec 21, 2021

I had thought that your particular value of these options is required for podman to work in this scenario?

It is not required if you are just running cibuildwheel on your host machine with podman as the only container engine/runtime. But, it is required for me to use podman in docker on my gitlab-ci instance. I don't quite understand the exact reason, I think it has to due with stricter host machine security settings. My expertise wrt to details of containers is spotty at best.

If I don't include the VFS / cgroup on my gitlab-ci instance I get:

time="2021-12-21T16:52:39Z" level=warning msg="Couldn't run auplink before unmount /var/lib/containers/storage/aufs/mnt/f98c1aa0bcc73fedb9642ba1b825491be8847e49da01ffca457958c8684d7730: exit status 22"
Error: unable to start container 91bec90b5d9777c8b4e9871aae3b0fb1314aece559e6e72ca26d747b98571f37: error mounting storage for container 91bec90b5d9777c8b4e9871aae3b0fb1314aece559e6e72ca26d747b98571f37: error creating aufs mount to /var/lib/containers/storage/aufs/mnt/f98c1aa0bcc73fedb9642ba1b825491be8847e49da01ffca457958c8684d7730: invalid argument

If I try to run in podman-in-docker on my local machine it does seem to work. I think the gitlab-CI has some sort of security mechanism on the host, which is forcing me to use the VFS / cgroup options (which for the record, I am still learning about, and I do not have a great understanding of).

I don't have a strong preference between container-runtime and container-engine, I note that podman describes itself as a 'container engine' so maybe that's a better word, happy to use that instead.

Either CIBW_CONTAINER_RUNTIME or CIBW_CONTAINER_ENGINE makes sense to me.

Ah, sorry, I wasn't clear. I was proposing removing these options,

Learning more about container configuration, I agree with removing those options.
Based on containers.conf.5.md I think it might make sense to have a CIBW_CONTAINERS_CONF and CIBW_CONTAINERS_STORAGE_CONF. Or I suppose we could rely on implicit behavior and just use the standard CONTAINERS_CONF and CONTAINERS_STORAGE_CONF environment variables, but I have a preference for having some way to pass that information around explicitly and coupled with CIBW itself. I can imagine this being very helpful to a newcomer who (like my past self) encounters problems with using cibuildwheel and does not realize that containers are configurable, or that the container configuration itself is the problem.

Just to note, the problem with #906 would be fixable by users if we exposed more control over the API.

If I understand correctly, setting --network=host should be possible in a configuration file. The question is, what is the best way to point cibuildwheel at it?


EDIT: So I've verified that most of the arguments I care about passing can be handled just by writing container and storage configuration files and then setting the environment variables: CONTAINERS_CONF and CONTAINERS_STORAGE_CONF.

          codeblock()
          {
              PYEXE=python3
              echo "$1" | $PYEXE -c "import sys; from textwrap import dedent; print(dedent(sys.stdin.read()).strip('\n'))"
          }
          # Make a storage.conf and containers.conf file
          #
          # https://github.com/containers/common/blob/main/docs/containers.conf.5.md
          # https://github.com/containers/storage/blob/main/docs/containers-storage.conf.5.md

          export CONTAINERS_CONF=$(realpath "temp_containers.conf")
          export CONTAINERS_STORAGE_CONF=$(realpath "temp_storage.conf")

          codeblock "
          [storage]
          driver=\"vfs\"
          graphroot=\"$HOME/.local/share/containers/vfs-storage\"
          runroot=\"$HOME/.local/share/containers/vfs-runroot\"
          " > $CONTAINERS_STORAGE_CONF

          codeblock "
          [containers]
          [engine]
          cgroup_manager=\"cgroupfs\"
          events_logger=\"file\"
          [machine]
          " > $CONTAINERS_CONF

          cat $CONTAINERS_CONF
          cat $CONTAINERS_STORAGE_CONF

However, it's a bit annoying because the names of the config options don't exactly correspond to the command line flags. It's not obvious how to construct one from the other. Furthermore, I don't see any way to invoke the --privileged when running the container "create" command. This might need to be exposed via CIBW if there is no way to set it in a configuration file (if anyone knows how to do this, please let me know).

I'm also getting an error: ERRO[0000] finding config on system: CONTAINERS_CONF file: stat temp_containers.conf: no such file or directory

I'm not sure why it's saying that, the file does exist. It turns out I needed to set the environment variables to an absolute path. Still need to figure out how to replicate --privledged in the config file.

If these difficulties can be overcome, my current thinking is that cibuildwheel probably should just document that these environs can be used to modify internal container behavior, and then just rely on implicit behavior.

@Erotemic
Copy link
Contributor Author

I've verified that I can use config files on my gitlab instance to get the behavior I'm looking for. That is good because it reduces the scope of this PR.

However, I'd still like to add a test which ensures cibuildwheel generally works with setting needed on more secured hosts. This involved expanding the docker_container_test.py such that it can write these config files and the pass environment variables to DockerContainer.

To obtain something like the effect of podman create with --privileged, I used the config files to specify extra default_capabilities and I changed the aufs mount options to be "rw". Still not 100% sure I have the config files in a 1-to-1 correspondence with the CLI params I passed earlier, but it does seem to work for me.

The main issue I'm running into here is that the temporary directory doesn't get cleaned up neatly on exit. The files written to the storage dir don't have user write permissions for whatever reason, and there are symlinks to /lib64 that seem to be causing issues. Hopefully these can be resolved.

@mayeut
Copy link
Member

mayeut commented Feb 6, 2022

@Erotemic, does terminating the container with something like that still exhibits a race condition ?

self.bash_stdin.write(b"exit 0\n")
self.bash_stdin.flush()
self.process.wait()
self.process.stdin.close()
self.process.stdout.close()

@joerick
Copy link
Contributor

joerick commented Apr 29, 2022

Apologies, I did slightly lose the thread in reviewing this. What's the status of this PR?

@Erotemic
Copy link
Contributor Author

I don't remember if I tried the workaround from @mayeut or not. I probably didn't. Looks like this at least needs a rebase.

Honestly, I'm a bit fuzzy on all of the details of podman. It's been awhile since I've used it actively. My motivation for working on this was to get my CI working and I've been using my fork (which is effectively this branch) successfully for over a year now. I'm forgetting the details on the race condition, and I can try the workaround, but even if that doesn't work I think this is probably good enough to merge given a rebase and re-review. I imagine this feature would be helpful to others and it would generalize cibuildwheel in a good direction (not locking into docker).

I may not have time to get to the rebase in the near future, so if anyone does need this urgently, I wouldn't mind some help getting the last issues fixed.

@Erotemic
Copy link
Contributor Author

I've gotten into another situation where I need podman support in cibuildwheel. I've rebased this branch on the lastest main. I'll work on fixing any issues that crop up.

I'll see if I can lookup and describe the race condition. It may just be something we have to live with as a known issue. It can be exclusively locked into podman execution though so docker users will be unaffected.

@joerick
Copy link
Contributor

joerick commented Jun 17, 2022

Another look into the race condition would be appreciated. And a general tidy up of this would be good too, if you're still looking to get it merged. This PR has been on a bit of a journey, I see some debug code still in there, and the tests are quite confusing to me. It would probably benefit from asking 'what are the minimum changes that get this working'. this is probably quite a niche feature, so it's more likely to be worth it to merge if it's not a lot of code. 🙏 thanks!

@brenthuisman
Copy link
Contributor

I for one depend on podman. docker is way too invasive, unsupported and thus unavailable in the envs I work in. Unfortunately alias docker=podman is not helping, the subprocess calls in cibuildwheel don't shell out.

I hope this can be merged in the not too distant future! Thanks :)

@Erotemic
Copy link
Contributor Author

Erotemic commented Jun 23, 2022

@joerick I've tidied up and removed the debug code. (and rebased)

The new test code is less daunting than it appears. Let me explain.

Changes in tests

In the tests, there is only one major change: adding the basis_container_kwargs, which we can rename or add extra documentation to if needed. The idea is it just generates different keyword arguments that can be used to construct an instance of the existing DockerContainer object. It yields a dictionary that correspond to a container engine and its configuration. I've added some more comments and removed commented debugging statements. The first two dictionaries yielded are the basic podman and docker configurations. I also added a VFS podman configuration, which is my primary use case.

The other added function is _cleanup_tempdir, which removes the test configurations. I put a comment in there indicating that I thought there might be a "better way" to do this with pytest itself, but I couldn't figure it out so I fell back on using atexit, which works fine.

The only other test changes are parameterizing the existing tests so they run with both podman and docker.

To support the tests, CI rules to install podman were added in test.yml and install_latest_podman.sh.

Changes in module code

Most other changes generally stem from adding the new container_engine option, which just holds the name of the executable using to run the containers: docker or podman. The default is "docker".

The majority of these changes are in the DockerContainer class. I've also noted the name DockerContainer is no longer appropriate, but I didn't want to introduce that change in this PR. I think it makes more sense to merge in the functionality, ensure it doesn't need to be reverted, and then gradually move terminology over to the generalized docker-agnostic setting.

As previously mentioned, there are a few incompatibilities between docker and podman, which is where all other changes stem from.

  • I've added the env parameter which is used in the tests.

  • Podman doesn't let you specify a cwd in the container create command, so we just add a command after the create to make the directory. We also add the ":Z" flag to the volume to support SELinux. I don't think this has any impact on exiting invocations (or at least the tests don't flag it).

  • The copy out procedure works differently in podman. I think docker might have some extra magic going on to make the concise command work, whereas podman needs the steps spelled out more explicitly. Suffice to say, the logic for docker is unchanged, so only podman is using the new codepath. I added some more comments explaining this different in the code.

  • I added a debug_info function, which could be removed, but I think it's helpful and worth keeping in.

  • The extra self.process.wait. This is the elephant in the PR. I have no idea why this is necessary, and it would be a lot of investigative work to drill down into it. Frankly I don't have the time to dedicate to this problem. It seems to avoid a race condition. Perhaps that is a problem with what we are doing, perhaps its a problem with podman itself. I think a good compromise is to leave it in, document why it is there, and only execute it in the podman case. My hope is that this issue doesn't derail the PR.

@joerick
Copy link
Contributor

joerick commented Jun 27, 2022

Thank you for your responses to feedback on this @Erotemic!

I've done a pass at this PR, I think it should be nearly ready to go.

Apologies for the diff explosion. I've ended up changing lots of references of 'docker' to 'container', which is language that appears throughout the project. It's probably a good thing anyway, since there were lots of occurrences of things like "copy files into the docker", which doesn't really make sense :P

The main files to look at are "oci_container.py", "oci_container_test.py", and "test_podman.py".

The only thing missing that I'm aware of currently is documentation for this new "CIBW_CONTAINER_ENGINE" option.

.github/workflows/test.yml Outdated Show resolved Hide resolved
cibuildwheel/options.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
test/conftest.py Show resolved Hide resolved
@joerick joerick merged commit 6fe0354 into pypa:main Jul 5, 2022
@joerick
Copy link
Contributor

joerick commented Jul 5, 2022

Merged. Thank you @Erotemic for persevering with this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants