Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced docker runtime #2167

Open
wants to merge 8 commits into
base: release/v1
Choose a base branch
from

Conversation

richbai90
Copy link

@richbai90 richbai90 commented Jun 16, 2022

What does this change

Add additional options to the docker extension within porter to permit the configuration of mounts, network types, and kernel capabilities. The new configuration follows the API established in the go-cnab package. The one downside to this is that the new configuration options use the proper case format go uses to identify package exports. This seems unavoidable until the go-cnab package updates the driver configuration on their side to include appropriate tags to identify the fields in their JSON naming convention.

The change adds additional options to the docker extension. The complete configuration is featured below, however some options cannot be specified with others according to the docker driver spec as implemented by the go-cnab package.

required:
  docker:
    privileged: false
    mounts:
    - Type: ''
      Source: ''
      Target: ''
      ReadOnly: false
      Consistency: ''
      BindOptions:
        Propagation: ''
        NonRecursive: false
      VolumeOptions:
        NoCopy: false
        Labels:
          '': ''
        DriverConfig:
          Name: ''
          Options:
            '': ''
      TmpfsOptions:
        SizeBytes: 0
        Mode: 0
    network: ''
    capadd:
    - c1
    - c2
    capdrop:
    - c3
    - c4
    portBindings:
    - '80':
      - HostIp: 0.0.0.0
        HostPort: '8080'
    restartPolicy:
      Name: ''
      MaximumRetryCount: 0

What issue does it fix

This is in reference to #2125. It doesn't address the proposal as such, but is a proposal of how the docker runtime environment could be expanded as per the comment #2125 (comment)

Notes for the reviewer

In this case no changes had to be made to to the docker driver defined in the go-cnab package since that driver exposes all the docker api options provided by the official docker package. Porter just had to be extended to expose additional options through its configuration options.

Checklist

  • Did you write tests?
  • Did you write documentation?
  • Did you change porter.yaml or a storage document record? Update the corresponding schema file.
  • If this is your first pull request, please add your name to the bottom of our Contributors list. Thank you for making Porter better! 🙇‍♀️

Reviewer Checklist

  • Comment with /azp run test-porter-release if a magefile or build script was modified
  • Comment with /azp run porter-integration if it's a non-trivial PR

@carolynvs
Copy link
Member

It looks like the bundle defines both the mount point into the container AND the source from the host. Am I misunderstanding or how does the user specify what they want to expose to the bundle?

@richbai90
Copy link
Author

That is correct. This has the implication that the source has to be the same on both the build and deployment environments. In my particular case I think that's probably okay, but it shouldn't be too difficult to add a CLI option to overwrite the path on the host if that's a more desirable implementation.

@carolynvs
Copy link
Member

It's a hard requirement in Porter that bundles do not have knowledge of the host machine. So Porter needs a way to define mappings from what the bundle has declared (volumes, mounts, etc) and how those should be configured by the host when running the container.

In general, we can't optimize for "in-house" bundles. Any new feature needs to work when the bundle author and the people running the bundle are different entities. For example, a software vendor has a bundle that installs their product and their users should be able to run it without hacks or mimicing the vendor's environment.

@carolynvs
Copy link
Member

I recommend working out and getting consensus on how this feature should work before putting more time into a PR. Let's figure out how we want the bundle author to define additional configuration, if it's required, etc and then how the person running the bundle would be able to learn what configuration is required, what their options are (i.e. hey this bundle exposes a mount point), and how they would configure it when running the bundle.

@carolynvs carolynvs self-assigned this Jun 21, 2022
@richbai90 richbai90 force-pushed the advanced-docker-runtime branch 2 times, most recently from cf4a6c8 to 3bd1596 Compare June 21, 2022 23:53
@richbai90
Copy link
Author

That sounds great. I have put some effort into modifying the implementation to accept parameters. It's not quite working. I can tell that the template strings are being matched because they're being replaced with an empty string, but I am obviously missing something about how the parameter values are retrieved. You can take a look at the changes on my repo: richbai90#1 . If I can get a little more direction, I think I can make this work, if it seems like a positive direction to you.

@carolynvs
Copy link
Member

Sorry, I am having trouble telling what you are trying to do in that PR. One of the reasons why this wasn't implemented previously is because it requires an agreed upon design for how a bundle author specifies mounts, ideally in a way that works with our existing patterns (parameters) and can be promoted to the CNAB spec. It shouldn't just be "we are exposing docker functionality without an abstraction layer, since long term porter will support more than k8s and docker, for example virtual machines or web assembly modules.

Parameters let the bundle define what it needs, and then gives the end-user of the bundle flexibility to choose how to supply that parameter, such as providing it from a secret vs an environment variable. Porter defines special parameter types that simplify passing data into bundles as parameters. Right now we support string, bool, number, file, etc.

What if we added a new parameter type: directory and then the user could choose to satisfy it with a volume mount?

# porter.yaml
parameters:
- name: nginx-config
  type: directory
  path: /etc/nginx
  mode: read # or read-write if we are writing back to the host

Then when the end-user of the bundle checks out the bundle to see how they can customize how it's installed they would see something like this:

$ porter explain -r MYBUNDLE
Name: mybundle
Description: An example bundle that mounts a directory
Version: 0.1.0
Porter Version: v1.0.0-alpha.19

Parameters:
-----------------------------------------------------------------
  Name           Description  Type       Default  Required  Applies To
-----------------------------------------------------------------
  nginx-config                directory           true      All Actions

Then the user has their choice of how they want to provide that parameter.

# pass in the contents of /tmp/myconfig from the host
porter install --param nginx-config=/tmp/myconfig
name: mybundle-params
parameters:
- name: nginx-config
  source:
    # could specify whatever the docker --mount flag supports
    # target isn't required since Porter can determine that info from the parameter definition
    mount: source=/home/myuser/myconfig 

Another design considering is how we would deal with file permissions and the owner/group on any files created. Porter runs as a random nonroot user, and we would need to think through how to avoid predictable problems caused by the container writing files to the host as that user.

This isn't everything that needs to be taken into account in a design but I hope that gives you an idea of what we need before the proposal can be a accepted, and then an implementation considered.

@richbai90
Copy link
Author

richbai90 commented Jul 5, 2022

The goal was to be able to include parameters in the extension definition the same way they are used in the mixin definitions. For example:

required:
  docker:
    mounts:
      - source: {{ mySource }}

I'm good with however you feel the best way to implement this is. My thought was eventually to add a new runtime key to specify the runtime to use and the config options that made sense with that specific runtime. In a similar way to how I'm abusing the required extension currently. Something like this:

runtime:
  docker:
    mounts:
      - source: {{ mySource }}

As for the owner/group considerations, you can see how I'm using the setup currently and handling these considerations with my file-xfer mixin. Basically, the mixin works in four steps:

  1. Configuration provides a source folder or volume to use during build and a desired destination on the install machine
  2. A pre-build step archives the files in the source folder or volume and copies it to the build directory
  3. During build that archive is copied to the runtime container and the group/user properties of the archive are updated to be the nonroot user
  4. During install a mount is created at the provided destination on the target machine and the archive is unpacked in the target directory.

The archive method avoids the problems of ownership changes of the individual files because only the archive owner changes during the build process. During the install process the available buildargs can be used to appropriately set the nonroot user uid and gid, or a second exec step to update the extracted files ownership properties.

@richbai90
Copy link
Author

What if we added a new parameter type: directory and then the user could choose to satisfy it with a volume mount?

This makes a lot of sense to me. I can work on a PR to that affect.

@carolynvs
Copy link
Member

required:
  docker:
    mounts:
      - source: {{ mySource }}

With the solution I proposed above, using parameters to mount volumes, this part shouldn't be needed anymore. Do you agree?

@carolynvs
Copy link
Member

I'm not too concerned about how the xfer mixin deals with file permissions. Rather it's the general case of a bundle mounting a volume and then writing files to it that needs further planning. Porter will need to provide strong guidance on how to interact with a volume, backed with helpers that make it easy to just do the right thing without reimplementing it in each mixin/bundle.

For example, if I have a bundle that mounts a volume from the host, and then I write to that volume with the exec mixin (like in a bash script), what is the desired experience for the bundle author so that it's super easy for them to get the permissions right? We want to avoid making it easy for bundle authors to mess up someone's file permissions because we've made it 100% their responsibility.

I think we have a bunch of options for how to go about this (off the top of my head):

  • Have Porter fix up the file permissions on the volume using additional metadata from the bundle (TBD), so that bundles are not responsible for fixing the uid/gid.
  • Provide a built-in mixin that just cleans up the volume ownership, and the bundle author can specify the uid/gid (i.e. just wrap up what would be a custom script in each bundle into a reusable mixin).
  • Expose the uid/gid build args as environment variables so that the bundle author can use them in scripts when the bundle runs.

I strongly lean towards the first option if possible, since Porter is somewhat responsible for not giving bundle authors a "foot gun" to mess up permissions on people's machines. 😀

@richbai90
Copy link
Author

richbai90 commented Jul 6, 2022

All of this sounds very reasonable. To make sure I understand, the current consideration is something like this.

  • As a bundle author I will be able to require a directory parameter with a path and mode.
  • As a bundle author I will be able to expect that the resolution of the directory parameter will cause the path I specify to be populated.
  • As a bundle author I will be able to specify the owner and group of all files written to the host
  • As a bundle author I will be able to request that the end user specify the owner and group of all files written to the host
  • As an end user I will be able to fulfill the directory parameter with a mount configuration corresponding to the options available via the docker cli mount flag.
  • As an end user I will, in the future, be able to fulfill the directory parameter with other configuration types as they become available.
  • As an end user I will be able to maintain security in my environment by specifying the access level available to the runtime container.
  • As an end user I will be able to maintain security in my environment by assurance that access will only be granted to the directories I specify in a directory parameter, and that this will in no case create a back door to the host file system.

Does that sound more or less like what we're looking for in terms of features?

Regarding this feature:

As a bundle author I will be able to specify the owner and group of all files written to the host

I waffled some on whether to include it, or if the end user should always have control over the files written to their machine. I determined to include it for instances where the author or developer knows best. For instance, if the bundle is expected to interact with the alsa kernel module, there is a good chance that any files the bundle writes to the host will need to be owned by the audio user -- or equivalent depending on the OS. The end user doesn't know that, nor should they be required to. On the other hand, if the bundle is installing something in a custom location, only the end user will know who should own those files.

Does it make sense to only allow bundle authors to specify system users/groups as file owners? IE put a validation step in the porter install command that checks if the user/group exists and is a system user/group.

@richbai90
Copy link
Author

With the solution I proposed above, using parameters to mount volumes, this part shouldn't be needed anymore. Do you agree?

I do agree. I think this is the best abstraction. It should work regardless of runtime used, and doesn't expose unnecessary, potentially dangerous docker configuration options. It also abstracts the details of implementation away from bundle authors who shouldn't need to be concerned with them.

@carolynvs
Copy link
Member

that this will in no case create a back door to the host file system.

Agreed, as long as the bundle does not mount the docker socket (using requires docker). If the end user runs a bundle with --allow-docker-host-access, honestly that is a security concern that we call out on that flag and is why the end user has to explicitly allow it. But if they are only using the directory parameter, then yes their access to the rest of the host is limited to just the volumes mounted.

As a bundle author I will be able to specify the owner and group of all files written to the host
As a bundle author I will be able to request that the end user specify the owner and group of all files written to the host
...
For instance, if the bundle is expected to interact with the alsa kernel module, there is a good chance that any files the bundle writes to the host will need to be owned by the audio user -- or equivalent depending on the OS. The end user doesn't know that, nor should they be required to. On the other hand, if the bundle is installing something in a custom location, only the end user will know who should own those files.

I really lean towards having a default uid/gid defined on the parameter, and then porter has a way for the end user to always be able to override that if necessary. That way the bundle author can suggest a default, but if there's a mismatch, the end user should ultimately always have control over what is written to the host volume.

# porter.yaml
parameters:
- name: nginx-config
  type: directory
  path: /etc/nginx
  writable: true # Replaces my earlier suggestion of read|read-write, defaults to false
  gid: 1001 # The bundle author can default it to a well-known user/group
  uid: 1001

So if the user has a custom setup, and needs a different user, they could run

porter install --param nginx-config-gid=1002 --param nginx-config-uid=1002

The bundle would have access to these additional gid/uid parameters if the bundle wants to set owners (not necessary but for situations where the underlying tools requires that info), but Porter would always do a sweep at the end to enforce ownership on all shared volumes.

Does it make sense to only allow bundle authors to specify system users/groups as file owners? IE put a validation step in the porter install command that checks if the user/group exists and is a system user/group.

I'm not sure I see how whether or not the user is flagged as a system user is helpful here and it most likely assumes too much about how people organize their users/groups, and assumes Linux (we don't support Windows containers now but could in the future).

@carolynvs
Copy link
Member

Thanks for the use cases listed out like that! 👍

@richbai90
Copy link
Author

Yeah no sweat! That's how we write stories where I work. I'll get to work on a PR to this affect.

@richbai90 richbai90 mentioned this pull request Jul 20, 2022
4 tasks
@richbai90
Copy link
Author

I have a mostly working implementation of what we discussed on #2251. I am not sure where to look for the parameter default overwriting via the command line logic. That is what is left not working currently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants