
[WIP] Support for multi-arch #63

Closed
wants to merge 1 commit into from

Conversation

colek42

@colek42 colek42 commented Jan 16, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
Support CSI on multiple arch

Special notes for your reviewer:
I am proposing that we move the build logic out of the Makefile and into a multi-stage Dockerfile. This will help us support multiple architectures easily using docker buildx and ensure the build environment is reproducible.

This PR is a proof of concept of how that may work. Additional changes to the build process will be required. See https://hub.docker.com/r/colek42/csi-node-driver-registrar/tags for the built images.
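
A minimal sketch of the proposed multi-stage structure (paths, image names, and Go version here are illustrative, not necessarily this PR's exact contents):

# Build stage: compile inside the golang image so the toolchain is pinned
FROM golang:1.13 AS builder
WORKDIR /app
ADD . /app
# CGO_ENABLED=0 yields a static binary that can run in a minimal base image
RUN CGO_ENABLED=0 go build -o /csi-node-driver-registrar ./cmd/csi-node-driver-registrar

# Runtime stage: copy only the binary into a small runtime image
FROM gcr.io/distroless/static
COPY --from=builder /csi-node-driver-registrar /csi-node-driver-registrar
ENTRYPOINT ["/csi-node-driver-registrar"]

A single docker buildx build --platform=linux/arm,linux/arm64,linux/amd64 invocation can then build and push this one Dockerfile for every listed platform.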

ref: #55 #48

Does this PR introduce a user-facing change?:

Add multi-arch image support

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 16, 2020
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 16, 2020
@k8s-ci-robot
Contributor

Welcome @colek42!

It looks like this is your first PR to kubernetes-csi/node-driver-registrar 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/node-driver-registrar has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @colek42. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: colek42
To complete the pull request process, please assign vladimirvivien
You can assign the PR to them by writing /assign @vladimirvivien in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 16, 2020
@colek42
Author

colek42 commented Jan 16, 2020

I signed it.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 16, 2020
@colek42
Author

colek42 commented Jan 16, 2020

ref: ceph/ceph-csi#671

@msau42
Collaborator

msau42 commented Jan 17, 2020

/ok-to-test
/assign @pohly

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 17, 2020
@pohly
Contributor

pohly commented Jan 17, 2020

The proposal makes sense to me in principle. But there are some details to clarify:

  • Does that also work in Prow, or only in TravisCI? We wanted to move away from TravisCI; it would be a step back to add more reasons why we have to use TravisCI.
  • Where and how do we choose the Go version? Right now it's specified in exactly one place (release-tools/travis.yml), and more precisely (major.minor.patch) than the FROM golang:1.13 in the Dockerfile.
  • Do we keep "make build" as-is (i.e. build with Go on the host) or do we replace it with building in Docker? This is mostly relevant for developers.
  • Do we need a .dockerignore file? Without it, ADD . /app may end up copying quite a bit of data from an unclear work directory when invoked by a developer. In PMEM-CSI, we blacklist everything and then whitelist just the parts that are needed for building (https://github.com/intel/pmem-csi/blob/devel/.dockerignore). This is sometimes a nuisance to maintain, though.

#export DOCKER_CLI_EXPERIMENTAL=enabled
#docker run --rm --privileged docker/binfmt:66f9012c56a8316f9244ffd7622d7c21c1f6f28d
#docker buildx create --use --name mybuilder
#docker buildx build -t colek42/csi-node-driver-registrar --platform=linux/arm,linux/arm64,linux/amd64 . --push
Contributor

Docker Buildx is included in Docker 19.03 (see https://docs.docker.com/buildx/working-with-buildx/). We also need to make sure that we are able to push this image to quay.io, since CSI is using quay.io.

Author

@colek42 colek42 Jan 17, 2020

Looks like Quay supports the manifest API

ref: https://coreos.com/quay-enterprise/releases/#1.15.2
ref: https://docs.docker.com/registry/spec/manifest-v2-2/

  • Does that also work in Prow, or only in TravisCI? We wanted to move away from TravisCI; it would be a step back to add more reasons why we have to use TravisCI.

I don't understand Prow; where can I find more info about it? If Prow has a current version of Docker and allows us to use buildx, it should work.

  • Where and how do we choose the Go version? Right now it's specified in exactly one place (release-tools/travis.yml), and more precisely (major.minor.patch) than the FROM golang:1.13 in the Dockerfile.

If we are building in the container, I think it makes sense to specify the Go version in the
Dockerfile.
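
As a sketch (not part of this PR), the exact version could stay in one place as a build argument, keeping the major.minor.patch precision:

# Pin the full Go toolchain version via a build arg (value illustrative)
ARG GO_VERSION=1.13.6
FROM golang:${GO_VERSION} AS builder

CI or the Makefile could then override it with docker buildx build --build-arg GO_VERSION=... without touching the Dockerfile.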

  • Do we keep "make build" as-is (i.e. build with Go on the host) or do we replace it with building in Docker? This is mostly relevant for developers.

I always recommend building in a container for reproducibility. If developers need the same functionality, we can have the Makefile call docker build and mount a local folder into /bin, as sketched below.
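
A hypothetical Makefile target showing that idea (image, stage, and binary names are illustrative and assume a builder stage like the one sketched in the PR description):

# Build inside the container, then mount ./bin and copy the binary out
build-container:
	docker build --target builder -t registrar-builder .
	# $$(pwd) is Make's escaping for the shell's $(pwd)
	docker run --rm -v $$(pwd)/bin:/out registrar-builder \
		cp /csi-node-driver-registrar /out/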

  • Do we need a .dockerignore file? Without it, ADD . /app may end up copying quite a bit of data from an unclear work directory when invoked by a developer. In PMEM-CSI, we blacklist everything and then whitelist just the parts that are needed for building (https://github.com/intel/pmem-csi/blob/devel/.dockerignore). This is sometimes a nuisance to maintain, though.

The pattern in PMEM-CSI looks good to me; I don't mind doing it either way, as the tradeoffs are pretty minimal. Because we are using a multi-stage build, the end result should be the same.
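
For illustration, the blacklist-then-whitelist pattern looks roughly like this (the entries below are hypothetical for this repo):

# .dockerignore sketch: ignore everything, then re-include what the build needs
*
!cmd/
!pkg/
!go.mod
!go.sum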

Contributor

I don't understand Prow; where can I find more info about it? If Prow has a current version of Docker and allows us to use buildx, it should work.

buildx depends on QEMU for simple multi-platform support, right? That may create additional dependencies for the host that the job runs on. Prow is the CI for Kubernetes, running on Kubernetes itself: https://github.com/kubernetes/test-infra/blob/master/prow/README.md

If we are building in the container, I think it makes sense to specify the Go version in the
Dockerfile.

I don't mind that. I was just pointing out that if we move it, then further work is needed to adapt the other code that currently reads the Go version from travis.yml. My concern also was that 1.13 is less precise than what we are currently using. I don't know whether it matters in practice, though.

Contributor

Currently, quay.io doesn't support multi-arch; see ceph/ceph-csi#707 (comment).

Contributor

@pohly Since quay doesn't support multi-arch, would it be possible to move to GCR?

Contributor

Yes, that has been the goal for a while now. We haven't done it yet because it wasn't clear how Prow jobs would get access to the necessary credentials for pushing to gcr. I believe that has been sorted out now.

Can you check whether docker buildx really works in a Prow job for multiple architectures? You can change release-tools/build.make for that (for example, let test depend on a new test-buildx which runs the command); then we'll see during the next pull-kubernetes-csi-node-driver-registrar-unit run whether it succeeds.
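
As a sketch, that experiment could be as small as this addition to release-tools/build.make (target name taken from the suggestion above; the platform list and details are hypothetical):

# Let "test" depend on a buildx smoke test so the Prow job exercises it
test: test-buildx
test-buildx:
	docker buildx build --platform=linux/amd64,linux/arm64,linux/arm .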


I'd like to get multi-arch builds off of Prow and onto the GCB-based image auto-push infra, so as not to need to register QEMU on the nodes (and to avoid different jobs stomping on it).

I'm personally too oversubscribed for this, but the CNCF infra is generally moving that way. @justaugustus and @Katharine have some work here.

@mkumatag

/cc @listx @claudiubelu @BenTheElder

@k8s-ci-robot
Contributor

@mkumatag: GitHub didn't allow me to request PR reviews from the following users: claudiubelu, BenTheElder, listx.

Note that only kubernetes-csi members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @listx @claudiubelu @BenTheElder

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@claudiubelu

claudiubelu commented Feb 12, 2020

/cc @listx @claudiubelu @BenTheElder

You'll have to specify what kind of information you need from me. :)

But I'll assume it's Windows-related.

It seems that you're only taking into account the Linux container images. The Dockerfile is different for a few reasons (see Dockerfile.windows).

And from what I understand, you want to move the build process from the host level to the container level (using golang:1.13 images), all with a single docker buildx command: building, publishing, and the manifest list. I'm not sure that docker buildx can take multiple Dockerfiles into account.

As far as binary building goes, it seems that golang:1.13 has support for Windows Server 2019 (1809), so we could use that to build the necessary binaries. I think we can add a Windows node to docker buildx, so that it will use that node to build the appropriate image.

But there's a concern I have at the moment. Currently, from what I've seen in the Dockerfile.windows file, we're only building images for Windows Server 2019 (1809), while we also support 1903 and 1909. Now there's something you'll need to know: in order to use a 1809 container image on a 1903 or 1909 node, the node has to have Hyper-V enabled and the containers have to be run with --isolation=hyperv, which is not a good assumption to make; for example, GCE doesn't have Hyper-V enabled (AFAIK), so it's not viable. This means that most probably we'll have to build multiple container images for Windows (1809, 1903, 1909), which means different image bases. But I don't know if docker buildx can support such a use case: building images for multiple OS versions for the same os/arch.

But even so, we'll still have to make an assumption: that the Windows build node is able to build container images for 1809, 1903, and 1909, which means that it has to have Hyper-V enabled, its Docker has to be configured with --isolation=hyperv, and it has to be at least a 1909 node (Hyper-V isolation only ensures backwards compatibility: you can use 1809, 1903, and 1909 container images on a 1909 node, but you can't use a 1909 image on a 1809 node). I guess it could be the builder's responsibility to ensure this, but if we're going to have an automatic image builder for this at some point, you won't be able to do it for Windows on GCE (as I said, it doesn't have Hyper-V enabled, AFAIK), and I don't think you can have multiple Windows nodes with different OS versions build the container images for their respective versions; I don't think docker buildx is able to figure out which Windows node to use for which image based on a version.

To be fair, I'm not too familiar with docker buildx and I might be wrong about some of the points above, but it might not be the answer we seek for all the cases above.

@colek42
Author

colek42 commented Feb 18, 2020

I'm not sure that docker buildx can take multiple Dockerfiles into account.

I don't think BuildKit supports Windows, period. ref: moby/buildkit#616. Even if it does in the near future, it is likely to complicate this PR significantly. I have customers today looking for ARM CSI support.

I think the initial hurdle is moving to Prow and GCR. Once that is taken care of, adding multi-arch support will be straightforward.

Proposed way forward:

  1. Move images to GCR
  2. Move to Prow
  3. Add multi-arch support via buildx and multi-stage Dockerfiles

@travisghansen

I would also like to see this for ARM support. Unfortunately I don't know anything about Prow or how to integrate with it, but otherwise I am happy to help. Thanks!

@BenTheElder

Sorry, but I just do not have bandwidth for this right now.

For image pushing, please sync up with k8s-infra-wg and look at the image-pushing jobs (https://github.com/kubernetes/test-infra/tree/master/images/builder). This tooling can be used with a wg-provided GCR / GCB / ... project, on which QEMU etc. should work fine. Other members of the Kubernetes release group have experience with this too and use this pattern for other official images.

/uncc

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 18, 2020
@justaugustus
Member

/remove-lifecycle stale

Instructions on configuring image building/pushing via Prow + GCB can be found here: https://github.com/kubernetes/test-infra/tree/master/config/jobs/image-pushing

kube-cross image building is a decent example to crib from: https://github.com/kubernetes/release/tree/master/images/build/cross

Let me know if you need additional help and feel free to assign me in PRs if you need reviews! :)

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2020
@claudiubelu

If the original author can't continue this, I could add the necessary bits, like I've done here: kubernetes-sigs/secrets-store-csi-driver#189

@justaugustus
Member

@claudiubelu -- I'd say give it a few days to see if there's a response before taking it over.
If you do end up taking up the work, please attribute @colek42 using the Co-authored-by: byline: https://help.github.com/en/github/committing-changes-to-your-project/creating-a-commit-with-multiple-authors

@colek42
Author

colek42 commented May 21, 2020

@claudiubelu I'd like to take a shot at this. Right now I am aiming for a Tuesday PR. Thank you for the ref to your work.

@claudiubelu

Great! For the most part, you can build the images in the same way I did in the PR. One key aspect is the manifest-tool usage, which allows us to also add the os.version field for an image in a manifest list; this is crucial for Windows nodes that do not have Hyper-V enabled for Hyper-V isolation. Without Hyper-V isolation, the container OS version and the host OS version need to match in order to start the container, and the os.version field is what lets the right image be pulled from the manifest list.
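
For reference, the same os.version annotation can also be applied with plain docker manifest commands (manifest-tool expresses it declaratively in its YAML spec; the image names and build number below are illustrative):

# May require DOCKER_CLI_EXPERIMENTAL=enabled on Docker releases of this era
docker manifest create example.io/registrar:v1 \
    example.io/registrar:v1-linux-amd64 example.io/registrar:v1-windows-1809-amd64
# Set os.version so nodes without Hyper-V pull an image matching their host build
docker manifest annotate --os windows --os-version 10.0.17763.1397 \
    example.io/registrar:v1 example.io/registrar:v1-windows-1809-amd64
docker manifest push example.io/registrar:v1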

Just keep in mind to also submit the PR which creates the prow job, like this one: kubernetes/test-infra#17335

@msau42
Collaborator

msau42 commented May 21, 2020

Hey everyone, thanks for all the attention on this. We have already been actively working on this, and have some experimental builds already going in k8s-staging-csi. Please reach out to @vitt-bagal @pohly @jingxu97 and @dims to coordinate efforts.

@colek42
Author

colek42 commented May 21, 2020

@msau42 could you provide a reference to the k8s-staging-csi repo, please?

@msau42
Collaborator

msau42 commented May 21, 2020

We have a WIP cloudbuild, the Prow job, and the Prow status.

@pohly
Contributor

pohly commented May 25, 2020

Can we move tracking of multi-arch builds over to this issue: kubernetes-csi/csi-release-tools#86?

The problem is not specific to node-driver-registrar, although that is where we are currently experimenting.

@pohly
Contributor

pohly commented May 25, 2020

Can we move over tracking of multi-arch builds to this issue

On the other hand, this issue has more context. I'm undecided...

@pohly
Contributor

pohly commented May 25, 2020

On the other hand, this issue has more context. I'm undecided...

This PR... I was confused. We really shouldn't use a PR to discuss progress that is being made elsewhere now. I suggest we close this.

@colek42
Author

colek42 commented May 27, 2020

I am quite confused about who is handling what at this point. It seems like most of the work has been done.

@pohly
Contributor

pohly commented May 27, 2020

I am quite confused about who is handling what at this point. It seems like most of the work has been done.

There have been various people who experimented, most recently @vitt-bagal and @namrata-ibm. I'm now trying to get it from "POC" to "usable in practice". #84 hopefully will be the last PR before moving the common code and files into csi-release-tools.

One noteworthy difference compared to this PR: we can't do multi-stage builds with native compilation for Windows, which diminishes the value of doing it for Linux because maintaining two different ways of producing images isn't nice. Therefore #84 keeps building binaries on the build host outside of Docker and then just puts those into the image with a simple Dockerfile + COPY.
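
With that approach the Dockerfile shrinks to little more than a COPY; a sketch (paths illustrative, not necessarily what #84 contains):

# Binary is built on the host (make build); the image just packages it
FROM gcr.io/distroless/static
COPY ./bin/csi-node-driver-registrar /csi-node-driver-registrar
ENTRYPOINT ["/csi-node-driver-registrar"]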

@colek42
Author

colek42 commented May 27, 2020

@pohly That seems appropriate. Feel free to ping me if the team decides to move in another direction.

@colek42 colek42 closed this May 27, 2020
@claudiubelu

One noteworthy difference compared to this PR: we can't do multi-stage builds with native compilation for Windows, which diminishes the value of doing it for Linux because maintaining two different ways of producing images isn't nice. Therefore #84 keeps building binaries on the build host outside of Docker and then just puts those into the image with a simple Dockerfile + COPY.

You don't need native building in the first place. You can do something like this: https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/master/windows.Dockerfile

As you can see in the file, the RUN part is in the Linux stage, and then the binary is just copied over inside the Windows image.
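
A condensed sketch of that pattern (base image tags and paths illustrative):

# Cross-compile the Windows binary in a Linux builder stage...
FROM --platform=linux/amd64 golang:1.13 AS builder
WORKDIR /app
ADD . /app
RUN GOOS=windows GOARCH=amd64 CGO_ENABLED=0 \
    go build -o /registrar.exe ./cmd/csi-node-driver-registrar

# ...then copy it into the Windows base; no RUN steps execute in this stage,
# so no Windows build node is required.
FROM mcr.microsoft.com/windows/nanoserver:1809
COPY --from=builder /registrar.exe /registrar.exe
ENTRYPOINT ["/registrar.exe"]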
