[EP] Remote Artifacts #419

Merged
313 changes: 313 additions & 0 deletions docs/proposals/remote-artifacts.md
@@ -0,0 +1,313 @@
<!--
Copyright The Shipwright Contributors

SPDX-License-Identifier: Apache-2.0
-->

---
title: remote-artifacts
authors:
- @otaviof

reviewers:
- @SaschaSchwarze0
- @qu1queee
- @adambkaplan
- @zhangtbj
- @coreydaley

approvers:
- TBD
creation-date: 2020-10-02
last-updated: 2020-11-09
status: provisional
---

Shipwright Remote Artifacts
---------------------------

# Summary

Remote artifacts are dependencies of the software building process. They represent binaries or
other data stored outside of the version control system (Git). Because they are required when
building container images as well, this enhancement proposal focuses on adding remote artifact
support to Shipwright's Build API.

# Motivation

Give Shipwright's operator broader build use-case support by enhancing its capabilities to include
the concept of artifacts, that is, remote entities that will be available to the image build
process.

End users will be able to include remote artifacts as a build runtime dependency. Artifacts
controlled by remote systems will be downloaded before the build process starts. Those artifacts may
be pre-compiled binaries, `jar` files, `war` files, etc.

## Goals

* Provide means to declare remote artifacts, dependencies that can be employed on builds;
* Create the mechanism to download and prepare remote artifacts for builds;

## Non-Goals

* Automate the upload of local artifacts into the cluster;
* Manage remote artifacts;
* Amend container images created by the given `BuildStrategy`;
* Walking a remote directory tree, similarly to a `git checkout`;

# Proposal

The enhancement proposal is centered around the idea of declaring external dependencies, here
called "artifacts", and being able to use them during the container image building process.

## Build Artifacts

The remote artifacts will be directly expressed in the body of a `Build` resource, as in the
following example:

```yml
---
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: license-file
spec:
  source:
    # ...
  sources:
Contributor

I am wondering if `sources` really makes sense here. I think of a source as a Git repository or something to that effect. Maybe this should be something more along the lines of `dependencies`, like you would list in a Maven file?

Member

My vote is for inputs which would be a symmetric naming together with the output image that we already have in the Build.

Member Author

@SaschaSchwarze0 are you okay with sources and the path ahead to unify them later on?

Member

> @SaschaSchwarze0 are you okay with sources and the path ahead to unify them later on?

I am okay with sources, yes. (But I doubt we would ever change this in the future. ;-) )

Contributor

I'm concerned that more than one person is not fully buying `spec.sources`. What are the implications for this EP of moving to `spec.inputs` rather than `spec.sources`? It would also keep the API clean for now. In another EP we can deal with multiple Git repositories, where we can tackle the modifications to `spec.source`.

Member Author

I'm okay to use .spec.inputs for this purpose. Do we have objections on .spec.inputs?

Contributor

not from my side :)

Member Author

@sbose78, @adambkaplan, @gabemontero, @coreydaley would you object to using `.spec.inputs` instead?

    - name: license-file
      type: http
      url: https://licenses.company.com/customer-id/license.tar
Member

Could you use a more realistic example like a jar artifact in all examples, please?

Member Author

The initial text was using a Jar file, but it wound up raising questions about dependency management itself. So I moved to something simpler, a license file.

Member Author

Please note a more Java like example is kept here.

      http:
        path: $(workspace)/licenses/license.tar
```

We will add a `sources` section, side by side with the current `source`. The idea is to accommodate both
constructions in `v1alpha1`, and save API breaking changes for the upcoming `v1beta1`.

The new `sources` will contain the following attributes:

- `name`: source name (required);
Contributor

I feel like name is superfluous here, what is the point of having it?

Member

This is so we can use the merge patch strategy. Otherwise kubectl apply will default to overwriting everything in the sources array.

https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#notes-on-the-strategic-merge-patch

- `type`: input source type, `git` or `http` (required);
- `sourceRef`: use an external resource; the name in this field is the resource name (optional);
Contributor

Can you expand upon this field's usage?

Member Author

I'm moving the field to the new "Alternatives" section at the end.

- `url`: the URL of the repository or remote artifact (optional);
- `credentials`: the credentials to interact with the artifact (optional);
- `http`: settings for `http`, namely `path` (optional);
Contributor

What is path for here? Is this the local path that the file will be downloaded to? Why is it only available for the "http" type? Seems like path should be a top level element.

Member Author (@otaviof, Nov 12, 2020)

I've amended the text to explain that `.path` is an inner attribute; please consider the latest changes.

- `git`: settings for `git`, namely `revision` (optional);
Contributor

Could we eliminate revision and just use standard git paths that reference a revision in the url?

Member Author

I'm rewording this to mention it stays as original.


The `UID`/`GID` of the downloaded resource will be that of the user running the Tekton task. Later
on, we can extend the API to support arbitrary configuration.
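
The attributes listed above could map onto Go API types roughly as follows. This is only a sketch: the type names, the JSON tags, and the choice of a plain string for `credentials` (assumed here to be a Secret name) are illustrative and not part of the proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// SourceType enumerates the two source types named in the attribute list.
type SourceType string

const (
	SourceTypeGit  SourceType = "git"
	SourceTypeHTTP SourceType = "http"
)

// HTTPSettings holds the settings specific to "http" sources.
type HTTPSettings struct {
	Path string `json:"path,omitempty"`
}

// GitSettings holds the settings specific to "git" sources.
type GitSettings struct {
	Revision string `json:"revision,omitempty"`
}

// Source is a hypothetical shape for one `.spec.sources` entry.
type Source struct {
	Name        string        `json:"name"`
	Type        SourceType    `json:"type"`
	SourceRef   string        `json:"sourceRef,omitempty"`
	URL         string        `json:"url,omitempty"`
	Credentials string        `json:"credentials,omitempty"` // assumption: name of a Secret
	HTTP        *HTTPSettings `json:"http,omitempty"`
	Git         *GitSettings  `json:"git,omitempty"`
}

// Validate enforces only what the attribute list states: `name` and `type`
// are required, and `type` must be either "git" or "http".
func (s Source) Validate() error {
	if s.Name == "" {
		return errors.New("source name is required")
	}
	switch s.Type {
	case SourceTypeGit, SourceTypeHTTP:
		return nil
	case "":
		return fmt.Errorf("source %q: type is required", s.Name)
	default:
		return fmt.Errorf("source %q: unsupported type %q", s.Name, s.Type)
	}
}

func main() {
	src := Source{
		Name: "license-file",
		Type: SourceTypeHTTP,
		URL:  "https://licenses.company.com/customer-id/license.tar",
		HTTP: &HTTPSettings{Path: "$(workspace)/licenses/license.tar"},
	}
	fmt.Println("validation error:", src.Validate())
}
```

Entries that carry only `name` and `sourceRef` would be rejected by this check, so a real implementation would have to decide whether `type` stays mandatory in that case.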

### Single vs. Multiple Git Repositories

For the initial implementation, we will only support one Git repository, which is specified in `.spec.source`.
Remote artifacts will be specified in `.spec.sources` only.

We will address supporting multiple Git repositories in a future enhancement proposal. This will involve re-defining the `/workspace/source` location and the `$workspace` placeholder.

### Standalone CRD
Member

This should be placed in an "Alternatives" section at the bottom of the proposal. I believe we decided that we are not going to pursue this approach.

Leaving this in the main body confuses what is in the proposal vs. out. See the proposal template for guidance:

https://github.com/shipwright-io/build/blob/master/docs/proposals/guidelines/proposal-template.md

Member Author

Following the template, the standalone CRD alternative design was moved into a section at the end.


Alternatively, we may define the artifacts as a standalone CRD, that is a `BuildSource`
resource. The advantage of this design is being able to exchange, and reuse, artifacts on several
builds. For instance, if two projects are sharing a common logo image, both `Builds` will refer to
the same `BuildSource`.

The following snippet shows how an Artifact (`BuildSource`) will be represented.


```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildSource
metadata:
  name: license-file
spec:
  sources:
    - name: license.tar
      type: http
      url: https://licenses.company.com/customer-id/license.tar
Member

Could you use a more realistic example like a jar artifact in all examples, please?

Member Author

Same as in here.

      http:
        path: $(workspace)/licenses/license.tar
```
Member

A question about the `BuildSource`. Within the Build's `spec.sources`, it is used from one source entry to reference an external definition. Based on that, my expectation would be that the external build source would only contain a single spec like this:

---
apiVersion: build.dev/v1alpha1
kind: BuildSource
metadata:
  name: license-file
spec:
  type: http
  url: https://licenses.company.com/customer-id/license.tar
  http:
    path: $(workspace)/licenses/license.tar

Should we have multiple entries with their own name in the BuildSource, then we'll have to define how this is merged with the other entries in the Build's sources.

Member Author

I think here we may benefit from a slice of entries, given that we may be downloading a number of dependencies from a single external source and can bundle them together in this slice fashion. It also keeps it aligned with the Build resource, where it will be available as an array/slice as well.


## Usage

The usage of this feature is based on declaring a slice of `.spec.sources` and later on overwriting
entries using a `BuildRun` resource. For instance:

```yml
---
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: example
spec:
  source:
    # ...
  sources:
    - name: license-file
```

This `Build` could have its `license-file` entry overwritten in a `BuildRun` with:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildRun
metadata:
  name: license-file
spec:
  buildRef:
    name: example
  sources:
    - name: license-file
      sourceRef: alternative-file
```
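
The override described above amounts to a merge keyed on the source `name`. A minimal sketch of that merge follows. The `Source` type is a trimmed, hypothetical stand-in, and appending `BuildRun` entries whose name does not exist in the `Build` is an assumption, since the proposal only describes overwriting.

```go
package main

import "fmt"

// Source is a trimmed, hypothetical stand-in for one sources entry.
type Source struct {
	Name      string
	Type      string
	URL       string
	SourceRef string
}

// overrideSources returns the Build sources with every entry replaced by the
// BuildRun entry of the same name. BuildRun entries with a name unknown to the
// Build are appended (an assumption). The input slices are not modified.
func overrideSources(build, buildRun []Source) []Source {
	byName := make(map[string]Source, len(buildRun))
	for _, s := range buildRun {
		byName[s.Name] = s // on duplicate names, the last entry wins
	}

	merged := make([]Source, 0, len(build)+len(buildRun))
	seen := make(map[string]bool, len(build)+len(buildRun))
	for _, s := range build {
		if o, ok := byName[s.Name]; ok {
			s = o // replace the whole entry, not individual fields
		}
		merged = append(merged, s)
		seen[s.Name] = true
	}
	for _, s := range buildRun {
		if !seen[s.Name] {
			merged = append(merged, byName[s.Name])
			seen[s.Name] = true
		}
	}
	return merged
}

func main() {
	build := []Source{{
		Name: "license-file",
		Type: "http",
		URL:  "https://licenses.company.com/customer-id/license.tar",
	}}
	buildRun := []Source{{Name: "license-file", SourceRef: "alternative-file"}}

	for _, s := range overrideSources(build, buildRun) {
		fmt.Printf("%s: sourceRef=%q url=%q\n", s.Name, s.SourceRef, s.URL)
	}
}
```

Replacing the whole entry, rather than merging field by field, keeps the result predictable: the effective source is always exactly what one of the two resources declares.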

## Steps and Helper

Artifacts are requirements for the build process, thus downloading and preparing artifacts must
happen before the build process starts. To achieve the objective, the operator will generate a new
task step to download the artifacts.

The download may happen using existing open-source software, more specifically `wget`. The
Contributor

The "download" looks critical. We get this for free from Tekton. Wouldn't it make more sense to do this natively in Go, or to see whether Tekton already has something for us? For example, they do not use a `git` binary for pulling; they have their own Git package.

Member Author

I understand what Tekton does for the VCS (source-code), but what would it do for remote artifacts? Could we use Tekton to download them as well?

Member

Ideally there would be a Tekton resource handling things like plain HTTP(S) download. But, I think there is not one.

Member

It does not appear such a resource type exists. If we accept this proposal, I recommend submitting a TEP to see if we can get upstream Tekton buy-in.

Contributor

I was mainly referring to a native implementation in Go; I would like to avoid introducing extra binary dependencies in the code. We might not need Tekton at all.

Contributor

I think we all agree this does not enforce the usage of `wget` and that a more native implementation (e.g. the Tekton way) might be reused. Can we mention this somewhere?

Contributor (@zhangtbj, Oct 26, 2020)

Can we leverage the Tekton source to enable storage or something first?:
https://github.com/tektoncd/pipeline/blob/master/docs/resources.md#resource-types

I think wget is a little weak. How about the private resource (with auth) or ftp?

Member

PipelineResources at this time are most likely not going to get out of the alpha stage. There has been a fair amount of consternation and churn around exactly how to move forward, though.

But minimally I would not advise a PipelineResource styled path.

That said, if I had to pick, I'd say "custom tasks" might be the most likely path along the "tekton resource" angle. Also, submitting something "similar" but different most likely would get the response of "look at custom tasks".

See https://github.com/tektoncd/community/blob/master/teps/0002-custom-tasks.md for details.

Member (@gabemontero, Oct 26, 2020)

there is also a "specializing task" which has some literature around it but no TEP as of yet

Member

Yeah, a specific resource in upstream Tekton would be a hard sell at the moment given the direction PipelineResources is taking.

Contributor

I would prefer to see us use pure Go for the downloading instead of an external tool such as `wget`; I don't like having to rely on OS-provided tools.

Member

I like Corey's idea. We will eventually face a Build which has a longer list of sources. For all these sources besides the Git source (which we will probably want to continue to handle using Tekton's Git resource), I envision one Task step implemented by ourselves in Go. Within our code we can then do optimizations like parallel downloads for different sources, and we can also handle and report errors in a better way than by invoking an external command.

Member Author (@otaviof, Nov 10, 2020)

I also like the idea of creating our own tool for that purpose. However, I see `wget` as a simple starting point, while we could potentially develop our own tool in parallel.

So, how about we get started with `wget` and evaluate the new tooling on its own track?

Contributor

I raised the same concern a few comments ago; I do not see why we need to rely on a binary. #419 (comment)

Member

I think we can agree that we can implement the downloader, regardless of whether we build our own in go vs. "buy" it (by using a fixed container image with wget). IMO deciding which approach to take is beyond what is necessary to mark this EP as "implementable".

container image that contains this software will be specified via an environment variable; when
the variable is not set, a default value will be used.
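
The environment-variable lookup with a fallback could be sketched as below. Both the variable name `ARTIFACT_DOWNLOADER_IMAGE` and the default image are hypothetical, since the proposal names neither; the lookup function is passed in so the logic can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// Hypothetical names: the proposal does not define the variable or the default.
	downloaderImageEnv     = "ARTIFACT_DOWNLOADER_IMAGE"
	defaultDownloaderImage = "busybox:latest"
)

// downloaderImage returns the image configured through the environment and
// falls back to the default when the variable is unset or empty.
func downloaderImage(lookup func(string) (string, bool)) string {
	if v, ok := lookup(downloaderImageEnv); ok && v != "" {
		return v
	}
	return defaultDownloaderImage
}

func main() {
	// os.LookupEnv has exactly the signature the helper expects.
	fmt.Println(downloaderImage(os.LookupEnv))
}
```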

Therefore, the `buildrun` controller needs to generate a `TaskRun` step, with a `wget` command
plus arguments, to download the specified artifacts.

The step generated by the operator will look like the following:

```yml
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: artifact-download
spec:
  steps:
    - name: artifact-download
      image: busybox:latest
      command:
        # the busybox image ships /bin/sh, not bash
        - /bin/sh
      args:
        - -c
        # the folded scalar (>) joins the lines below with spaces into one command
        - >
          wget -O "/workspace/source/classpath/main.jar" https://nexus.company.com/app/main.jar &&
          chown 1000:1001 /workspace/source/classpath/main.jar &&
          chmod 0644 /workspace/source/classpath/main.jar
```
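
On the controller side, the script for such a step could be rendered from a source entry along these lines. This is a sketch under assumptions: the helper names are hypothetical, `$(workspace)` is assumed to expand to `/workspace/source`, and the script uses `wget -O`, a flag accepted by both GNU and BusyBox `wget`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// workspaceDir is the assumed expansion of the $(workspace) placeholder.
const workspaceDir = "/workspace/source"

// shellQuote wraps s in single quotes so POSIX sh treats it as one literal word.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// downloadScript renders the script for one "http" source: create the target
// directory, then fetch the URL into the target file.
func downloadScript(url, target string) string {
	dest := strings.ReplaceAll(target, "$(workspace)", workspaceDir)
	return fmt.Sprintf("mkdir -p %s && wget -O %s %s",
		shellQuote(path.Dir(dest)), shellQuote(dest), shellQuote(url))
}

func main() {
	fmt.Println(downloadScript(
		"https://nexus.company.com/app/main.jar",
		"$(workspace)/classpath/main.jar",
	))
}
```

Quoting every user-supplied value matters here because the URL and path come straight from the `Build` resource and end up inside a shell command.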

## Example Use-Case

Using [Shipwright's proposed logos](https://github.com/shipwright-io/build/issues/325) as example,
let's assume we are building a [TypeScript application](https://github.com/otaviof/typescript-ex)
which will use the project logo, and we would like to create two different builds, one with the
default project logo, and another with the alternative.

By using remote artifacts, we can keep the separation of project source code and assets, and we
can describe those resources as:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildSource
Member

This references the original CRD style of declaring a downloaded resource, which IIRC we are not pursuing. Please update.

Member Author

Moved to the new "Alternatives" section, please consider.

metadata:
  name: ship-logo
spec:
  sources:
    - name: project-logo
      type: http
      url: https://user-images.githubusercontent.com/2587818/92114986-69bfb600-edfa-11ea-820e-96cdb1014f58.png
Member

Could you use a more realistic example like a jar artifact in all examples, please?

Member Author

Same as in here.

      http:
        path: $(workspace)/assets/images/shipwright-logo.png
```

And, the alternative logo:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildSource
metadata:
  name: axes-logo
spec:
  sources:
    - name: project-logo
      type: http
      url: https://user-images.githubusercontent.com/2587818/92100668-c1ebbd80-ede4-11ea-9e8a-7379c3875ea0.png
      http:
        path: $(workspace)/assets/images/shipwright-logo.png
```

Then, we can create the `Build` resource, as per:

```yml
Contributor

@otaviof please correct me if I'm wrong, but here the example is based on the assumption that we support a new `BuildSource` CRD. If we do not have the `BuildSource` CRD in place, then we will still need to have

  • A Build instance to define the git source code
  • Two BuildRuns, each one overriding the spec.sources for each logo type

is the above correct?

Member Author

Yes, correct @qu1queee. The example was made to illustrate both scenarios at once, but it might have been too much leaning towards the "Alternative" scenario.

Contributor

ok, thanks

---
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: typescript-ex
spec:
  strategy:
    name: buildpacks-v3
    kind: ClusterBuildStrategy
  source:
    url: https://github.com/example/project.git
  sources:
    - name: source
      type: git
      url: https://github.com/otaviof/typescript-ex.git
    - name: project-logo
      sourceRef: ship-logo
  output:
    image: quay.io/otaviof/typescript-ex:latest
```

Now, we can create two `BuildRun` resources. The first only runs the build with original settings:


```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildRun
metadata:
  name: typescript-ex
spec:
  buildRef:
    name: typescript-ex
```

And later, we can create yet another `BuildRun`, but this time use the alternative logo. Here we are
overwriting the `project-logo` source name, with an alternative resource, i.e.:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildRun
metadata:
  name: typescript-ex-alternative-logo
spec:
  buildRef:
    name: typescript-ex
  output:
    image: quay.io/otaviof/typescript-ex:alternative
  sources:
    - name: project-logo
      sourceRef: axes-logo
```

When the build processes are done, the following images will be available:
* `quay.io/otaviof/typescript-ex:latest`
* `quay.io/otaviof/typescript-ex:alternative`

A number of real-world use cases can be derived from this example, with the `BuildSource` as the
foundation.

## Test Plan

1. Deploy the Shipwright Build operator in a cluster;
2. Create a `BuildSource` resource instance pointing to a remote binary;
3. Create `Build` and `BuildRun` resources, using the `BuildSource`;
4. Make sure the build process completes successfully and is able to use the remote artifact;