
[EP] Remote Artifacts #419

Merged
267 changes: 267 additions & 0 deletions docs/proposals/remote-artifacts.md
<!--
Copyright The Shipwright Contributors

SPDX-License-Identifier: Apache-2.0
-->

---
title: remote-artifacts
authors:
- "@otaviof"
reviewers:
- TBD
approvers:
- TBD
creation-date: 2020-10-02
last-updated: 2020-10-02
status: provisional
---

Shipwright Remote Artifacts
---------------------------

# Summary

Remote artifacts are often dependencies of the software building process, and hence are also
required when building container images. This enhancement proposal focuses on adding a
build-artifacts abstraction to Shipwright's Operator.

# Motivation

Give Shipwright's operator broader build use-case support by enhancing its capabilities to include
the concept of Artifacts: remote entities that will be made available to the image build process.

End users will be allowed to declare artifacts that represent remote dependencies alongside
Builds, and to easily link them together. For example, a Java application that downloads certain
jars into the classpath, or a Node.js application adding external images.

## Goals

* Provide means to declare remote artifacts, dependencies that can be employed on builds;
* Create the mechanism to download and prepare remote artifacts for builds;

## Non-Goals

* Automate the upload of local artifacts into the cluster;
* Manage remote artifacts;
* Amend container images created by the given `BuildStrategy`;
* Walk a remote directory tree, similar to a `git checkout`;

# Proposal

The enhancement proposal is centered around the idea of declaring external dependencies, here
called "artifacts", and being able to use them during the container image building process.

## Build Artifacts

The remote artifacts will be expressed directly in the body of a `Build` resource, as in the
following example:

```yml
---
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: nexus-jar
spec:
  artifacts:
    - name: main-jar
      url: https://nexus.company.com/app/main.jar
      path: $(workspace)/classpath/main.jar
```

**Member:** Should we maybe consider extending the source section instead?

```yml
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: nexus-jar
spec:
  sources:
    # A source of type=git must only exist once or be absent
    - type: git
      url: [email protected]:SCHWARZS/test-project.git
      revision: sascha-test
      credentials:
        name: ibm-github
    # A source of type=http can exist multiple times
    - type: http
      url: https://nexus.company.com/app/main.jar
      path: classpath/main.jar # Not sure if I would allow values that do not start with ${workspace} here, but instead always have that as prefix
    # Credentials support for whatever we want; start with basic auth only, client-cert maybe in the future; we will need to determine it based on how the secret looks
    - type: http
      url: https://restricted.server.com/lib.jar
      credentials:
        name: restricted-server
      path: classpath/lib.jar
    # A source of type=http can have an optional extract flag set to true, which means that path is interpreted as a directory instead of a file and the file content is extracted (TBD: can we autodetect the file type, or do we need to make extract an enum or add another compressiontype parameter?)
    - type: http
      url: https://www.myserver.com/sources.zip
      extract: true
      path: sources
```

This would be an extensible mechanism. In the future we might support other types (loading from maybe some ConfigMap where the CLI placed the local sources).

**Contributor:** we can introduce glue-code to autodetect the type based on the URL, keeping the whole `spec.source` as simple as it is now.

**Member:** I think that would be hard. An HTTPS URL can be a public Git repository (especially as we do not require the `.git` suffix at the moment), a Subversion repository, a plain file, and probably more.

**Member Author:** I think, given the distinction between the types of data, one being VCS and the other remote artifacts, they should be described in different logical blocks. So we also have a clear separation between those two concerns.

**Member:** @otaviof a VCS resource is also remote. There is "just" a different protocol to download them.

**Member:**

> This would be an extensible mechanism. In the future we might support other types (loading from maybe some configmap where the cli placed the local sources).

ConfigMap is certainly a valid use case, though with a CLI upload I see PersistentVolume/PVC being a more likely result.

Another thing we should clarify is that "sources" are things that are intended to be present in the resulting image. There is a separate use case for "volume" support (like caches), in which case the data present in the volume is not included in the resulting container image.

**Member:**

```yml
...
spec:
  sources:
    # A source of type=git must only exist once or be absent
    - type: git
      url: [email protected]:SCHWARZS/test-project.git
      revision: sascha-test
      credentials:
        name: ibm-github
```

Wouldn't we want to employ the discriminator pattern?

```yml
spec:
  sources:
    - name: git
      type: Git
      git:
        url: ...
```

(Note that `type: Git` may not be required; apimachinery can fill it in.)

**Contributor:** Same line of thought; I see a VCS as yet another remote artifact. I'm not able to understand the distinction.

**Member Author (@otaviof, Oct 23, 2020):** Please consider the latest commit; I'm adding an initial `.spec.sources` support with your inputs.

**Contributor:** @otaviof thanks, I took a look.

> http: settings for http, namely path (optional);

Is the new field from above required? We already have something called `contextDir`; can this serve the same purpose?

@adambkaplan @SaschaSchwarze0 @otaviof I have some concerns around `spec.sources`. First of all, I think introducing a type is a nice way to distinguish the different assets (remote files, git, etc.). I'm struggling to understand the value of moving from a map (`spec.source`) to a list (`spec.sources`). My rationale is:

* This proposal does not explain the use cases on why a list is needed;
* This will break the current API;
* `spec.source` is for me one of the structs that is very simple to understand, and I would like to keep it this way. Adding support for a list of sources is going to add complexity at that API level;
* Supporting `spec.sources` will introduce some code complexity when generating the TaskRun. It would be good to highlight those and decide if it is worth the effort.

@sbose78 probably we want your feedback also. Also, do we have some good references on how Build v1 did it (it shouldn't bias us)? But it will help to understand the rationale there.

A new attribute, `.spec.artifacts`, will be added, containing a slice of entries with:

* **Name**: the actual name of the artifact, optional;
* **URL**: the universal location of the artifact;
* **Path**: the final artifact location, with placeholder (`$(workspace)`) support;

The resource `UID`/`GID` will be that of the user running the Tekton task; later on, we
can extend the API to support arbitrary configuration.

### Standalone CRD
**Member:** This should be placed in an "Alternatives" section at the bottom of the proposal. I believe we decided that we are not going to pursue this approach.

Leaving this in the main body confuses what is in the proposal vs. out. See the proposal template for guidance:

https://github.com/shipwright-io/build/blob/master/docs/proposals/guidelines/proposal-template.md

**Member Author:** Following the template, the standalone CRD alternative design was moved into a section at the end.

Alternatively, we may define the artifacts as a standalone CRD, namely a `BuildArtifact`
resource. The advantage of this design is being able to exchange and reuse artifacts across
several builds. For instance, if two projects share a common logo image, both `Builds` will refer
to the same `BuildArtifact`.

The following snippet shows how an Artifact (`BuildArtifact`) will be represented. In this
example, consider a Java-based application with a Jar hosted in Nexus that needs to be added to
the application's `classpath` directory.


```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildArtifact
metadata:
  name: nexus-jars
spec:
  artifacts:
    - name: main-jar
      url: https://nexus.company.com/app/main.jar
      path: $(workspace)/classpath/main.jar
```

## Using Artifacts

Given the artifacts declared, a developer will be able to include them in `Build` resources, as in
the following example:


```yml
---
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: java-application
spec:
  artifacts:
    - nexus-jars
```

By adding `.spec.artifacts`, the operator will generate build steps to download external data and
place it in the expected location before the build strategy runs. Furthermore, developers will be
able to overwrite artifacts at the `BuildRun` level, allowing for more use-cases in which
different sets of artifacts are required.


## Steps and Helper

Artifacts are requirements for the build process; thus, downloading and preparing artifacts must
happen before the build process starts. To achieve this, the operator will generate a new task
step to download the artifacts.

The download may happen using existing open-source software, more specifically `wget`. The
container image which contains this software will be specified via an environment variable; when
not informed, a default value will be used.

**Contributor:** The "download" looks critical. We get this for free from Tekton. Wouldn't it make more sense to do this natively in Go, or to see if Tekton already has something for us? E.g., they do not use a `git` binary for pulling; they have their own git pkg.

**Member Author:** I understand what Tekton does for the VCS (source code), but what would it do for remote artifacts? Could we use Tekton to download them as well?

**Member:** Ideally there would be a Tekton resource handling things like plain HTTP(S) download. But I think there is not one.

**Member:** It does not appear such a resource type exists. If we accept this proposal, I recommend submitting a TEP to see if we can get upstream Tekton buy-in.

**Contributor:** I was mainly referring to a native implementation in Go; I would like to avoid introducing extra binary dependencies in the code. We might not need Tekton at all.

**Contributor:** I think we all agree this is not enforcing the usage of wget, and that another more native implementation (e.g. the Tekton way) might be used instead. Can we mention this somewhere?

**Contributor (@zhangtbj, Oct 26, 2020):** Can we leverage the Tekton source to enable storage or something first?
https://github.com/tektoncd/pipeline/blob/master/docs/resources.md#resource-types

I think wget is a little weak. How about private resources (with auth) or FTP?

**Member:** PipelineResources at this time are most likely not going to get out of alpha stage. There has been a fair amount of consternation and churn around exactly how to move forward, though.

But minimally I would not advise a PipelineResource-styled path.

That said, if I had to pick, I'd say "custom tasks" might be the most likely path along the "Tekton resource" angle. Also, submitting something "similar" but different would most likely get the response of "look at custom tasks".

See https://github.com/tektoncd/community/blob/master/teps/0002-custom-tasks.md for details.

**Member (@gabemontero, Oct 26, 2020):** There is also a "specializing task" which has some literature around it, but no TEP as of yet.

**Member:** Yeah, a specific resource in upstream Tekton would be a hard sell at the moment given the direction PipelineResources is taking.

**Contributor:** I would prefer to see us use pure Go for the downloading instead of an external tool such as wget; I don't like having to rely on OS-provided tools.

**Member:** I like Corey's idea. We will eventually face a Build which has a longer list of sources. For all these sources besides the git source (which we will probably want to continue handling using Tekton's Git resource), I envision one Task step implemented by ourselves in Go. Within our code we can then do optimizations like parallel downloads for different sources, and also handle and report errors in a better way than by invoking an external command.

**Member Author (@otaviof, Nov 10, 2020):** I also like the idea of creating our own tool for that purpose. However, I see wget as a simple starting point, from which we could potentially develop our own tool in parallel.

So, how about getting started with wget and evaluating the new tooling on its own track?

**Contributor:** I raised the same concern some comments ago; I do not see why we need to rely on a binary. #419 (comment)

**Member:** I think we can agree that we can implement the downloader, regardless of whether we build our own in Go vs. "buy" it (by using a fixed container image with wget). IMO deciding which approach to take is beyond what is necessary to mark this EP as "implementable".

Therefore, the operator needs to generate `wget` commands with arguments in order to download the
artifacts specified.

**Member:**

> the operator needs to generate wget commands with arguments in order to download the

The BuildRun controller needs to generate a TaskRun step that would generate the wget command to download the artifacts specified.

**Member Author (@otaviof, Nov 10, 2020):** Thanks, that's a good addition. Please consider latest commit 01ace39.

The step generated by the operator will look like the following:

```yml
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: artifact-download
spec:
  steps:
    - name: artifact-download
      image: busybox:latest
      command:
        # busybox ships /bin/sh, not bash
        - /bin/sh
      args:
        - -c
        - >
          wget --output-document="/workspace/source/classpath/main.jar" https://nexus.company.com/app/main.jar &&
          chown 1000:1001 /workspace/source/classpath/main.jar &&
          chmod 0644 /workspace/source/classpath/main.jar
```

## Example Use-Case

Using [Shipwright's proposed logos](https://github.com/shipwright-io/build/issues/325) as an
example, let's assume we are building a [TypeScript application](https://github.com/otaviof/typescript-ex)
which uses the project logo, and we would like to create two different builds: one with the
default project logo, and another with the alternative.

By using remote artifacts, we can keep the separation of project source code and assets, and we
can describe those resources as:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildArtifact
metadata:
  name: ship-logo
spec:
  artifacts:
    - name: ship-logo
      url: https://user-images.githubusercontent.com/2587818/92114986-69bfb600-edfa-11ea-820e-96cdb1014f58.png
      path: $(workspace)/assets/images/shipwright-logo.png
```

And, the alternative logo:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildArtifact
metadata:
  name: axes-logo
spec:
  artifacts:
    - name: axes-logo
      url: https://user-images.githubusercontent.com/2587818/92100668-c1ebbd80-ede4-11ea-9e8a-7379c3875ea0.png
      path: $(workspace)/assets/images/shipwright-logo.png
```

Then, we can create the `Build` resource, as per:

```yml
---
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: typescript-ex
spec:
  strategy:
    name: buildpacks-v3
    kind: ClusterBuildStrategy
  source:
    url: https://github.com/otaviof/typescript-ex.git
  artifacts:
    - ship-logo
  output:
    image: quay.io/otaviof/typescript-ex:latest
```

**Contributor:** @otaviof pls correct me if I'm wrong, but here the example is based on the assumption that we support a new BuildSource CRD. If we do not have the BuildSource CRD in place, then we will still need to have:

* A Build instance to define the git source code;
* Two BuildRuns, each one overriding the `spec.sources` for each logo type.

Is the above correct?

**Member Author:** Yes, correct @qu1queee. The example was made to illustrate both scenarios at once, but it might have been leaning too much towards the "Alternative" scenario.

**Contributor:** ok, thanks

Now, we can create two `BuildRun` resources, overwriting `.spec.artifacts` to compose the alternative build. For instance:


```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildRun
metadata:
  name: typescript-ex
spec:
  buildRef:
    name: typescript-ex
```

Also:

```yml
---
apiVersion: build.dev/v1alpha1
kind: BuildRun
metadata:
  name: typescript-ex-alternative-logo
spec:
  buildRef:
    name: typescript-ex
  output:
    image: quay.io/otaviof/typescript-ex:alternative
  artifacts:
    - axes-logo
```

When the build processes are done, the following images will be available:
* `quay.io/otaviof/typescript-ex:latest`
* `quay.io/otaviof/typescript-ex:alternative`

A number of real-world use-cases can be derived from this example; the `BuildArtifact` is the
foundation.

## Test Plan

1. Deploy the Shipwright Build operator in a cluster;
2. Create a `BuildArtifact` resource instance, pointing to a remote binary;
3. Create `Build` and `BuildRun` resources, using the `BuildArtifact`;
4. Make sure the build process completes successfully and is able to use the remote artifact;