Support artifact outputs #92

Merged (13 commits) on Apr 21, 2020
2 changes: 1 addition & 1 deletion sdk/README.md
@@ -105,7 +105,7 @@ Prerequisite: Install [Kubeflow Pipeline](https://www.kubeflow.org/docs/pipeline
By default, artifacts are disabled because they depend on Kubeflow Pipeline's Minio setup. When artifacts are enabled, all output parameters are also treated as artifacts and persisted to the default object storage. Enabling artifacts also allows files to be downloaded or stored as artifact inputs/outputs. Since artifacts depend on Kubeflow Pipeline's setup by default, the generated Tekton pipeline must be deployed to the same namespace as Kubeflow Pipeline.

To compile Kubeflow Pipelines as a Tekton pipelineRun, simply add the `--enable-artifacts` flag to your `dsl-compile-tekton` command, e.g.
- `dsl-compile-tekton --py sdk/python/tests/compiler/testdata/artifact_location. --output pipeline.yaml --enable-artifacts`
- `dsl-compile-tekton --py sdk/python/tests/compiler/testdata/artifact_location.py --output pipeline.yaml --enable-artifacts`
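The same compilation can also be done programmatically. Below is a minimal sketch, assuming the `kfp_tekton` package is installed; the `echo_pipeline` function is a made-up placeholder rather than code from this repository:

```python
from kfp import dsl
from kfp_tekton.compiler import TektonCompiler


@dsl.pipeline(name='echo-pipeline', description='minimal artifact example')
def echo_pipeline():
    # single step that writes a small file, which is persisted as an output artifact
    dsl.ContainerOp(
        name='echo',
        image='busybox',
        command=['sh', '-c', 'echo hello > /tmp/output.txt'],
        file_outputs={'output': '/tmp/output.txt'})


# enable_artifacts=True mirrors the --enable-artifacts CLI flag
TektonCompiler().compile(echo_pipeline, 'pipeline.yaml', enable_artifacts=True)
```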

## Troubleshooting
- Please be aware that any defined Affinity, Node Selector, and Tolerations are applied to all tasks in the same pipeline, because only one podTemplate is allowed per pipeline.
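To make the implication concrete, here is a hedged sketch (not taken from this PR; the `disktype=ssd` label is an invented example) of why a constraint requested on a single op ends up affecting the whole pipeline:

```python
from kfp import dsl


@dsl.pipeline(name='node-selector-example', description='single podTemplate behavior')
def node_selector_pipeline():
    op_a = dsl.ContainerOp(name='a', image='busybox', command=['echo', 'a'])
    # the node selector is requested on op_a only ...
    op_a.add_node_selector_constraint('disktype', 'ssd')

    # ... but since the compiled Tekton pipeline allows only one podTemplate,
    # op_b ends up scheduled with the same constraint
    op_b = dsl.ContainerOp(name='b', image='busybox', command=['echo', 'b'])
```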
4 changes: 2 additions & 2 deletions sdk/python/tests/compiler/compiler_tests.py
@@ -171,8 +171,8 @@ def test_artifact_location_workflow(self):
"""
Test compiling an artifact location workflow.
"""
from .testdata.artifact_location import foo_pipeline
self._test_pipeline_workflow(foo_pipeline, 'artifact_location.yaml', enable_artifacts=True)
from .testdata.artifact_location import custom_artifact_location
self._test_pipeline_workflow(custom_artifact_location, 'artifact_location.yaml', enable_artifacts=True)

def test_katib_workflow(self):
"""
48 changes: 24 additions & 24 deletions sdk/python/tests/compiler/testdata/artifact_location.py
@@ -12,41 +12,41 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import kfp
from kfp import dsl
from kubernetes.client.models import V1SecretKeySelector
from kubernetes.client import V1SecretKeySelector


@dsl.pipeline(name='artifact-location-pipeine', description='hello world')
def foo_pipeline(tag: str, namespace: str = "kubeflow", bucket: str = "foobar"):
@dsl.pipeline(
name="custom_artifact_location_pipeline",
description="""A pipeline to demonstrate how to configure the artifact
location for all the ops in the pipeline.""",
Member (review comment):
maybe we can add another comment here that this assumes the pipeline will be deployed/run in the kubeflow namespace?

)
def custom_artifact_location(
secret_name: str = "mlpipeline-minio-artifact",
tag: str = '1.31.0',
namespace: str = "kubeflow",
bucket: str = "mlpipeline"
):

# configures artifact location
pipeline_artifact_location = dsl.ArtifactLocation.s3(
bucket=bucket,
endpoint="minio-service.%s:9000" % namespace,
insecure=True,
access_key_secret={"name": "minio", "key": "accesskey"},
secret_key_secret=V1SecretKeySelector(name="minio", key="secretkey"))

# configures artifact location using AWS IAM role (no access key provided)
aws_artifact_location = dsl.ArtifactLocation.s3(
bucket=bucket,
endpoint="s3.amazonaws.com",
region="ap-southeast-1",
insecure=False)
bucket=bucket,
endpoint="minio-service.%s:9000" % namespace, # parameterize minio-service endpoint
insecure=True,
access_key_secret=V1SecretKeySelector(name=secret_name, key="accesskey"),
secret_key_secret={"name": secret_name, "key": "secretkey"}, # accepts dict also
)

# set pipeline level artifact location
dsl.get_pipeline_conf().set_artifact_location(pipeline_artifact_location)

# pipeline level artifact location (to minio)
op1 = dsl.ContainerOp(
name='foo',
image='busybox:%s' % tag,
output_artifact_paths={
'out_art': '/tmp/out_art.txt',
},
)
# artifacts in this op are stored to endpoint `minio-service.<namespace>:9000`
op = dsl.ContainerOp(name="foo", image="busybox:%s" % tag,
command=['sh', '-c', 'echo hello > /tmp/output.txt'],
file_outputs={'output': '/tmp/output.txt'})

if __name__ == '__main__':
# don't use top-level import of TektonCompiler to prevent monkey-patching KFP compiler when using KFP's dsl-compile
from kfp_tekton.compiler import TektonCompiler
TektonCompiler().compile(foo_pipeline, __file__.replace('.py', '.yaml'), enable_artifacts=True)
TektonCompiler().compile(custom_artifact_location, __file__.replace('.py', '.yaml'), enable_artifacts=True)
45 changes: 27 additions & 18 deletions sdk/python/tests/compiler/testdata/artifact_location.yaml
@@ -20,13 +20,17 @@ spec:
params:
- name: bucket
- name: namespace
- name: secret_name
- name: tag
stepTemplate:
volumeMounts:
- mountPath: /tmp
name: out_art
results:
- description: /tmp/output.txt
name: output
steps:
- image: busybox:$(inputs.params.tag)
- command:
- sh
- -c
- echo hello > $(results.output.path)
image: busybox:$(inputs.params.tag)
name: foo
- env:
- name: PIPELINERUN
@@ -45,37 +49,40 @@
valueFrom:
secretKeyRef:
key: accesskey
name: minio
name: $(inputs.params.secret_name)
ckadner (Member) commented on Apr 15, 2020:
Hi Tommy,

this seems to not work out of the box. The pipeline does not show any progress, there are no logs, and nothing is stored in Minio.

$ tkn pipeline start custom-artifact-location-pipeline --showlog:

? Value for param `secret_name` of type `string`? (Default is `mlpipeline-minio-artifact`) mlpipeline-minio-artifact
? Value for param `tag` of type `string`? (Default is `1.31.0`) 1.31.0
? Value for param `namespace` of type `string`? (Default is `kubeflow`) kubeflow
? Value for param `bucket` of type `string`? (Default is `mlpipeline`) mlpipeline
Pipelinerun started: custom-artifact-location-pipeline-run-6mcxt
Waiting for logs to be available...
^C

... and later ...
$ tkn pipeline logs custom-artifact-location-pipeline --last:

Pipeline still running ...
No logs found

But when I change it back to

    - name: AWS_ACCESS_KEY_ID
      value: minio
    - name: AWS_SECRET_ACCESS_KEY
      value: minio123

Then it works and I get this output:

? Value for param `secret_name` of type `string`? (Default is `mlpipeline-minio-artifact`) mlpipeline-minio-artifact
? Value for param `tag` of type `string`? (Default is `1.31.0`) 1.31.0
? Value for param `namespace` of type `string`? (Default is `kubeflow`) kubeflow
? Value for param `bucket` of type `string`? (Default is `mlpipeline`) mlpipeline
Pipelinerun started: custom-artifact-location-pipeline-run-d2x6l
Waiting for logs to be available...

[foo : copy-artifacts] Added `storage` successfully.
[foo : copy-artifacts] `/tekton/results/output` -> `storage/mlpipeline/runs/custom-artifact-location-pipeline-run-d2x6l/custom-artifact-location-pipeline-run-d2x6l-foo-2v5gf-pod-86ll6/output.txt`
[foo : copy-artifacts] Total: 0 B, Transferred: 6 B, Speed: 565 B/s

...along with the file showing up in Minio.

Tomcli (Member, Author) replied:
For this pipeline, you have to run under the same namespace as kfp, so by default you should run it under the kubeflow namespace. If you want to run in a different namespace, then you have to create a secret with the S3 key and secret, using the same secret_name as in the parameter.
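A minimal sketch of what creating such a secret could look like with the Kubernetes Python client; the `default` namespace and the `minio` / `minio123` credentials are assumptions taken from this thread, not prescribed by the PR:

```python
import base64

from kubernetes import client, config

# create the artifact-credentials secret in a non-kubeflow namespace so the
# compiled pipeline can resolve $(inputs.params.secret_name) there
config.load_kube_config()
core_v1 = client.CoreV1Api()

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="mlpipeline-minio-artifact"),
    type="Opaque",
    data={
        "accesskey": base64.b64encode(b"minio").decode(),
        "secretkey": base64.b64encode(b"minio123").decode(),
    },
)
core_v1.create_namespaced_secret(namespace="default", body=secret)
```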

ckadner (Member) replied:
Thinking of a general integration test suite, it would be great if we could run this (and all other) pipelines as-is. In order to use Tekton I am using a different default namespace. Can we just use the minio and minio123 credentials since they are not really secret?

Tomcli (Member, Author) replied on Apr 16, 2020:
The DSL is designed to take the credentials from a secret; even the default setting reads from a secret. Is it possible for you to create/copy the mlpipeline-minio-artifact secret from kubeflow to your default namespace?

ckadner (Member) replied:
For automation this would not be practical. Maybe we can run all pipelines in the kubeflow namespace and clean up afterwards.

Tomcli (Member, Author) replied:
A lot of CI tests in kubeflow assume the Kubernetes namespace is set up with all the necessary dependencies such as the GCP secret, image pull secret, service account, etc. Probably we should create some sort of CI setup script for Tekton as well. Many of the design decisions I made are meant to align with how Argo behaves, so kfp can reuse the same setup for Tekton.

- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: secretkey
name: minio
name: $(inputs.params.secret_name)
image: minio/mc
name: copy-artifacts
script: |-
#!/usr/bin/env sh
mc config host add storage http://minio-service.$(inputs.params.namespace):9000 $AWS_ACCESS_KEY_ID $AWS_SECRET_ACCESS_KEY
mc cp /tmp/out_art.txt storage/$(inputs.params.bucket)/runs/$PIPELINERUN/$PODNAME/out_art.txt
volumes:
- emptyDir: {}
name: out_art
mc cp $(results.output.path) storage/$(inputs.params.bucket)/runs/$PIPELINERUN/$PODNAME/output.txt
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
annotations:
pipelines.kubeflow.org/pipeline_spec: '{"description": "hello world", "inputs":
[{"name": "tag", "type": "String"}, {"default": "kubeflow", "name": "namespace",
"optional": true, "type": "String"}, {"default": "foobar", "name": "bucket",
"optional": true, "type": "String"}], "name": "artifact-location-pipeine"}'
name: artifact-location-pipeine
pipelines.kubeflow.org/pipeline_spec: '{"description": "A pipeline to demonstrate
how to configure the artifact\n location for all the ops in the pipeline.",
"inputs": [{"default": "mlpipeline-minio-artifact", "name": "secret_name", "optional":
true, "type": "String"}, {"default": "1.31.0", "name": "tag", "optional": true,
"type": "String"}, {"default": "kubeflow", "name": "namespace", "optional":
true, "type": "String"}, {"default": "mlpipeline", "name": "bucket", "optional":
true, "type": "String"}], "name": "custom_artifact_location_pipeline"}'
name: custom-artifact-location-pipeline
spec:
params:
- name: tag
- default: mlpipeline-minio-artifact
name: secret_name
- default: 1.31.0
name: tag
- default: kubeflow
name: namespace
- default: foobar
- default: mlpipeline
name: bucket
tasks:
- name: foo
@@ -84,6 +91,8 @@
value: $(params.bucket)
- name: namespace
value: $(params.namespace)
- name: secret_name
value: $(params.secret_name)
- name: tag
value: $(params.tag)
taskRef:
Expand Down