Investigate different ways to pass parameters between tasks #41

Closed
Tomcli opened this issue Mar 19, 2020 · 13 comments · Fixed by #75
Comments

@Tomcli
Member

Tomcli commented Mar 19, 2020

Currently, we are using task.results for passing parameter outputs. This has a limitation: the output parameter files have to be under /tekton/results. However, some Kubeflow pipelines have no configuration for the output file path (e.g. the current Watson ML example). Therefore, we need to figure out an alternative that can take output parameter files from any path in the container.
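
For context, a minimal sketch of the /tekton/results constraint on the Tekton side (result and step names here are illustrative): a step can only surface a result by writing to $(results.<name>.path), which Tekton resolves to a file under /tekton/results:

results:
- name: data
steps:
- name: produce-output
  image: busybox
  script: |
    echo "some value" > $(results.data.path)   # resolves to /tekton/results/data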

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
feature 0.93


@Tomcli
Member Author

Tomcli commented Mar 19, 2020

/p1

@ckadner
Member

ckadner commented Mar 19, 2020

FYI, linking to PR #27 which introduced the limited file_outputs parameter passing

PR #27 made this scenario work, where the path of the output file (#1) shows up as a string in the arguments (#2) of the ContainerOp:

from kfp import dsl  # KFP SDK import, needed for dsl.ContainerOp

def gcs_download_op(url):
  return dsl.ContainerOp(
    name='GCS - Download',
    image='google/cloud-sdk:279.0.0',
    command=['sh', '-c'],
    arguments=['gsutil cat $0 | tee $1', url, '/tmp/results.txt'], # 2: output file path
    file_outputs={
      'data': '/tmp/results.txt',  # 1: output parameter mapped to file path
    }
  )

which the KFP-Tekton compiler replaces with the expression $(results.data.path):

spec:
  params:
  - name: url1
  results:
  - name: data              # 1: result parameter definition
    description: /tmp/results.txt
  steps:
  - name: gcs-download
    image: google/cloud-sdk:279.0.0
    command: ['sh', '-c']
    args:
    - gsutil cat $0 | tee $1
    - $(inputs.params.url1)
    - $(results.data.path)  # 2: replaced file path with result parameter expression

However, if the output file location is dynamically provided as an input parameter, or is simply known to the developer and does not show up in clear text, then the above logic fails:

def TrainOp(name, input_dir, output_dir, model_name, model_version, epochs):
  return dsl.ContainerOp(
    name=name,
    image='<train-image>',
    arguments=[      # 2: there is no output file path to replace with a Tekton variable expression
      '--input_dir', input_dir,
      '--output_dir', output_dir,
      '--model_name', model_name,
      '--model_version', model_version,
      '--epochs', epochs
    ],
    file_outputs={'output': '/output.txt'}  # 1: output parameter mapped to file path
  )

@Tomcli
Member Author

Tomcli commented Mar 20, 2020

Initial proposal: add an extra step that copies the output files to /tekton/results.
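
A rough sketch of what that extra step could look like in the generated Task (step names and paths are illustrative; this assumes the output file is on a path visible to both steps, which is the catch discussed below):

steps:
- name: main                    # user's original step, writes /output.txt
  image: <train-image>
- name: copy-results            # extra step appended by the compiler
  image: busybox
  script: |
    cp /output.txt $(results.output.path)   # lands under /tekton/results/output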

@Tomcli
Member Author

Tomcli commented Mar 24, 2020

Update: passing files between steps will still require either a volumeMount or workspaces.
However, both still require specifying a mount path that is not /, so this won't scale if users store their files in more than one subpath.

@Tomcli
Member Author

Tomcli commented Mar 24, 2020

Right now we don't have code-gen for the PipelineRun, so creating workspaces requires users to manually define their workspace volumes. Should we propose also generating the PipelineRun as part of our YAML?
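
For illustration, a generated PipelineRun could bind a concrete volume to the workspace at run time (the resource names and claim name below are placeholders):

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: my-pipeline-run
spec:
  pipelineRef:
    name: my-pipeline
  workspaces:
  - name: shared-data            # workspace declared by the Pipeline
    persistentVolumeClaim:
      claimName: my-pvc          # user's PVC, bound only at run time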

@ckadner
Member

ckadner commented Mar 24, 2020

@afrittoli -- can you clarify whether we really need a Workspace (and a PipelineRun to mount the volume) here? We are just trying to add an additional step to those tasks whose output parameters need to be moved to the default /tekton/results folder.

@ckadner
Member

ckadner commented Mar 27, 2020

/kind discussion
/remove-kind feature

@Tomcli
Member Author

Tomcli commented Mar 27, 2020

Added an initial approach that copies the files with an extra step. @afrittoli, let me know if you can think of a better way to pass parameters from any path.

@afrittoli
Contributor

> Update: passing files between steps will still require either a volumeMount or workspaces.
> However, both still require specifying a mount path that is not /, so this won't scale if users store their files in more than one subpath.

For sharing between steps you can mount an emptyDir; however, that would not work for sharing across tasks. You could use an emptyDir mounted at /kfp and transform user-provided paths to be relative to it, i.e. /tmp/output.txt would become /kfp/tmp/output.txt in the Tekton YAML.
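
A sketch of that idea at the Task level (volume and step names are illustrative): an emptyDir is mounted at /kfp in every step and the compiler rewrites the user's paths accordingly:

spec:
  results:
  - name: output
  volumes:
  - name: kfp-scratch            # emptyDir shared by all steps in the Task's pod
    emptyDir: {}
  steps:
  - name: main
    image: <train-image>
    volumeMounts:
    - {name: kfp-scratch, mountPath: /kfp}
    args: ['--output_file', '/kfp/tmp/output.txt']   # rewritten from /tmp/output.txt
  - name: copy-results
    image: busybox
    volumeMounts:
    - {name: kfp-scratch, mountPath: /kfp}
    script: |
      cp /kfp/tmp/output.txt $(results.output.path)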

@afrittoli
Contributor

> @afrittoli -- can you clarify whether we really need a Workspace (and a PipelineRun to mount the volume) here? We are just trying to add an additional step to those tasks whose output parameters need to be moved to the default /tekton/results folder.

As long as you're within the boundaries of a Task, you don't need a Workspace.
Workspaces are a useful abstraction that allows you to avoid embedding the name of a PVC into the static definition of a Task/Pipeline, so that you can bind any PVC to the workspace at runtime. You don't need workspaces for file sharing between steps; you can use the node filesystem for that and mount an emptyDir to any path you like (except /tekton, which is reserved).

@afrittoli
Contributor

As a side note, I feel that allowing absolute paths in the KFP DSL is a leak of a lower-level detail into the DSL abstraction, but I guess it might be too late to change that?
As a data scientist / machine learning engineer, I would like to create pipeline operations with named outputs; how these outputs are stored in files within the containers is not something I should worry about. The only file interface should be when I want to export data to permanent storage so that it's available outside of the pipeline.
