add support for dsl.volumeOps #51

Closed
Tomcli opened this issue Mar 23, 2020 · 5 comments · Fixed by #93

@Tomcli
Member

Tomcli commented Mar 23, 2020

Documentation for volumeOps: https://github.com/kubeflow/pipelines/blob/master/samples/core/volume_ops/README.md

In KFP, a volumeOp currently does two things:

  1. Creates the PVC.
  2. Mounts the PVC in every component that uses it.
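The two steps above can be pictured as plain manifests. The sketch below is a minimal, hypothetical illustration: make_pvc_manifest and mount_pvc are invented helper names, not KFP or kfp-tekton APIs, and the Task layout assumes Tekton's volumes/volumeMounts fields.

```python
from typing import Any, Dict, List


def make_pvc_manifest(name: str, size: str, modes: List[str]) -> Dict[str, Any]:
    """Step 1 (hypothetical helper): the PersistentVolumeClaim a volumeOp creates."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": list(modes),
            "resources": {"requests": {"storage": size}},
        },
    }


def mount_pvc(task_spec: Dict[str, Any], claim_name: str, mount_path: str,
              volume_name: str = "data") -> Dict[str, Any]:
    """Step 2 (hypothetical helper): mount the claim into every step of a Task spec.

    The Task spec is modified in place and also returned for convenience.
    """
    task_spec.setdefault("volumes", []).append(
        {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}
    )
    for step in task_spec.get("steps", []):
        step.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": mount_path}
        )
    return task_spec


# Usage: one PVC, mounted at /mnt in a component with a single step.
pvc = make_pvc_manifest("my-pvc", "1Gi", ["ReadWriteOnce"])
task = {"steps": [{"name": "write", "image": "library/bash:4.4.23",
                   "script": "echo foo > /mnt/file1"}]}
mount_pvc(task, pvc["metadata"]["name"], "/mnt")
```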

At the lowest level, it uses the same Argo resource template that resourceOps uses to create the PVC. Therefore, we can reuse the resourceOps task for the volumeOps PVC creation.
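Because a PVC is created like any other resource, the creation step could reuse the kind of Task that resourceOps generates. Below is a hypothetical sketch of such a Task: it pipes the manifest into kubectl create and writes the created name to a Tekton result. The helper name and the bitnami/kubectl image are assumptions, the TaskRun's ServiceAccount would need RBAC permission to create PVCs, and kfp-tekton's actual resourceOps Task may be built differently.

```python
import json
from typing import Any, Dict


def make_resource_task(task_name: str, manifest: Dict[str, Any],
                       image: str = "bitnami/kubectl:latest") -> Dict[str, Any]:
    """Hypothetical Tekton Task that creates `manifest` and emits its name as a result."""
    manifest_json = json.dumps(manifest, indent=2)
    # set -e makes the step exit non-zero as soon as kubectl fails, so the
    # Task fails instead of reporting a name for a resource that does not exist.
    script = (
        "#!/bin/sh\n"
        "set -e\n"
        "cat <<'EOF' | kubectl create -f - -o jsonpath='{.metadata.name}' "
        "> $(results.name.path)\n"
        f"{manifest_json}\n"
        "EOF\n"
    )
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "Task",
        "metadata": {"name": task_name},
        "spec": {
            "results": [{"name": "name", "description": "Name of the created resource"}],
            "steps": [{"name": "create", "image": image, "script": script}],
        },
    }


# Usage: wrap a PVC manifest (JSON is valid YAML for kubectl create -f).
pvc = {"apiVersion": "v1", "kind": "PersistentVolumeClaim",
       "metadata": {"name": "my-pvc"},
       "spec": {"accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": "1Gi"}}}}
create_task = make_resource_task("create-pvc", pvc)
```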

For the PVC mount, we already know the name of the PVC and already have volume support, so we only need to make the resourceOps task fail fast if something goes wrong.
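Since the claim name is known when the pipeline is compiled, each consumer can mount the PVC directly. The hypothetical sketch below orders every consumer after the PVC-creation task with runAfter; Tekton does not start a task whose runAfter dependency failed, so a failed creation stops the run instead of letting steps start against a missing claim. The names wire_pipeline, create-pvc, and volumeop-demo are illustrative only.

```python
from typing import Any, Dict, List


def wire_pipeline(create_task: str, consumers: List[str], claim_name: str,
                  mount_path: str = "/mnt") -> Dict[str, Any]:
    """Hypothetical Pipeline spec: consumers run only after the PVC task succeeds."""
    tasks: List[Dict[str, Any]] = [{"name": "create-pvc", "taskRef": {"name": create_task}}]
    for name in consumers:
        tasks.append({
            "name": name,
            # Ordering dependency: if create-pvc fails, this task never starts.
            "runAfter": ["create-pvc"],
            "taskSpec": {
                # The claim name is known up front, so a plain volume is enough.
                "volumes": [{"name": "data",
                             "persistentVolumeClaim": {"claimName": claim_name}}],
                "steps": [{"name": "main", "image": "library/bash:4.4.23",
                           "script": f"ls {mount_path}",
                           "volumeMounts": [{"name": "data", "mountPath": mount_path}]}],
            },
        })
    return {"apiVersion": "tekton.dev/v1beta1", "kind": "Pipeline",
            "metadata": {"name": "volumeop-demo"}, "spec": {"tasks": tasks}}


pipeline = wire_pipeline("create-pvc-task", ["train", "evaluate"], "my-pvc")
```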

Alternatively, we could use workspaces instead of the PVC mount. However, that means reimplementing our own logic for workspaces and removing the existing volumeOps PVC mount logic. Furthermore, we would need to generate a PipelineRun YAML, because the tkn CLI generator does not yet support code generation for workspaces. The same workspaces discussion also appears in #41, about using workspaces to copy files between steps.
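For comparison, the workspaces alternative moves the PVC binding out of the Pipeline and into the PipelineRun, which is why a PipelineRun YAML would have to be generated. The sketch below is hypothetical: the helper names, the shared/data workspace names, and the assumption that each referenced Task declares a data workspace are illustrative; only the Tekton field names (workspaces, workspace, persistentVolumeClaim.claimName) follow the tekton.dev/v1beta1 schema.

```python
from typing import Any, Dict, List


def workspace_pipeline(task_names: List[str], workspace: str = "shared") -> Dict[str, Any]:
    """Hypothetical Pipeline that declares one workspace and hands it to every task."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "Pipeline",
        "metadata": {"name": "workspace-demo"},
        "spec": {
            "workspaces": [{"name": workspace}],
            "tasks": [
                {"name": name,
                 "taskRef": {"name": name},
                 # Assumes each referenced Task declares a workspace named "data".
                 "workspaces": [{"name": "data", "workspace": workspace}]}
                for name in task_names
            ],
        },
    }


def pipeline_run(pipeline_name: str, claim_name: str,
                 workspace: str = "shared") -> Dict[str, Any]:
    """Hypothetical PipelineRun that binds the workspace to an existing PVC."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        "metadata": {"generateName": f"{pipeline_name}-run-"},
        "spec": {
            "pipelineRef": {"name": pipeline_name},
            "workspaces": [{"name": workspace,
                            "persistentVolumeClaim": {"claimName": claim_name}}],
        },
    }


ws_pipeline = workspace_pipeline(["producer", "consumer"])
ws_run = pipeline_run(ws_pipeline["metadata"]["name"], "my-pvc")
```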

The proposed auto workspaces feature could also work if it is implemented as promised. However, it will still take some time before someone opens a PR in Tekton to implement this feature.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

- feature (probability: 0.96)


@Tomcli
Member Author

Tomcli commented Mar 23, 2020

/assign

@Tomcli
Member Author

Tomcli commented Mar 24, 2020

@afrittoli

@ckadner
Member

ckadner commented Mar 24, 2020

If this can be implemented without a Workspace and a PipelineRun spec, I would prefer that until the bigger discussion in issue #59 is resolved.

@Tomcli
Member Author

Tomcli commented Mar 25, 2020

If we only use volumes and volumeMounts, most of the logic is already implemented. Since VolumeOp inherits from ResourceOp, much of the logic is shared, except that VolumeOp has some extra mapping for its output params. The same applies to VolumeSnapshotOp, which, instead of creating a PVC, creates a VolumeSnapshot of an existing PVC.
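The inheritance described here can be illustrated with a deliberately simplified class hierarchy. These *Sketch classes are hypothetical stand-ins, not the real kfp.dsl ResourceOp, VolumeOp, or VolumeSnapshotOp code: the base class owns the manifest and a default name output, and each subclass only builds its own manifest and adds extra JSONPath outputs. The specific JSONPaths and the snapshot.storage.k8s.io/v1beta1 API version are assumptions based on the Kubernetes PVC and VolumeSnapshot schemas of that time.

```python
from typing import Any, Dict, Optional, Tuple


class ResourceOpSketch:
    """Simplified stand-in for a resource-creating op (not the real kfp class)."""

    def __init__(self, name: str, k8s_resource: Dict[str, Any],
                 attribute_outputs: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.k8s_resource = k8s_resource
        # Shared logic: every resource exposes its name; subclasses add more outputs.
        self.attribute_outputs = {"name": "{.metadata.name}"}
        self.attribute_outputs.update(attribute_outputs or {})


class VolumeOpSketch(ResourceOpSketch):
    """Creates a PVC and adds an extra `size` output mapping (assumed JSONPath)."""

    def __init__(self, name: str, resource_name: str, size: str,
                 modes: Tuple[str, ...] = ("ReadWriteOnce",)) -> None:
        pvc = {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": resource_name},
            "spec": {"accessModes": list(modes),
                     "resources": {"requests": {"storage": size}}},
        }
        super().__init__(name, pvc, {"size": "{.status.capacity.storage}"})


class VolumeSnapshotOpSketch(ResourceOpSketch):
    """Creates a VolumeSnapshot of an existing PVC instead of a new PVC."""

    def __init__(self, name: str, resource_name: str, pvc_name: str) -> None:
        snapshot = {
            "apiVersion": "snapshot.storage.k8s.io/v1beta1",
            "kind": "VolumeSnapshot",
            "metadata": {"name": resource_name},
            "spec": {"source": {"persistentVolumeClaimName": pvc_name}},
        }
        super().__init__(name, snapshot, {"size": "{.status.restoreSize}"})


vop = VolumeOpSketch("create-pvc", "my-pvc", "1Gi")
snap = VolumeSnapshotOpSketch("snapshot", "my-snap", "my-pvc")
```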

jlpettersson added a commit to jlpettersson/pipeline that referenced this issue Apr 4, 2020
An existing PersistentVolumeClaim can currently be used as a Workspace
volume source. There are two ways of using an existing PVC as a volume:

 - Reuse an existing PVC
 - Create a new PVC before each PipelineRun.

There are disadvantages to reusing the same PVC for every PipelineRun:

 - You need to clean the PVC at the end of the Pipeline
 - All Tasks using the workspace will be scheduled to the node where
   the PV is bound
 - Concurrent PipelineRuns may interfere: an artifact or file from one
   PipelineRun may slip into or be used in another PipelineRun, with
   very little audit trail.

There are also disadvantages to creating a new PVC before each PipelineRun:

 - This cannot (easily) be done declaratively
 - This is hard to do programmatically, because it is hard to know when
   to delete the PVC. The PipelineRun cannot be set as OwnerReference since
   the PVC must be created first
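The ordering problem comes from how Kubernetes ownerReferences work: a reference must carry the owner's uid, which the API server assigns only when the owner is created. The hypothetical helper below makes that constraint explicit; the placeholder UID stands in for the value the API server would return. If the PVC has to exist before the PipelineRun is submitted, there is no PipelineRun UID to put on it yet.

```python
from typing import Any, Dict


def owner_reference(owner: Dict[str, Any]) -> Dict[str, Any]:
    """Build an ownerReference pointing at `owner` (hypothetical helper)."""
    uid = owner.get("metadata", {}).get("uid")
    if not uid:
        # The API server assigns metadata.uid on creation, so an object that
        # has not been created yet cannot be referenced as an owner.
        raise ValueError("owner has no metadata.uid; create it first")
    return {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": uid,
        "controller": True,
        "blockOwnerDeletion": True,
    }


# Not yet submitted: no UID, so the PVC cannot point at this run.
pending_run = {"apiVersion": "tekton.dev/v1beta1", "kind": "PipelineRun",
               "metadata": {"name": "demo-run"}}
try:
    owner_reference(pending_run)
except ValueError as err:
    print(err)  # prints: owner has no metadata.uid; create it first

# After creation the server returns a UID (placeholder value here).
created_run = {**pending_run, "metadata": {"name": "demo-run", "uid": "placeholder-uid"}}
pvc_owner_ref = owner_reference(created_run)
```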

This commit adds 'volumeClaimTemplate' as a volume source for workspaces. This
has several advantages:

 - The syntax is used in k8s StatefulSet and other k8s projects so it is
   familiar in the kubernetes ecosystem
 - It is possible to declare that a PVC should be created for each
   PipelineRun, e.g. from a TriggerTemplate.
 - The user can choose a storageClass (or omit it to get the cluster default),
   e.g. to get a faster SSD volume or a volume compatible with Windows.
 - The user can adapt the size to the job, e.g. use 5Gi for apps that contain
   machine learning models, or 1Gi for microservice apps. It can be changed on
   demand in a configuration that lives in the user's namespace, e.g. in a
   TriggerTemplate.
 - The size affects the storage quota that is set on the namespace and it may affect
   billing and cost depending on the cluster environment.
 - The PipelineRun or TaskRun with the template is created first, and is used
   as OwnerReference on the PVC. That means that the PVC will have the same lifecycle
   as the PipelineRun.
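A PipelineRun using the volumeClaimTemplate workspace source introduced by this commit might look like the manifest built below. The helper, the pipeline names, and the fast-ssd StorageClass are hypothetical; the field layout follows the tekton.dev/v1beta1 PipelineRun schema. Because the template lives in the run, each PipelineRun gets its own PVC, sized per job as the list above describes.

```python
from typing import Any, Dict, Optional


def run_with_claim_template(pipeline_name: str, workspace: str, size: str,
                            storage_class: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical PipelineRun whose workspace gets a fresh PVC from a template."""
    pvc_spec: Dict[str, Any] = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        # Leaving storageClassName out falls back to the cluster's default StorageClass.
        pvc_spec["storageClassName"] = storage_class
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        "metadata": {"generateName": f"{pipeline_name}-run-"},
        "spec": {
            "pipelineRef": {"name": pipeline_name},
            "workspaces": [{"name": workspace,
                            "volumeClaimTemplate": {"spec": pvc_spec}}],
        },
    }


# A larger SSD-backed volume for an ML job, a small default one for a microservice.
ml_run = run_with_claim_template("train-model", "shared", "5Gi", storage_class="fast-ssd")
svc_run = run_with_claim_template("build-service", "shared", "1Gi")
```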

Related to tektoncd#1986

See also:
  - tektoncd#2174
  - tektoncd#2218
  - tektoncd/triggers#476
  - tektoncd/triggers#482
  - kubeflow/kfp-tekton#51
jlpettersson added a commit to jlpettersson/pipeline that referenced this issue Apr 9, 2020
tekton-robot pushed a commit to tektoncd/pipeline that referenced this issue Apr 14, 2020
gmfrasca pushed a commit to gmfrasca/data-science-pipelines-tekton that referenced this issue Oct 18, 2022
Fix: Fixing applying patch / CI failure