Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flytekit] Support extra copy commands in ImageSpec #2715

Merged
merged 30 commits into from
Sep 25, 2024

Conversation

mao3267
Copy link
Contributor

@mao3267 mao3267 commented Aug 27, 2024

Tracking issue

Initially, we aimed to support adding custom Docker commands when creating images with ImageSpec. However, after discussions with @thomasjpfan and @pingsutw in #2676 , we prioritized supporting additional COPY commands first, as they were deemed more important.

Why are the changes needed?

When creating images with ImageSpec on a remote cluster, we may need to copy additional files or directories required during the container build process. To address this, we aim to support extra copy commands in ImageSpec for copying files and directories into the container’s /root directory.

What changes were proposed in this pull request?

  1. Add the ImageSpec.with_copy(src) function to the ImageSpec class, allowing the inclusion of additional copy commands. The src parameter can be a single path string or a list of path-like strings, specifying files or directories to be copied.
  • Example Usage:

input.json

{
    "b": "Hello"
}

build_image.py

from flytekit.image_spec import ImageSpec
from flytekit import task, workflow
import json

image_spec = ImageSpec(
    name="test_image",
    registry="localhost:30000",
    builder="default",
    apt_packages=["git", "gh"],
    # For remote testing before merged
    env={"GH_TOKEN": "<YOUR_GITHUB_TOKEN>"},
    commands=[
        "git clone https://github.com/flyteorg/flytekit.git", 
        "cd flytekit", 
        "gh pr checkout 2715", 
        "pip install -e ."
    ]
)

image_spec.with_copy("local_test/input.json")
# image_spec.with_copy(["local_test/input.json", "local_test/input.yaml"]

@task(container_image=image_spec)
def my_task() -> str:
    with open("/root/local_test/input.json", "r") as f:
        data = json.load(f)
        return data["b"]


@workflow
def my_wf() -> str:
    return my_task()
  1. Support copying files or directories into /root in the container by setting ImageSpec.copy, a list of path strings.
  • Example usage:

build_image.py

from flytekit.image_spec import ImageSpec
from flytekit import task, workflow
import json

image_spec = ImageSpec(
    name="test_image",
    registry="localhost:30000",
    builder="default",
    apt_packages=["git", "gh"],
    copy=["local_test/input.json"],
    # For remote testing before merged
    env={"GH_TOKEN": "<YOUR_GITHUB_TOKEN>"},
    commands=[
        "git clone https://github.com/flyteorg/flytekit.git", 
        "cd flytekit", 
        "gh pr checkout 2715", 
        "pip install -e ."
    ]
)


@task(container_image=image_spec)
def my_task() -> str:
    with open("/root/local_test/input.json", "r") as f:
        data = json.load(f)
        return data["b"]


@workflow
def my_wf() -> str:
    return my_task()
  1. Support computing digest from a list of files by modifying compute_digest function in fast_registration.py.

How was this patch tested?

  1. In test_image_spec.py, test ImageSpec.copy argument and ImageSpec.with_copy() function with multiple files/directories to copy.
  2. In test_default_builder.py, test building docker context and image with extra copy commands.
  3. Run on remote cluster and successfully copied files to the container /root.

Setup process

git clone https://github.com/mao3267/flytekit.git
git checkout add-copy-commands
make setup && pip install -e .

Screenshots

The results of running pyflyte run --remote build_image.py my_wf are shown below.

  • Flyte console
image
  • Terminal
image

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#2676

Docs link

TODO

flytekit/image_spec/image_spec.py Outdated Show resolved Hide resolved
flytekit/image_spec/default_builder.py Outdated Show resolved Hide resolved
flytekit/image_spec/image_spec.py Outdated Show resolved Hide resolved
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you need to update the compute_digest call the new tag property to take into account changes in these newly copied files?

@mao3267 mao3267 changed the title Support extra copy commands in ImageSpec [WIP] Support extra copy commands in ImageSpec Sep 12, 2024
Signed-off-by: mao3267 <[email protected]>
@mao3267 mao3267 changed the title [WIP] Support extra copy commands in ImageSpec Support extra copy commands in ImageSpec Sep 14, 2024
@mao3267 mao3267 changed the title Support extra copy commands in ImageSpec [WIP] Support extra copy commands in ImageSpec Sep 17, 2024
Copy link

codecov bot commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.95%. Comparing base (9bce7c3) to head (4e347b9).
Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2715       +/-   ##
===========================================
+ Coverage   80.20%   90.95%   +10.75%     
===========================================
  Files         237       73      -164     
  Lines       21682     3306    -18376     
  Branches     4130        0     -4130     
===========================================
- Hits        17389     3007    -14382     
+ Misses       3576      299     -3277     
+ Partials      717        0      -717     
Flag Coverage Δ
90.95% <ø> (+12.31%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mao3267 mao3267 changed the title [WIP] Support extra copy commands in ImageSpec [Flytekit] Support extra copy commands in ImageSpec Sep 20, 2024
@mao3267
Copy link
Contributor Author

mao3267 commented Sep 21, 2024

Is raising an error the expected behavior for files or directories with absolute paths or ../ in their paths?

@pingsutw
Copy link
Member

pingsutw commented Sep 24, 2024

Is raising an error the expected behavior for files or directories with absolute paths or ../ in their paths?

Yup, I think it makes sense since docker also doesn't support it.

mao3267 and others added 5 commits September 25, 2024 14:39
Copy link
Member

@pingsutw pingsutw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @mao3267. Mind also updating the envd image builder to support copy in the follow-up PR

@pingsutw pingsutw merged commit 4e1ea68 into flyteorg:master Sep 25, 2024
104 checks passed
@mao3267
Copy link
Contributor Author

mao3267 commented Sep 26, 2024

LGTM, thank you @mao3267. Mind also updating the envd image builder to support copy in the follow-up PR

Will work on it. Really appreciate your help!

otarabai pushed a commit to otarabai/flytekit that referenced this pull request Oct 15, 2024
kumare3 pushed a commit that referenced this pull request Nov 8, 2024
Signed-off-by: mao3267 <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
@mao3267 mao3267 deleted the add-copy-command branch November 11, 2024 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants