Docker plugin #1926
base: master
Conversation
Signed-off-by: Danny Farrell <[email protected]>
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #1926 +/- ##
===========================================
- Coverage 54.71% 39.83% -14.89%
===========================================
Files 305 170 -135
Lines 22732 16649 -6083
Branches 3453 3451 -2
===========================================
- Hits 12438 6632 -5806
+ Misses 10122 9861 -261
+ Partials 172 156 -16
☔ View full report in Codecov by Sentry.
if env_image_id:
    click.secho(f"Skipping nested build of {image_spec.name}", fg="blue")

def horrible_horrible_hack() -> str:
what?
biopython_img = ImageSpec(
    name="flytekit-biopython",
    builder="docker",
interesting
this is pretty cool, just don't like the horrible_horrible_hack haha
Yea I just wanted to get something that works by touching the least amount possible. I have an idea of what needs to happen, but I wanted to get some feedback before I dumped a bunch more time into it or pushed the scope wider. I'll probably keep plugging away at this till I'm happy with it, but any direction / thoughts would be greatly appreciated so I don't spend too much time in the wrong places. Briefly:
ident = self.raw_register(
    cp_entity,
    settings=settings,
-   version=version,
+   version=c_version,
    create_default_launchplan=create_default_launchplan,
Could you please explain why this is required?
My naive expectation after reading the PR description is that the plugin should do the `docker build -t <some tag> .` part that has to come before `pyflyte run --image <some tag>`. Doing this manually wouldn't require modifying the version here, right?
The version hash is built from the python code, which doesn't include the Dockerfile. This means that if you `pyflyte run` the same workflow twice but the container hash changes due to a change in the Dockerfile, you get errors similar to this:
```
RPC Failed, with Status: StatusCode.INVALID_ARGUMENT
	details: task with different structure already exists with id resource_type:TASK project:"flytesnacks" domain:"development" name:"example_pyflyte_script.t1" version:"LMxQheYT9myAmWklNHYD6A=="
```
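The mismatch described above can be sketched in a few lines; `version_with_dockerfile` is a hypothetical helper (not flytekit API) showing that folding the Dockerfile contents into the version hash makes a Dockerfile edit yield a new version even when the python code is unchanged:

```python
import hashlib

def version_with_dockerfile(code_md5: bytes, dockerfile_text: str) -> str:
    # Hypothetical helper (not flytekit API): fold the Dockerfile
    # contents into the version hash so a Dockerfile edit produces a
    # new version even when the python code is unchanged.
    h = hashlib.md5(code_md5)
    h.update(dockerfile_text.encode())
    return h.hexdigest()

code = hashlib.md5(b"def t1(): ...").digest()
v1 = version_with_dockerfile(code, "RUN pip install biopython==1.81")
v2 = version_with_dockerfile(code, "RUN pip install biopython==1.82")
# Dockerfile change -> new version, so flyteadmin no longer sees a
# "different structure" under an identical version id.
assert v1 != v2
```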
The version hash is built based on the python code

To make sure I understand: when a user executes `pyflyte run`, the workflow etc. python code is hashed to get `version` in the code snippet? What does this look like when the user executes `pyflyte register --version ...`? Then `version` in the code snippet should have the user-specified value?
Imagine there is a task and a workflow in a module and you `pyflyte register --version v1` this module; would the user see version `v1` in the flyte console for the workflow and launch plan but version `v1-<some hash>` for the task?
yeah, I think workflow and task version will become different
The hash is calculated here; we can add the Dockerfile to the zip file. Hence, flytekit will register a new version of the workflow once you update the Dockerfile.
A problem is that we would also have to hash the docker build context directory.
Or we could do it here by using the tag of the image as additional context?
@staticmethod
def _version_from_hash(
    md5_bytes: bytes,
    serialization_settings: SerializationSettings,
    *additional_context: str,
) -> str:
    """
    ...
    :param additional_context: This is for additional context to factor into the version computation,
        meant for objects (like Options for instance) that don't easily consistently stringify.
    """
`pyflyte run` calls `register_script`, which under the hood does:
if version is None:
    # The md5 version that we send to S3/GCS has to match the file contents exactly,
    # but we don't have to use it when registering with the Flyte backend.
    # For that add the hash of the compilation settings to hash of file
    version = self._version_from_hash(md5_bytes, serialization_settings)
If the user on the other hand executes `pyflyte register`, under the hood we do:
if not version and fast:
    version = remote._version_from_hash(md5_bytes, serialization_settings, service_account, raw_data_prefix)  # noqa
    click.secho(f"Computed version is {version}", fg="yellow")
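The additional-context idea can be sketched with a simplified stand-in for `_version_from_hash` (the real method also hashes the serialization settings; this toy version omits them and is illustrative only):

```python
import base64
import hashlib

def version_from_hash(md5_bytes: bytes, *additional_context: str) -> str:
    # Simplified stand-in for flytekit's _version_from_hash: extra
    # strings (e.g. the built image tag) are folded into the digest,
    # so a different tag yields a different registered version.
    h = hashlib.md5(md5_bytes)
    for ctx in additional_context:
        h.update(ctx.encode())
    return base64.urlsafe_b64encode(h.digest()).decode("ascii").rstrip("=")

base = hashlib.md5(b"workflow source").digest()
# Passing the image tag as additional context changes the version.
assert version_from_hash(base) != version_from_hash(base, "repo/img:abc123")
```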
What do you think @pingsutw?
The tag isn't generated/parsed until the end of register_script (which uses the previously defined version to create some entities). I guess you could do it in 2 stages, with a junk version initially, and then change the version on the resulting entities?
build image backtrace
> /Users/dan.farrell/tmp/git/flytekit/flytekit/image_spec/image_spec.py(160)build()
-> if image_spec.builder not in cls._REGISTRY:
(Pdb) bt
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/bin/pyflyte(8)<module>()
-> sys.exit(main())
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(1157)__call__()
-> return self.main(*args, **kwargs)
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/rich_click/rich_group.py(21)main()
-> rv = super().main(*args, standalone_mode=False, **kwargs)
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(1078)main()
-> rv = self.invoke(ctx)
/Users/dan.farrell/tmp/git/flytekit/flytekit/clis/sdk_in_container/utils.py(124)invoke()
-> return super().invoke(ctx)
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(1688)invoke()
-> return _process_result(sub_ctx.command.invoke(sub_ctx))
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(1688)invoke()
-> return _process_result(sub_ctx.command.invoke(sub_ctx))
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(1688)invoke()
-> return _process_result(sub_ctx.command.invoke(sub_ctx))
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(1434)invoke()
-> return ctx.invoke(self.callback, **ctx.params)
/Users/dan.farrell/.local/share/micromamba/envs/tflyte-subwf-01/lib/python3.11/site-packages/click/core.py(783)invoke()
-> return __callback(*args, **kwargs)
/Users/dan.farrell/tmp/git/flytekit/flytekit/clis/sdk_in_container/run.py(490)_run()
-> remote_entity = remote.register_script(
/Users/dan.farrell/tmp/git/flytekit/flytekit/remote/remote.py(932)register_script()
-> return self.register_task(entity, serialization_settings, version)
/Users/dan.farrell/tmp/git/flytekit/flytekit/remote/remote.py(735)register_task()
-> ident = self._serialize_and_register(entity=entity, settings=serialization_settings, version=version)
/Users/dan.farrell/tmp/git/flytekit/flytekit/remote/remote.py(691)_serialize_and_register()
-> _ = get_serializable(m, settings=serialization_settings, entity=entity, options=options)
/Users/dan.farrell/tmp/git/flytekit/flytekit/tools/translator.py(696)get_serializable()
-> cp_entity = get_serializable_task(settings, entity)
/Users/dan.farrell/tmp/git/flytekit/flytekit/tools/translator.py(180)get_serializable_task()
-> container = entity.get_container(settings)
/Users/dan.farrell/tmp/git/flytekit/flytekit/core/python_auto_container.py(177)get_container()
-> return self._get_container(settings)
/Users/dan.farrell/tmp/git/flytekit/flytekit/core/python_auto_container.py(188)_get_container()
-> image=get_registerable_container_image(self.container_image, settings.image_config),
/Users/dan.farrell/tmp/git/flytekit/flytekit/core/python_auto_container.py(257)get_registerable_container_image()
-> ImageBuildEngine.build(img)
> /Users/dan.farrell/tmp/git/flytekit/flytekit/image_spec/image_spec.py(160)build()
-> if image_spec.builder not in cls._REGISTRY:
(Pdb)
@@ -25,7 +25,7 @@ class ImageSpec:
     name: name of the image.
     python_version: python version of the image. Use default python in the base image if None.
     builder: Type of plugin to build the image. Use envd by default.
-    source_root: source root of the image.
+    source_root: source root of the image, or the dockerfile context path.
For users not proficient with docker and/or envd it might be confusing that the kwargs intended for envd and docker are mixed. In case additional builders are added in the future, this might get even worse.
I personally vote for refactoring this into a BaseImageSpec and then inheriting ...
You wrote:

I'm not a huge fan of this solution, I think it would be nice to not have the plugins installed on the node
The current way is a bit too cluttered for me as it somewhat defeats the purpose of having the plugin abstraction.

I agree with it being too cluttered and defeating the purpose of having a plugin abstraction. Maybe the `register` method of the `ImageBuildEngine` should receive an accompanying `ImageSpec` class for each `ImageSpecBuilder`?
Could you please point me to where the `ImageSpec` is needed at runtime? Maybe we can avoid this by running the relevant code only when the flyte context tells us we are in registration mode?
I think @pingsutw should have the ultimate say in this.
For now, source root is always set by flytekit, and it's the first ancestor from source_path that does not contain an `__init__.py` file. Do you want to set it by yourself?
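The rule just described can be sketched as a small helper; `find_source_root` is illustrative, not the actual flytekit implementation:

```python
import tempfile
from pathlib import Path

def find_source_root(source_path: str) -> str:
    # Illustrative only: walk upward from source_path and return the
    # first ancestor directory that does not contain an __init__.py
    # file (i.e. the directory just outside the package).
    current = Path(source_path).resolve().parent
    while (current / "__init__.py").exists():
        current = current.parent
    return str(current)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg" / "sub"
    pkg.mkdir(parents=True)
    (Path(tmp) / "mypkg" / "__init__.py").touch()
    (pkg / "__init__.py").touch()
    (pkg / "wf.py").touch()
    # Both mypkg and mypkg/sub are packages, so the root is tmp itself.
    assert find_source_root(str(pkg / "wf.py")) == str(Path(tmp).resolve())
```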
from flytekit import task
from flytekit.image_spec import ImageSpec

biopython_img = ImageSpec(builder="docker", dockerfile="docker/Dockerfile.biopython")
Suggested change:
- biopython_img = ImageSpec(builder="docker", dockerfile="docker/Dockerfile.biopython")
+ img = ImageSpec(builder="docker", dockerfile="docker/Dockerfile")
Would it be ok for you to keep the variable naming here generic?
The python docker api, aka [docker-py](https://github.com/docker/docker-py), is a bit behind the buildkit integration. See: https://github.com/docker/docker-py/issues/2230

For this reason, I decided to just use simple subprocess commands. There are alternatives such as [python-on-whales](https://github.com/gabrieldemarmiesse/python-on-whales), but our complexity would have to be significantly larger to justify adding it as a dependency.
Suggested change:
- For this reason, I decided to just use simple subprocess commands. There are alternatives such as [python-on-whales](https://github.com/gabrieldemarmiesse/python-on-whales), but our complexity would have to be significantly larger to justify adding it as a dependency.
+ For this reason, this plugin currently just uses simple subprocess commands. There are alternatives such as [python-on-whales](https://github.com/gabrieldemarmiesse/python-on-whales), but the complexity would have to be significantly larger to justify adding it as a dependency.
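A minimal sketch of the subprocess approach; `buildx_command` is a hypothetical helper that composes the `docker buildx build` argv directly (the actual plugin's flags may differ):

```python
from typing import List, Optional

def buildx_command(tag: str, dockerfile: str, context: str,
                   extra_args: Optional[List[str]] = None) -> List[str]:
    # Hypothetical helper: compose the `docker buildx build` argv
    # directly instead of going through docker-py, which lags behind
    # the buildkit integration.
    cmd = ["docker", "buildx", "build", "--file", dockerfile, context]
    cmd += extra_args or []
    cmd += ["--output", f"type=image,name={tag}"]
    return cmd

cmd = buildx_command("repo/img:v1", "docker/Dockerfile.biopython", ".")
# The plugin would then run it with e.g. subprocess.run(cmd, check=True).
assert cmd[:3] == ["docker", "buildx", "build"]
assert cmd[-1] == "type=image,name=repo/img:v1"
```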
Might be better placed in `image_builder.py` in a docstring.
image_spec.image_name = horrible_horrible_hack
return

# Inject _F_IMG_ID to stop recursive builds
Where does the recursion come from?
Does it try to build the image again when the task starts to actually run in the cluster? If yes, could we solve this by figuring out from the flyte context whether we currently are in registration mode or execution mode?
You say:

horrible_horrible_hack is because the Dockerfile isn't packaged with the script and sent to the remote. So when the remote calls .build_image and tries to determine a hash, it either crashes (folder doesn't exist) or it determines a new one and thus returns a new image_name. This causes the builder to try and build the image again and then crash because docker doesn't exist.

With "when the remote calls .build_image and tries to determine a hash", do you mean `remote` as in this happens in the task pod? For me `remote` is the python client for `flyteadmin`; maybe this is why I'm confused?
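A sketch of the recursion guard under discussion; the exact semantics of `_F_IMG_ID` in the plugin may differ, this just illustrates skipping a nested build when the env var is already set inside the task container:

```python
import os

F_IMG_ID = "_F_IMG_ID"  # env var name taken from the diff above

def should_build(image_name: str) -> bool:
    # Sketch only: if the container was started with _F_IMG_ID already
    # set to this image, we are executing inside the built image and
    # must not attempt a nested docker build there.
    return os.environ.get(F_IMG_ID) != image_name

os.environ[F_IMG_ID] = "repo/img:abc"
assert not should_build("repo/img:abc")  # inside the image: skip build
assert should_build("repo/img:other")    # a different image: build
del os.environ[F_IMG_ID]
```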
    image_spec.source_root,
] + image_spec.docker_build_extra_args
output_image_params = f"type=image,name={image_spec.image_name()}"
if image_spec.registry:
    output_image_params = f"{output_image_params},push=true"
if image_spec.buildkit_build_extra_output:
    output_image_params = f"{output_image_params},{image_spec.buildkit_build_extra_output}"
command += [
    "--output",
    output_image_params,
]
self.execute_command(command)
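The `--output` assembly shown above can be exercised in isolation; `output_image_params` here is a standalone copy of that logic for illustration:

```python
def output_image_params(image_name: str, registry=None, extra_output=None) -> str:
    # Standalone copy of the diff's logic: push only when a registry is
    # configured, then append any user-supplied buildkit output options
    # verbatim.
    params = f"type=image,name={image_name}"
    if registry:
        params += ",push=true"
    if extra_output:
        params += f",{extra_output}"
    return params

assert output_image_params("repo/img:v1") == "type=image,name=repo/img:v1"
assert output_image_params("repo/img:v1", registry="repo",
                           extra_output="oci-mediatypes=true") == \
    "type=image,name=repo/img:v1,push=true,oci-mediatypes=true"
```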
As a user, do I put the `-t, --tag` of `docker build` into the `output_image_params` or the `buildkit_build_extra_output`? I feel this might deserve its own arg.
Or don't users pass the tag to `ImageSpec` at all, and instead it's calculated with `calculate_hash_from_image_spec`?
the --tag is set via `output_image_params = f"type=image,name={image_spec.image_name()}"`, i.e. via `calculate_hash_from_image_spec`
To be more explicit: the user does not put the tag, it is generated. You could add more tags via -t, but I'm not sure how that works with buildx export actions.
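The deterministic-tag idea can be sketched as follows; `tag_from_spec` is a toy stand-in for `calculate_hash_from_image_spec`, not its actual algorithm:

```python
import hashlib
import json

def tag_from_spec(spec: dict) -> str:
    # Toy stand-in for calculate_hash_from_image_spec: derive the tag
    # deterministically from the spec's fields, so the user never
    # supplies one and any field change yields a new tag.
    canonical = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:16]

spec = {"name": "flytekit-biopython", "builder": "docker",
        "dockerfile": "docker/Dockerfile.biopython"}
# Same spec always yields the same tag; any field change yields a new one.
assert tag_from_spec(spec) == tag_from_spec(dict(spec))
assert tag_from_spec(spec) != tag_from_spec({**spec, "dockerfile": "Dockerfile"})
```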
@@ -0,0 +1,26 @@
from ghcr.io/flyteorg/flytekit:py3.10-1.10.0
ARG HWARG=EARTH
RUN pip install --no-cache-dir biopython==1.81
Is this dependency required for the test to work?
we could use anything. flytekit by default has a ton already installed, so I just picked something that I think would never ever be installed by default hah
builder="docker",
dockerfile=str(DOCKERFILE_DIR / "Dockerfile.biopython"),
source_root=str(DOCKERFILE_DIR),
registry="dancyrusbio",
Maybe anonymise or use a flyte registry before merging.
# You cannot use workflow, because it uses the base image and the base image
# does not have flytekit-docker installed
Could you please explain this limitation? I unfortunately don't understand where it comes from.
Generally, static workflows come with more guarantees than dynamic workflows which is why we shouldn't require users to always use them if they want to use the docker builder.
The workflow doesn't execute in an image at runtime at all meaning that the dependencies that are required to make it run don't have to be installed in the task image but in the environment from which the workflow is registered.
@@ -274,9 +274,9 @@ def test_list_default_arguments(wf_path):
)

ic_result_4 = ImageConfig(
Question: Are these tags calculated by `calculate_hash_from_image_spec`?
yes, it changed because the interface changed (dockerfile... etc)
builder="docker",
dockerfile=str(dockerfile_path),
source_root=str(dockerfile_path.parent),
docker_build_extra_args=["--build-arg", "HWARRG=WORLD"],
Could we call it `build-args` here? I think envd can support build args in the future.
name="test-biopython",
builder="docker",
dockerfile=str(dockerfile_path),
source_root=str(dockerfile_path.parent),
The `source_root` value is added by flytekit while compiling. Users don't need to specify it.
In this current implementation I am using `source_root` to define the docker build context. I think this implementation is clearly confusing though. When/if we move it to flytekit/core I think we will have separate ImageSpec/ImageSpecBuilders for each builder, and then I won't have to overload arguments like this.
registry_config: Specify the path to an envD JSON registry config file
dockerfile: Specify the path to a Dockerfile
buildkit_build_extra_output: Any custom arguments for the docker buildx build --output argument
I haven't used this before. Could you share an example? I think envd has this as well; they use the buildkit golang api under the hood.
btw, it's possible to use `buildkit_build_extra_output` and `build_args`, but not `dockerfile`. Could we raise an exception if people specify the dockerfile when they use the envd plugin?
@@ -25,7 +25,7 @@ class ImageSpec:
     name: name of the image.
     python_version: python version of the image. Use default python in the base image if None.
     builder: Type of plugin to build the image. Use envd by default.
-    source_root: source root of the image.
+    source_root: source root of the image, or the dockerfile context path.
Suggested change:
- source_root: source root of the image, or the dockerfile context path.
+ source_root: source root of the image, or the build context path.
def build_image(self, image_spec: ImageSpec):
    if image_spec.dockerfile is None:
        raise RuntimeError("Image spec dockerfile cannot be None")
Let me fix the dynamic task issue first.
I fixed it here. #1967
TL;DR
Add a plugin to build an ImageSpec using a docker buildx driver/builder
Complete description
This is a quick pass at a docker / buildkit builder for flytekit. Basically, this plugin provides an alternative interface to ImageSpec which will allow the usage of preexisting Dockerfiles that may be too complex to recreate with envd.
ImageSpec itself seems very tailored to envd, so I'm not sure if this is the best way to go about adding this functionality.
Notable changes
plugins/flytekit-docker/flytekitplugins/docker/image_builder.py
Because of this, it will likely require a deeper discussion with the team before merging.
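The way such a builder plugs into flytekit can be sketched with a minimal stand-in registry (class and method names here are simplified stand-ins modeled on the `ImageBuildEngine._REGISTRY` lookup visible in the backtrace above, not the real flytekit API):

```python
from dataclasses import dataclass

class ImageBuildEngine:
    # Minimal stand-in for flytekit's builder registry: builders are
    # registered under a string key and looked up by ImageSpec.builder.
    _REGISTRY = {}

    @classmethod
    def register(cls, builder_type, builder):
        cls._REGISTRY[builder_type] = builder

    @classmethod
    def build(cls, image_spec):
        if image_spec.builder not in cls._REGISTRY:
            raise ValueError(f"no builder registered for {image_spec.builder!r}")
        return cls._REGISTRY[image_spec.builder].build_image(image_spec)

@dataclass
class FakeSpec:
    builder: str
    dockerfile: str

class DockerImageSpecBuilder:
    def build_image(self, image_spec):
        # The real plugin would shell out to `docker buildx build` here.
        return f"built from {image_spec.dockerfile}"

ImageBuildEngine.register("docker", DockerImageSpecBuilder())
assert ImageBuildEngine.build(FakeSpec("docker", "Dockerfile.bio")) == "built from Dockerfile.bio"
```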