Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map over notebook task #1650

Merged
merged 13 commits into from
May 24, 2023
Merged

Map over notebook task #1650

merged 13 commits into from
May 24, 2023

Conversation

pingsutw
Copy link
Member

TL;DR

This PR allows map over notebook task

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

import os
import pathlib
import typing

from flytekit import kwtypes, task, workflow, ImageSpec, map_task
from flytekitplugins.papermill import NotebookTask

new_flytekit = "git+https://github.com/flyteorg/flytekit@022ade936d2fe1eb6a2d705c319cce7d2c66ade6"
new_papermill = "git+https://github.com/flyteorg/flytekit.git@022ade936d2fe1eb6a2d705c319cce7d2c66ade6#subdirectory=plugins/flytekit-papermill"
image = ImageSpec(registry="pingsutw", packages=[new_papermill, new_flytekit], apt_packages=["git"])

nb = NotebookTask(
    name="simple-nb",
    notebook_path=os.path.join(
        pathlib.Path(__file__).parent.absolute(), "nb-simple.ipynb"
    ),
    render_deck=True,
    inputs=kwtypes(a=float),
    outputs=kwtypes(square=float),
    container_image=image,
)


@workflow
def wf(a: float) -> typing.List[float]:
    return map_task(nb)(a=[a])


if __name__ == "__main__":
    print(wf(a=3.14))
image

Tracking Issue

https://flyte-org.slack.com/archives/CP2HDHKE1/p1684471512753359

Follow-up issue

NA

pingsutw added 5 commits May 18, 2023 22:43
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@pingsutw pingsutw marked this pull request as draft May 19, 2023 06:30
@@ -64,7 +64,7 @@ def __init__(
else:
actual_task = python_function_task

if not isinstance(actual_task, PythonFunctionTask):
if not isinstance(actual_task, PythonTask) or not issubclass(type(actual_task), PythonInstanceTask):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the second check instance is subclass of pythontask

@@ -76,7 +76,11 @@ def __init__(

collection_interface = transform_interface_to_list_interface(actual_task.python_interface, self._bound_inputs)
self._run_task: PythonFunctionTask = actual_task
_, mod, f, _ = tracker.extract_task_module(actual_task.task_function)
if hasattr(actual_task, "_IMPLICIT_OP_NOTEBOOK_TYPE"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to special for notebook- we should make it for instance tasks in general

@@ -277,16 +270,14 @@ def execute(self, **kwargs) -> Any:
output_list = []

for k, type_v in self.python_interface.outputs.items():
if k == self._IMPLICIT_OP_NOTEBOOK:
output_list.append(self.output_notebook_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you just removing these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @rahul-theorem I know you're using papermill plugin, do you use these paths in the output?

Signed-off-by: Kevin Su <[email protected]>
@codecov
Copy link

codecov bot commented May 19, 2023

Codecov Report

Merging #1650 (ab0375a) into master (ba70f46) will decrease coverage by 0.01%.
The diff coverage is 25.00%.

@@            Coverage Diff             @@
##           master    #1650      +/-   ##
==========================================
- Coverage   71.02%   71.01%   -0.01%     
==========================================
  Files         336      336              
  Lines       30724    30743      +19     
  Branches     5567     5568       +1     
==========================================
+ Hits        21821    21832      +11     
- Misses       8360     8365       +5     
- Partials      543      546       +3     
Impacted Files Coverage Δ
flytekit/core/map_task.py 48.07% <25.00%> (-1.60%) ⬇️

... and 10 files with indirect coverage changes

pingsutw added 4 commits May 19, 2023 12:47
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only thing is that this will change the interface for the user right? we will need to add a disclaimer to the release notes.

@@ -165,13 +166,14 @@ def __init__(
if not os.path.exists(self._notebook_path):
raise ValueError(f"Illegal notebook path passed in {self._notebook_path}")

if outputs:
if outputs and output_notebooks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if outputs and output_notebooks:
if output_notebooks:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if you just want a notebook? don't want to have to make a fake output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

@@ -287,6 +289,8 @@ def execute(self, **kwargs) -> Any:
else:
raise TypeError(f"Expected output {k} of type {type_v} not found in the notebook outputs")

if len(output_list) == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if we don't do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a mismatch between the output type and the downstream task's input type.

@@ -76,7 +76,11 @@ def __init__(

collection_interface = transform_interface_to_list_interface(actual_task.python_interface, self._bound_inputs)
self._run_task: PythonFunctionTask = actual_task
_, mod, f, _ = tracker.extract_task_module(actual_task.task_function)
if issubclass(type(actual_task), PythonInstanceTask):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a difference between this and isinstance(actual_task, PythonInstanceTask)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it to isinstance(actual_task, PythonInstanceTask)

@@ -76,7 +76,11 @@ def __init__(

collection_interface = transform_interface_to_list_interface(actual_task.python_interface, self._bound_inputs)
self._run_task: PythonFunctionTask = actual_task
_, mod, f, _ = tracker.extract_task_module(actual_task.task_function)
if issubclass(type(actual_task), PythonInstanceTask):
mod = actual_task.task_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be actual_task.instantiated_in? what happens if there are two of the same notebook tasks, named the same, with the same interface, but in two different .py files? will there be confusion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used lhs instead

pingsutw added 3 commits May 19, 2023 16:05
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@pingsutw pingsutw marked this pull request as ready for review May 20, 2023 06:49
@@ -165,13 +166,16 @@ def __init__(
if not os.path.exists(self._notebook_path):
raise ValueError(f"Illegal notebook path passed in {self._notebook_path}")

if outputs:
if output_notebooks:
if outputs is None:
Copy link
Contributor

@wild-endeavor wild-endeavor May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do outputs = outputs or {}

@pingsutw pingsutw merged commit a21dd46 into master May 24, 2023
ArthurBook pushed a commit to ArthurBook/flytekit that referenced this pull request May 26, 2023
* map over notebook

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* add a flag

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* fix tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Arthur <[email protected]>
ArthurBook added a commit to ArthurBook/flytekit that referenced this pull request Jun 12, 2023
eapolinario pushed a commit that referenced this pull request Jul 10, 2023
* map over notebook

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* add a flag

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* fix tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
eapolinario added a commit that referenced this pull request Jul 12, 2023
* Multi arch imageSpec (#1630)

Multi arch imageSpec (#1630)

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add executor_path and applications_path to spark config (#1634)

* Add executor_path and applications_path to spark config

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>

* Add support for env vars to pyflyte run (#1617)

* Add support for env vars to pyflyte run

Signed-off-by: Kevin Su <[email protected]>

* bump idl

Signed-off-by: Kevin Su <[email protected]>

* update doc

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>

* Fetch task executions in dynamic  (#1636)

* fetch task executions in dynamic

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>

* Added metrics command to pyflyte (#1513)

Signed-off-by: Daniel Rammer <[email protected]>

* Add http_proxy to client & Fix deviceflow (#1611)

* Add http_proxy to client & Fix deviceflow

RB=3890720

Signed-off-by: byhsu <[email protected]>

* nit

Signed-off-by: byhsu <[email protected]>

* lint!

Signed-off-by: byhsu <[email protected]>

---------

Signed-off-by: byhsu <[email protected]>
Co-authored-by: byhsu <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>

* Improve variable names (#1642)

Signed-off-by: byhsu <[email protected]>
Co-authored-by: byhsu <[email protected]>

* Address resolution (#1567)

Signed-off-by: Yee Hing Tong <[email protected]>

* pyflyte run supports pickle (#1646)

Signed-off-by: Kevin Su <[email protected]>

* Wait for the pod plugin instead of flytekit (#1647)

Signed-off-by: eduardo apolinario <[email protected]>
Co-authored-by: eduardo apolinario <[email protected]>

* Beautify deviceflow prompt (#1625)

* Beautify deviceflow prompt

Signed-off-by: byhsu <[email protected]>

* lint!

Signed-off-by: byhsu <[email protected]>

* lint

Signed-off-by: byhsu <[email protected]>

---------

Signed-off-by: byhsu <[email protected]>
Co-authored-by: byhsu <[email protected]>

* Improve flytekit register (#1643)

* Fix pyflyte register

Signed-off-by: byhsu <[email protected]>

* revert

Signed-off-by: byhsu <[email protected]>

* lint

Signed-off-by: byhsu <[email protected]>

---------

Signed-off-by: byhsu <[email protected]>
Co-authored-by: byhsu <[email protected]>

* Pass verify flag to all authenticators (#1641)

Signed-off-by: byhsu <[email protected]>

* Allow annotated FlyteFile as task input argument (#1632)

* fix: Allow annotated FlyteFile as task input argument

Using an annotated FlyteFile type as an input to a task was previously impossible due
to an exception being raised in `FlyteFilePathTransformer.to_python_value`.

This commit applies the fix previously used in `FlyteFilePathTransformer.to_literal`
to permit using annotated FlyteFiles as either inputs and outputs of a task.

Issue: #3424
Signed-off-by: Adrian Rumpold <[email protected]>

* refactor: Unified handling of annotated types in type engine

Issue: #3424
Signed-off-by: Adrian Rumpold <[email protected]>

* fix: Use py3.8-compatible types in type engine tests

Issue: #3424
Signed-off-by: Adrian Rumpold <[email protected]>

---------

Signed-off-by: Adrian Rumpold <[email protected]>

* Use logger instead of print statement in sqlalchemy plugin (#1651)

* use logging info instead of print

Signed-off-by: wirthual <[email protected]>

* isorted files

Signed-off-by: wirthual <[email protected]>

* import root logger from flytekit

Signed-off-by: wirthual <[email protected]>

---------

Signed-off-by: wirthual <[email protected]>

* Map over notebook task (#1650)

* map over notebook

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* add a flag

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* fix tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>

* Support single literals in tiny url (#1654)

Signed-off-by: Yee Hing Tong <[email protected]>

* Add support overriding image (#1652)

Signed-off-by: Kevin Su <[email protected]>

* Fix ability to pass None to task with Optional kwarg, add test (#1657)

Signed-off-by: Fabio Grätz <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>

* Regenerate plugins requirements

Signed-off-by: eduardo apolinario <[email protected]>

* Regenerate plugins requirements and linting

Signed-off-by: eduardo apolinario <[email protected]>

* Regenerate whylogs requirements

Signed-off-by: eduardo apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: byhsu <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: eduardo apolinario <[email protected]>
Signed-off-by: Adrian Rumpold <[email protected]>
Signed-off-by: wirthual <[email protected]>
Signed-off-by: Fabio Grätz <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Dan Rammer <[email protected]>
Co-authored-by: ByronHsu <[email protected]>
Co-authored-by: byhsu <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>
Co-authored-by: eduardo apolinario <[email protected]>
Co-authored-by: Adrian Rumpold <[email protected]>
Co-authored-by: wirthual <[email protected]>
Co-authored-by: Fabio M. Graetz, Ph.D <[email protected]>
Co-authored-by: Fabio Grätz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants