Map Tasks in new flytekit #405

katrogan · 2021-02-26T23:00:28Z

TL;DR

Re-architecting map tasks (array jobs) for new flytekit.

Rather than determining the array job node structure in a dynamic job spec produced at runtime, this change re-architects array task serialization to produce the array job custom at compile time. Execution behavior has also changed: each array job instance now accepts the entire input collection, indexes into it appropriately, and writes a single output, which the plugin then collects and coalesces into a collection. This output behavior is unchanged. The output interface is left as a collection specifically to support local workflow execution.
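To make the execution model described above concrete, here is a minimal plain-Python sketch. It is not flytekit code: `run_array_instance` and `coalesce_outputs` are hypothetical names standing in for a single array job instance and for the plugin's collection step.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_array_instance(fn: Callable[[T], R], full_input: List[T], index: int) -> R:
    # Each instance receives the entire input collection and picks out
    # only the element at its own index, producing a single output.
    return fn(full_input[index])


def coalesce_outputs(fn: Callable[[T], R], full_input: List[T]) -> List[R]:
    # Stand-in for the plugin: gather one output per instance, in index order.
    return [run_array_instance(fn, full_input, i) for i in range(len(full_input))]
```

Calling `coalesce_outputs` with a one-argument function and a list yields a list of the same length, which mirrors why the output interface stays a collection.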

Type

Bug Fix
Feature
Plugin

Are all requirements met?

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

flyteorg/flyte#609

Follow-up issue

NA

kumare3 · 2021-02-27T00:51:11Z

flytekit/core/map_task.py

+        job runs in a number of slots less than the size of the input.
+        """
+        offset = 0
+        if os.environ.get("BATCH_JOB_ARRAY_INDEX_OFFSET"):


Is this only for AWS Batch?

If so, can we add that to the docs?

Nope! k8s too. See the double env var lookup below (https://github.com/flyteorg/flytekit/pull/405/files#diff-38030c87fd7703a4c95a033167b2ce8efa1d3b61824e891eb063bb1337def271R88): it gets the appropriate index env var name (k8s or Batch specific) and then looks that up.
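A rough sketch of the double lookup described in the reply above. Only `BATCH_JOB_ARRAY_INDEX_OFFSET` appears in the quoted diff; the indirection variable `BATCH_JOB_ARRAY_INDEX_VAR_NAME` and the function name are assumptions made for illustration, and the `env` parameter exists only so the sketch can be exercised without touching the real environment.

```python
import os
from typing import Mapping


def compute_array_job_index(env: Mapping[str, str] = os.environ) -> int:
    # Optional offset, used when the job runs in fewer slots than the input size.
    offset = 0
    if env.get("BATCH_JOB_ARRAY_INDEX_OFFSET"):
        offset = int(env["BATCH_JOB_ARRAY_INDEX_OFFSET"])

    # First lookup: the name of the platform-specific index variable
    # (assumed variable name, not confirmed by the quoted diff).
    index_var_name = env["BATCH_JOB_ARRAY_INDEX_VAR_NAME"]

    # Second lookup: the index itself, stored under that platform-specific name.
    return offset + int(env[index_var_name])
```

With this shape, the same code path serves both backends: whichever system launches the instance sets the indirection variable to its own index variable name.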

kumare3 · 2021-02-27T00:54:58Z

Oh wow, this looks so much simpler and nicer! Thank you so much.

codecov-io · 2021-03-02T20:05:21Z

Codecov Report

Merging #405 (b9e4519) into master (c02075d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #405   +/-   ##
=======================================
  Coverage   96.00%   96.00%           
=======================================
  Files           2        2           
  Lines          75       75           
  Branches        8        8           
=======================================
  Hits           72       72           
  Misses          1        1           
  Partials        2        2

Last update c02075d...c51fbc8.

wild-endeavor · 2021-03-02T22:34:59Z

flytekit/core/map_task.py

+    of the mapped task.
+
+    :param task_function: This argument is implicitly passed and represents the repeatable function
+    :param concurrency: If specified, this limits the number of mapped tasks than can run in parallel to the given batch


propeller respects this?

no 😬 but it's in idl

wild-endeavor · 2021-03-02T23:04:40Z

tests/flytekit/unit/core/test_type_hints.py

        return t2(a=x, b=y)

    x = my_wf(a=[5, 6])
    assert x == (15, "world-7world-8")


+def test_map_task():


Can we add the following tests?

Some serialization test, something where we serialize a task and look at it to make sure the task type is correct.

what does serialization look like if the user does

    mapped_t1 = map(t1, metadata=TaskMetadata(retries=1))

    @workflow
    def wf1():
        return mapped_t1(a=a)

    @workflow
    def wf2():
        map(t1, metadata=TaskMetadata(retries=1))(a=a)

There are basically two map tasks at this point, right? Both come from the same t1. At serialization, will this produce two protobufs? Will they collide? Are the names unique?

a serialization test for the workflow

test local workflow execution

also what happens if you map over a launch plan or a sub wf? should those error?

what happens if you map over a task that has multiple inputs?

what happens if you map over a task that has multiple outputs?

Also can you remind me what restrictions there are here? Can users do this?

    @workflow
    def wf() -> (something, smth_else):
        results = map(t1, metadata=TaskMetadata(retries=1))(a=a)
        x = t2(list_of_ints=results[0:5])
        y = t2(list_of_ints=results[5:])
        return x, y

no right? What about in a dynamic task?

If we have a dynamic task that ranges over something, and there ends up being N copies of the same task, does the current implementation preclude future optimization of those N copies into one array task?

Note that this test case already tests local workflow execution.

Can users do this?

No, this should be unchanged because the results are still a promise.

To answer your last question, we can certainly update dynamic tasks to produce array jobs (that was my initial approach). It's just mildly tedious to implement, but certainly doable, especially given that the old SDK already does this.

Added test cases for everything else above.

katrogan · 2021-03-03T20:30:25Z

PTAL @wild-endeavor @cosmicBboy

kumare3 · 2021-03-03T22:01:49Z

flytekit/bin/entrypoint.py

+    task_module = _importlib.import_module(task_module)
+    task_def = getattr(task_module, task_name)
+
+    if not test and isinstance(task_def, PythonFunctionTask):


Does this also mean we can now use map tasks for everything? I don't think that will work. For example, PythonSparkFunctionTask is also derived from this. In that case we should just ensure this at the map-task compilation point.

Sure, that's already done. This check is defensive in case someone mucks with the container args.
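A small illustration of the concern raised above, using hypothetical stand-in classes rather than the real flytekit ones. An `isinstance` check also accepts subclasses, so an exact-type check is one way a compilation-time guard could exclude them. Whether flytekit's actual guard looks like this is an assumption.

```python
class PythonFunctionTask:
    """Hypothetical stand-in for the flytekit class of the same name."""


class PythonSparkFunctionTask(PythonFunctionTask):
    """Hypothetical stand-in for a task type derived from PythonFunctionTask."""


def passes_entrypoint_check(task: object) -> bool:
    # Mirrors the isinstance test in the quoted diff: subclasses pass too.
    return isinstance(task, PythonFunctionTask)


def is_plain_function_task(task: object) -> bool:
    # Stricter variant: only the exact base type passes, subclasses do not.
    return type(task) is PythonFunctionTask
```

The first check is the defensive one at the entrypoint; the second shows the kind of narrowing that would have to happen earlier if derived task types must be rejected.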

wild-endeavor

up to you on the map name

wild-endeavor · 2021-03-03T20:41:35Z

flytekit/__init__.py

@@ -119,7 +119,7 @@
 from flytekit.core.context_manager import ExecutionParameters, FlyteContext
 from flytekit.core.dynamic_workflow_task import dynamic
 from flytekit.core.launch_plan import LaunchPlan
-from flytekit.core.map_task import maptask
+from flytekit.core.map_task import map


I actually think we should rename it to map_task. The function as it stands doesn't map, right? It produces a new task that, when called with (), actually does the map.
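The naming point can be shown with a plain-Python analogy. This is not flytekit's implementation: the `map_task` below is a simplified stand-in that returns a new callable, and the mapping only happens when that callable is invoked.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_task(fn: Callable[[T], R]) -> Callable[[List[T]], List[R]]:
    # Nothing is mapped here; we only build and return a new callable.
    def mapped(items: List[T]) -> List[R]:
        # The actual map happens when the returned callable is invoked.
        return [fn(item) for item in items]

    return mapped


def add_one(a: int) -> int:
    return a + 1


mapped_add_one = map_task(add_one)  # produces a new "task", no work done yet
result = mapped_add_one([1, 2, 3])  # calling it performs the map
```

This differs from Python's built-in `map`, which takes the iterable directly, and that difference is the argument for a distinct name.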

wild-endeavor · 2021-03-03T22:49:59Z

flytekit/core/map_task.py

+            successfully before terminating this task and marking it successful.
+        """
+        if len(python_function_task.python_interface.inputs.keys()) > 1:
+            raise ValueError("Map tasks only accept python function tasks with 0 or 1 inputs")


Is this true? I think exactly 1, right? Does it work for 0?

it's kinda stupid but yes it does
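For reference, a minimal sketch of the "0 or 1 inputs" rule applied to plain Python functions. The quoted diff checks `python_function_task.python_interface.inputs`; this stand-in uses `inspect.signature` instead, and `check_mappable` is a hypothetical name.

```python
import inspect
from typing import Callable


def check_mappable(fn: Callable) -> int:
    # Count declared parameters as a stand-in for the task's interface inputs.
    n_inputs = len(inspect.signature(fn).parameters)
    if n_inputs > 1:
        raise ValueError("Map tasks only accept python function tasks with 0 or 1 inputs")
    return n_inputs


def no_inputs() -> int:
    return 1


def one_input(a: int) -> int:
    return a


def two_inputs(a: int, b: int) -> int:
    return a + b
```

Under this rule `no_inputs` and `one_input` are accepted, while `two_inputs` raises `ValueError`.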

wild-endeavor · 2021-03-05T01:41:56Z

just one last comment then +1. but can you please document all the prefix/path/retry stuff too? 🙏

katrogan added 30 commits February 18, 2021 17:21

wip

2dd3f27

wip

295d085

wip

e54d32b

wip

e259e83

wip

353669c

wip

985b728

wip

e9210c1

wip

fb75c4a

full input path

4245ad2

log

15048e1

wip

21d05b9

wip

1015985

wip

f889075

wip

a1fde79

wip

9bfa080

wip

94902bb

wip

cac5685

wip

2f7afb0

wip

8160f94

wip

74a57b8

wip

38e7a88

wip

6610720

wip

e990627

wip

65db879

wip

623a195

wip

cb12cf3

wip

2541809

wip

00e7277

wip

7e3d188

wip

7aba0b5

kumare3 reviewed Feb 27, 2021

flytekit/core/map_task.py Outdated

kumare3 reviewed Feb 27, 2021

flytekit/core/map_task.py Outdated

wild-endeavor mentioned this pull request Mar 1, 2021

Map tasks flyteorg/flyte#609

Closed

kumare3 reviewed Mar 1, 2021

tests/flytekit/unit/core/test_type_hints.py Outdated

katrogan added 2 commits March 1, 2021 17:06

comments

5baf1ec

additional test

c3e935e

katrogan added 2 commits March 2, 2021 13:18

more comments, rename

dab77b6

module level comments too

7c6091f

cosmicBboy reviewed Mar 2, 2021

flytekit/core/map_task.py Outdated

cosmicBboy reviewed Mar 2, 2021

wild-endeavor reviewed Mar 2, 2021

katrogan added 3 commits March 3, 2021 11:55

Docstrings, comments, test cases

f169c01

Signed-off-by: Katrina Rogan <[email protected]>

derp, derp

620e24a

Signed-off-by: Katrina Rogan <[email protected]>

tests!

a08a0d2

Signed-off-by: Katrina Rogan <[email protected]>

cosmicBboy previously approved these changes Mar 3, 2021

kumare3 reviewed Mar 3, 2021

flytekit/bin/entrypoint.py Outdated

kumare3 reviewed Mar 3, 2021

comments

c51fbc8

katrogan dismissed cosmicBboy’s stale review via c51fbc8 March 4, 2021 01:55

[ignore] - PR into #405 (#410)

623d791

wild-endeavor reviewed Mar 5, 2021

rename

b4bff0f

Signed-off-by: Katrina Rogan <[email protected]>

wild-endeavor approved these changes Mar 5, 2021

wild-endeavor merged commit 4d5105f into master Mar 5, 2021

max-hoffman pushed a commit to dolthub/flytekit that referenced this pull request May 11, 2021

Map Tasks in new flytekit (flyteorg#405)

f55e05c
