Validate multiple input values in map task #334

hamersaw · 2023-03-23T18:38:46Z

TL;DR

This PR adds support for caching over map tasks with multiple inputs and validates (at runtime) that each of the lists is the same length.

Type

Bug Fix
Feature
Plugin

Are all requirements met?

Complete description

^^^

Tracking Issue

NA

Follow-up issue

NA

… for cache lookup Signed-off-by: Daniel Rammer <[email protected]>

codecov · 2023-03-23T18:47:24Z

Codecov Report

Merging #334 (2e69534) into master (60d345d) will increase coverage by 1.30%.
The diff coverage is 68.18%.

❗ Current head 2e69534 differs from pull request most recent head 0b3775c. Consider uploading reports for the commit 0b3775c to get more accurate results

@@            Coverage Diff             @@
##           master     #334      +/-   ##
==========================================
+ Coverage   62.75%   64.05%   +1.30%     
==========================================
  Files         146      146              
  Lines       12194     9890    -2304     
==========================================
- Hits         7652     6335    -1317     
+ Misses       3964     2976     -988     
- Partials      578      579       +1

Flag	Coverage Δ
unittests	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
go/tasks/plugins/array/catalog.go	`46.98% <68.18%> (+1.93%)`	⬆️

... and 129 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Signed-off-by: Daniel Rammer <[email protected]>

EngHabu

If I understand correctly, we're assuming here that if the inputs have lists, then the target function will take single elements... Right?

what if the target function takes lists? Because they are considered static config values or something?
Should we check the target interface and only take items from the list if it takes single items?
I may have missed it (looking on my phone) do we need to update the compiler at all? (I don't think so)

hamersaw · 2023-03-30T17:40:05Z

If I understand correctly, we're assuming here that if the inputs have lists, then the target function will take single elements... Right?
* what if the target function takes lists? Because they are considered static config values or something?

* Should we check the target interface and only take items from the list if it takes single items?

Yeah, thanks for catching this. I will double-check because I know the flytekit PR had some minor issues supporting partials. But we should probably do a check here if this provided value type is castable to a List of the expected type. This would cover the List[List[...]] issue as well.

* I may have missed it (looking on my phone) do we need to update the compiler at all? (I don't think so)

I do not believe so.

eapolinario · 2023-03-30T17:57:30Z

All great questions, @EngHabu !

If I understand your question correctly, you're concerned about partial lists (i.e. tasks where we bound a parameter that's a list), correct? In that case, we made the decision to ship a version of partials that does not support that (in other words, you cannot have partial lists). We're making a change in flytekit to specifically catch that case.

We're going to invest in the flyteidl + backend changes to support that use-case in the future.

EngHabu · 2023-03-30T18:43:37Z

Thanks both for answering...
Just to be more concrete:

@task
def my_task(a: str, b: typing.List[str]) -> int:
  ...

Is any of these valid?

map_task(my_task)(a=my_list, b=my_list)
map_task(functools.partial(my_task, b=my_list))(a=my_list)

Then for this use case:

@task
def my_task(a: str, b: str) -> int:
  ...

Are these valid?

map_task(my_task)(a=my_list, b=my_list)
map_task(functools.partial(my_task, b="hello world"))(a=my_list)

eapolinario · 2023-03-30T20:56:16Z

Answers inline.

Thanks both for answering... Just to be more concrete:
@task
def my_task(a: str, b: typing.List[str]) -> int:
  ...
Is any of these valid?

map_task(my_task)(a=my_list, b=my_list)

map_task(functools.partial(my_task, b=my_list))(a=my_list)

case 1. is only going to work if the lists are of the same size.
case 2. is not going to work. We're going to explicitly disallow partials where any of the bounded parameters are of type list in flytekit.

Then for this use case:
@task
def my_task(a: str, b: str) -> int:
  ...
Are these valid?

map_task(my_task)(a=my_list, b=my_list)

map_task(functools.partial(my_task, b="hello world"))(a=my_list)

Both should work.

EngHabu · 2023-03-30T21:48:56Z

Just chatted offline with @eapolinario to understand more about the motivation for restricting partial lists...
I think we should support this: map_task(functools.partial(my_task, b=my_list))(a=my_list)
Which means we should understand if a given input's type == the input type of the subtask vs a Collection of its input type.. if it's the latter, then we continue as per code in this PR, if it's the former, we just assign the input as is... correct, @hamersaw ?

hamersaw · 2023-04-03T18:27:46Z

re-synced up on this. Currently a partial registers a new task with the expected List type inputs. This means that in the example:

@task
def generate(n: int) -> typing.List[int]:
   return list(range(n))
   
@task
def foo(i: int, j: int, k: typing.List[int]):
    # do something cool

@workflow
def wf(a: int, b: typing.List[int]):
    l = generate(n=a)
    p = functools.partial(foo, k=b)
    map_task(p)(i=l, j=l)

the maptask that is registered has the interface []int, []int, []int and the values that are passed as inputs are all types typing.List[int]. There is no way in propeller to differentiate which variables are partials and which are not - this will require flyteidl changes.

The decision in flytekit is to not allow lists to be passed as partials to functions (with this support added later). With this being said, I think this PR should be merged to add support for multiple lists in (1) validating equal length lists at runtime and (2) correctly setting cache keys.

EngHabu

Makes sense... Thank you

* validating multiple input values in map task and correctly populating for cache lookup Signed-off-by: Daniel Rammer <[email protected]> * added unit tests Signed-off-by: Daniel Rammer <[email protected]> --------- Signed-off-by: Daniel Rammer <[email protected]>

validating multiple input values in map task and correctly populating…

8e5c000

… for cache lookup Signed-off-by: Daniel Rammer <[email protected]>

added unit tests

0b3775c

Signed-off-by: Daniel Rammer <[email protected]>

hamersaw marked this pull request as ready for review March 24, 2023 01:19

hamersaw requested review from kumare3, eapolinario and wild-endeavor March 24, 2023 01:19

EngHabu reviewed Mar 30, 2023

View reviewed changes

eapolinario approved these changes Apr 3, 2023

View reviewed changes

EngHabu approved these changes Apr 4, 2023

View reviewed changes

hamersaw merged commit e84d27a into master Apr 4, 2023

hamersaw deleted the feature/map-task-with-multiple-inputs branch April 4, 2023 17:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate multiple input values in map task #334

Validate multiple input values in map task #334

hamersaw commented Mar 23, 2023

codecov bot commented Mar 23, 2023 •

edited

Loading

EngHabu left a comment

hamersaw commented Mar 30, 2023

eapolinario commented Mar 30, 2023

EngHabu commented Mar 30, 2023 •

edited

Loading

eapolinario commented Mar 30, 2023 •

edited by EngHabu

Loading

EngHabu commented Mar 30, 2023

hamersaw commented Apr 3, 2023

EngHabu left a comment

Validate multiple input values in map task #334

Validate multiple input values in map task #334

Conversation

hamersaw commented Mar 23, 2023

TL;DR

Type

Are all requirements met?

Complete description

Tracking Issue

Follow-up issue

codecov bot commented Mar 23, 2023 • edited Loading

Codecov Report

EngHabu left a comment

Choose a reason for hiding this comment

hamersaw commented Mar 30, 2023

eapolinario commented Mar 30, 2023

EngHabu commented Mar 30, 2023 • edited Loading

eapolinario commented Mar 30, 2023 • edited by EngHabu Loading

EngHabu commented Mar 30, 2023

hamersaw commented Apr 3, 2023

EngHabu left a comment

Choose a reason for hiding this comment

codecov bot commented Mar 23, 2023 •

edited

Loading

EngHabu commented Mar 30, 2023 •

edited

Loading

eapolinario commented Mar 30, 2023 •

edited by EngHabu

Loading