Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Validate multiple input values in map task #334

Merged
merged 2 commits into from
Apr 4, 2023

Conversation

hamersaw
Copy link
Contributor

TL;DR

This PR adds support for caching over map tasks with multiple inputs and validates (at runtime) that each of the lists is the same length.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

NA

Follow-up issue

NA

@codecov
Copy link

codecov bot commented Mar 23, 2023

Codecov Report

Merging #334 (2e69534) into master (60d345d) will increase coverage by 1.30%.
The diff coverage is 68.18%.

❗ Current head 2e69534 differs from pull request most recent head 0b3775c. Consider uploading reports for the commit 0b3775c to get more accurate results

@@            Coverage Diff             @@
##           master     #334      +/-   ##
==========================================
+ Coverage   62.75%   64.05%   +1.30%     
==========================================
  Files         146      146              
  Lines       12194     9890    -2304     
==========================================
- Hits         7652     6335    -1317     
+ Misses       3964     2976     -988     
- Partials      578      579       +1     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
go/tasks/plugins/array/catalog.go 46.98% <68.18%> (+1.93%) ⬆️

... and 129 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Signed-off-by: Daniel Rammer <[email protected]>
@hamersaw hamersaw marked this pull request as ready for review March 24, 2023 01:19
Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we're assuming here that if the inputs have lists, then the target function will take single elements... Right?

  • what if the target function takes lists? Because they are considered static config values or something?
  • Should we check the target interface and only take items from the list if it takes single items?
  • I may have missed it (looking on my phone) do we need to update the compiler at all? (I don't think so)

@hamersaw
Copy link
Contributor Author

If I understand correctly, we're assuming here that if the inputs have lists, then the target function will take single elements... Right?

* what if the target function takes lists? Because they are considered static config values or something?

* Should we check the target interface and only take items from the list if it takes single items?

Yeah, thanks for catching this. I will double-check because I know the flytekit PR had some minor issues supporting partials. But we should probably do a check here if this provided value type is castable to a List of the expected type. This would cover the List[List[...]] issue as well.

* I may have missed it (looking on my phone) do we need to update the compiler at all? (I don't think so)

I do not believe so.

@eapolinario
Copy link
Contributor

All great questions, @EngHabu !

If I understand your question correctly, you're concerned about partial lists (i.e. tasks where we bound a parameter that's a list), correct? In that case, we made the decision to ship a version of partials that does not support that (in other words, you cannot have partial lists). We're making a change in flytekit to specifically catch that case.

We're going to invest in the flyteidl + backend changes to support that use-case in the future.

@EngHabu
Copy link
Contributor

EngHabu commented Mar 30, 2023

Thanks both for answering...
Just to be more concrete:

@task
def my_task(a: str, b: typing.List[str]) -> int:
  ...

Is any of these valid?

  1. map_task(my_task)(a=my_list, b=my_list)
  2. map_task(functools.partial(my_task, b=my_list))(a=my_list)

Then for this use case:

@task
def my_task(a: str, b: str) -> int:
  ...

Are these valid?

  1. map_task(my_task)(a=my_list, b=my_list)
  2. map_task(functools.partial(my_task, b="hello world"))(a=my_list)

@eapolinario
Copy link
Contributor

eapolinario commented Mar 30, 2023

Answers inline.

Thanks both for answering... Just to be more concrete:

@task
def my_task(a: str, b: typing.List[str]) -> int:
  ...

Is any of these valid?

  1. map_task(my_task)(a=my_list, b=my_list)
  2. map_task(functools.partial(my_task, b=my_list))(a=my_list)

case 1. is only going to work if the lists are of the same size.
case 2. is not going to work. We're going to explicitly disallow partials where any of the bounded parameters are of type list in flytekit.

Then for this use case:

@task
def my_task(a: str, b: str) -> int:
  ...

Are these valid?

  1. map_task(my_task)(a=my_list, b=my_list)
  2. map_task(functools.partial(my_task, b="hello world"))(a=my_list)

Both should work.

@EngHabu
Copy link
Contributor

EngHabu commented Mar 30, 2023

Just chatted offline with @eapolinario to understand more about the motivation for restricting partial lists...
I think we should support this: map_task(functools.partial(my_task, b=my_list))(a=my_list)
Which means we should understand if a given input's type == the input type of the subtask vs a Collection of its input type.. if it's the latter, then we continue as per code in this PR, if it's the former, we just assign the input as is... correct, @hamersaw ?

@hamersaw
Copy link
Contributor Author

hamersaw commented Apr 3, 2023

re-synced up on this. Currently a partial registers a new task with the expected List type inputs. This means that in the example:

@task
def generate(n: int) -> typing.List[int]:
   return list(range(n))
   
@task
def foo(i: int, j: int, k: typing.List[int]):
    # do something cool

@workflow
def wf(a: int, b: typing.List[int]):
    l = generate(n=a)
    p = functools.partial(foo, k=b)
    map_task(p)(i=l, j=l)

the maptask that is registered has the interface []int, []int, []int and the values that are passed as inputs are all types typing.List[int]. There is no way in propeller to differentiate which variables are partials and which are not - this will require flyteidl changes.

The decision in flytekit is to not allow lists to be passed as partials to functions (with this support added later). With this being said, I think this PR should be merged to add support for multiple lists in (1) validating equal length lists at runtime and (2) correctly setting cache keys.

Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense... Thank you

@hamersaw hamersaw merged commit e84d27a into master Apr 4, 2023
@hamersaw hamersaw deleted the feature/map-task-with-multiple-inputs branch April 4, 2023 17:35
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* validating multiple input values in map task and correctly populating for cache lookup

Signed-off-by: Daniel Rammer <[email protected]>

* added unit tests

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants