Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][flyteadmin] Make New DC and Old DC Default Inputs Compatible #5991

Closed

Conversation

Future-Outlier
Copy link
Member

@Future-Outlier Future-Outlier commented Nov 12, 2024

Tracking issue

#5318

Why are the changes needed?

Context:

When users use data classes (dataclass), lists of data classes (list[dataclass]), or maps of data classes (map[key]dataclass) as workflow default input, they might encounter issues when upgrading or downgrading between flytekit versions 1.13.0 and 1.14.0, especially when running workflows remotely using pyflyte run --remote.

Use Cases:

~~1. Upgrade Scenario:

  • A user registers a workflow with default inputs using flytekit 1.13.0.
  • They upgrade to flytekit 1.14.0.
  • They execute pyflyte run --remote file.py.~~
  1. Downgrade Scenario:
    • A user registers a workflow with default inputs using flytekit 1.14.0.
    • They downgrade to flytekit 1.13.0.
    • They execute pyflyte run --remote file.py.

Problem:

  • In the first scenario, the default input is stored as a protobuf struct, but when running pyflyte run --remote, it uses msgpack IDL to construct the default input.

  • In the second scenario, the default input is stored as msgpack IDL, but pyflyte run --remote uses a protobuf struct to construct the default input.

  • This mismatch leads to the following error:

    Launch plan with different structure already exists. (Please register a new version of the launch plan).
    

What changes were proposed in this pull request?

Root Cause:

  • The LaunchPlanSpec is hashed to determine if the remotely run workflow matches the registered one.
  • LaunchPlanSpec.DefaultInputs.ParameterMap contains either a msgpack IDL or a protobuf struct, depending on the flytekit version used.
  • The difference in serialization formats causes the hash comparison to fail, even if the underlying data is the same.

Solution:

  • Convert both the msgpack IDL and the protobuf struct representations to a common map[string]interface{} in Go, and use reflect.deepEqual to compare them.
  • With this change, the system recognizes that the workflows are the same, preventing the error and allowing users to upgrade or downgrade flytekit versions seamlessly.

Follow-Up

This Pull Request (PR) does not address cases where we use list[DC] or dict[str, DC] as default inputs.
There's a known bug in Flytekit related to these cases, which is documented in Flyte issue #5318.

We need to resolve this Flytekit issue first before we can handle list[DC] or dict[str, DC] default inputs in this PR.

How was this patch tested?

Step Command Screenshot
1. Register a workflow pyflyte register /Users/future-outlier/code/dev/flytekit/build/current_PR/priority_msgpack/flytefile_upload/default_input_upload_2.py image
2. Use the new dc format remotely pyflyte run --remote /Users/future-outlier/code/dev/flytekit/build/current_PR/priority_msgpack/flytefile_upload/default_input_upload_2.py wf image
3. Use the old dc format remotely FLYTE_USE_OLD_DC_FORMAT=true pyflyte run --remote /Users/future-outlier/code/dev/flytekit/build/current_PR/priority_msgpack/flytefile_upload/default_input_upload_2.py wf image

Example code

import os
from dataclasses import dataclass
from flytekit.types.file import FlyteFile
from flytekit.types.structured import StructuredDataset
from flytekit.types.directory import FlyteDirectory
from flytekit import task, workflow, ImageSpec
import pandas as pd

flytekit_hash = "bfe32b52a6fa4b80ab6876b60edd8cbda1636c07"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"
image = ImageSpec(
    packages=[flytekit, "pandas", "pyarrow"],
    apt_packages=["git"],
    registry="localhost:30000",
)

image = "localhost:30000/flytekit:dev"

@dataclass
class DC:
    ff: FlyteFile
    sd: StructuredDataset
    fd: FlyteDirectory

@task(container_image=image)
def t1(dc: DC = DC(ff=FlyteFile(os.path.realpath(__file__)),
                   sd=StructuredDataset(uri="/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet",
                                        file_format="parquet"),
                   fd=FlyteDirectory("/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/"),
                  )) -> DC:
    with open(dc.ff, "r") as f:
        print("File Content: ", f.read())

    print("sd:", dc.sd.open(pd.DataFrame).all())

    df_path = os.path.join(dc.fd.path, "df.parquet")
    print("fd: ", os.path.isdir(df_path))

    return dc

@workflow
def wf(dc: DC = DC(ff=FlyteFile(os.path.realpath(__file__)),
                   sd=StructuredDataset(uri="/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet",
                                        file_format="parquet"),
                   fd=FlyteDirectory("/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/"),
                  )):
    t1(dc=dc)

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

flyteorg/flytekit#2907

Docs link

Copy link

codecov bot commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.15%. Comparing base (fa01e76) to head (05fd0d5).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5991      +/-   ##
==========================================
- Coverage   36.94%   33.15%   -3.79%     
==========================================
  Files        1310     1013     -297     
  Lines      131382   107571   -23811     
==========================================
- Hits        48540    35667   -12873     
+ Misses      78629    68744    -9885     
+ Partials     4213     3160    -1053     
Flag Coverage Δ
unittests-datacatalog 51.58% <ø> (ø)
unittests-flyteadmin ?
unittests-flytecopilot 22.23% <ø> (ø)
unittests-flytectl 62.39% <ø> (ø)
unittests-flyteidl 6.95% <ø> (+0.03%) ⬆️
unittests-flyteplugins 53.83% <ø> (-0.02%) ⬇️
unittests-flytepropeller 43.10% <ø> (+0.02%) ⬆️
unittests-flytestdlib 55.31% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wild-endeavor
Copy link
Contributor

why are we doing this again? this pr doesn't really make sense to me.

Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier
Copy link
Member Author

Future-Outlier commented Nov 13, 2024

why are we doing this again? this pr doesn't really make sense to me.

Hi, do you mind check the description again?

Comment on lines 187 to 190
func IsSameDCFormat(oldSpec *admin.LaunchPlanSpec, newSpec *admin.LaunchPlanSpec) bool {
oldParams := oldSpec.GetDefaultInputs().GetParameters()
newParams := newSpec.GetDefaultInputs().GetParameters()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am writing more comments here to let people understand how this algorithm works.

Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier changed the title [flyteadmin] Make New DC and Old DC Default Inputs Compatible [WIP][flyteadmin] Make New DC and Old DC Default Inputs Compatible Nov 13, 2024
@Future-Outlier
Copy link
Member Author

why are we doing this again? this pr doesn't really make sense to me.

Hi, @wild-endeavor
@pingsutw @eapolinario and I think this should be accepcted, so the user experience is better.

Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier
Copy link
Member Author

this should be close and if we want to support this, we should give the different launch plan a new workflow version, since the default inputs are different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants