Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Flyte Types Upload Issues in Default Input #2907

Merged
merged 24 commits into from
Nov 11, 2024
Merged

Conversation

Future-Outlier
Copy link
Member

@Future-Outlier Future-Outlier commented Nov 6, 2024

Tracking issue

flyteorg/flyte#5318

Why are the changes needed?

Before:
if we use flyte types in a dataclass as a workflow or task default inputs, it will fail to upload to remote (s3) storage.
After:
It should work.

What changes were proposed in this pull request?

push a new context (with remote access=True) to context stack in flytekit, so that when flyte types try to upload data, it will upload to remote.

Note:

  1. for flytedirectory it has bug and I'll fix it in the next PR, with thee bug default input + different structure exist.
  2. flyteschema will not work, but this is deprecated and low priority, will create an issue to let other solvei it.

How was this patch tested?

remote execution and integration test.

"""
add context where Kevin and Yee found and add unit tests
fix the filepathtransofrmer to literal expected literal type's type hint
"""
import typing
import os
from dataclasses import dataclass, fields, field
from typing import Dict, List
from flytekit.types.file import FlyteFile
from flytekit.types.structured import StructuredDataset
from flytekit.types.directory import FlyteDirectory
from flytekit import task, workflow, ImageSpec
import datetime
from enum import Enum
import pandas as pd

flytekit_hash = "6a8f45cf30531d39633abce95def22327deaec4f"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"
image = ImageSpec(
    packages=[flytekit, "pandas", "pyarrow"],
    apt_packages=["git"],
    registry="localhost:30000",
)

@dataclass
class DC:
    ff: FlyteFile
    sd: StructuredDataset

# /Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet
# StructuredDataset(uri="/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet", file_format="parquet")

@task(container_image=image)
def t1(dc: DC = DC(ff=FlyteFile(os.path.realpath(__file__)),
                    sd=StructuredDataset(uri="/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet", file_format="parquet"))) -> DC:
    with open(dc.ff, "r") as f:
        print("File Content: ", f.read())
    print("sd:", dc.sd.open(pd.DataFrame).all())
    return dc

@workflow
def wf(dc: DC = DC(ff=FlyteFile(os.path.realpath(__file__)),
                    sd=StructuredDataset(uri="/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet", file_format="parquet"))):
    t1(dc=dc)

if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner
    import os
    # print(os.path.realpath(__file__))
    # print(os.getcwd())

    runner = CliRunner()
    path = os.path.realpath(__file__)
    #
    result = runner.invoke(pyflyte.main,
                           ["run", path, "wf"])
    print("Local Execution: ", result.output)
    #
    result = runner.invoke(pyflyte.main,
                           ["run", "--remote", path, "wf"])
    print("Remote Execution: ", result.output)
    # #
    result = runner.invoke(pyflyte.main,
                           ["run", "--remote", path, "t1"])
    print("Remote Execution: ", result.output)

Setup process

Screenshots

image

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@Future-Outlier Future-Outlier changed the title Fix Flyte Types Upload Issues in Default Input [WIP] Fix Flyte Types Upload Issues in Default Input Nov 6, 2024
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier changed the title [WIP] Fix Flyte Types Upload Issues in Default Input Fix Flyte Types Upload Issues in Default Input Nov 6, 2024
Future-Outlier and others added 7 commits November 6, 2024 23:38
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>

Co-authored-by: pingsutw  <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier enabled auto-merge (squash) November 6, 2024 16:30
Copy link
Member

@pingsutw pingsutw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

integration tests are failing

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.92%. Comparing base (2fbdc63) to head (1da5f31).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2907      +/-   ##
==========================================
+ Coverage   72.80%   74.92%   +2.11%     
==========================================
  Files         199      199              
  Lines       20820    20774      -46     
  Branches     2676     2680       +4     
==========================================
+ Hits        15159    15565     +406     
+ Misses       4932     4446     -486     
- Partials      729      763      +34     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Copy link
Member Author

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need 2 files in the test folder?

if we use pyflyte register to the workflow file and use pyflyte run --remote, it will trigger plan with different structure already exists.
Since the Protobuf struct one is MAP in golang, and msgpack IDL one is Bytes in golang.

Should we change backend code in flyte or make thie behavior the same now?

why we should?
for better user experience

Reproduce the error

  1. pyflyte register file.py
  2. pyflyte run --remote file.py wf
  3. FLYTE_USE_OLD_DC_FORMAT=true pyflyte run --remote file.py wf

cc @eapolinario
Let's chat about this in the next 10 hours

Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier merged commit 8fdd0c6 into master Nov 11, 2024
104 checks passed
katrogan pushed a commit that referenced this pull request Nov 15, 2024
* Fix Flyte Types Upload Issues in Default Input

Signed-off-by: Future-Outlier <[email protected]>

* TODO: ADD SD CASES, and flyteschema cases and run it in remote

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update kevin's advice

Signed-off-by: Future-Outlier <[email protected]>

Co-authored-by: pingsutw  <[email protected]>

* lint

Signed-off-by: Future-Outlier <[email protected]>

* lint

Signed-off-by: Future-Outlier <[email protected]>

* add test_flytetypes

Signed-off-by: Future-Outlier <[email protected]>

* better-api

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw <[email protected]>
Signed-off-by: Katrina Rogan <[email protected]>
400Ping pushed a commit to 400Ping/flytekit that referenced this pull request Nov 22, 2024
* Fix Flyte Types Upload Issues in Default Input

Signed-off-by: Future-Outlier <[email protected]>

* TODO: ADD SD CASES, and flyteschema cases and run it in remote

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* update kevin's advice

Signed-off-by: Future-Outlier <[email protected]>

Co-authored-by: pingsutw  <[email protected]>

* lint

Signed-off-by: Future-Outlier <[email protected]>

* lint

Signed-off-by: Future-Outlier <[email protected]>

* add test_flytetypes

Signed-off-by: Future-Outlier <[email protected]>

* better-api

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw <[email protected]>
Signed-off-by: 400Ping <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants