
TypeTransformer for reading and writing from TensorFlowRecord format #1240

Merged
pingsutw merged 36 commits into flyteorg:master from typetransformer_tf_model on Dec 6, 2022

Conversation

ryankarlos
Contributor

@ryankarlos ryankarlos commented Oct 18, 2022

TL;DR

This Flyte feature adds support for reading and writing .tfrecord files,
using TensorFlow's tf.train.Example as a native type.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

  • Adds a TensorflowExampleTransformer type in flytekit/extras/tensorflow/records.py that uses the tf.train.Example message (https://www.tensorflow.org/api_docs/python/tf/train/Example) to serialize, write, and read tf.train.Example messages to and from .tfrecord files, following the examples in the TensorFlow docs: https://www.tensorflow.org/tutorials/load_data/tfrecord
  • Adds tests for the transformer's serialisation and deserialisation steps in tests/flytekit/unit/extras/tensorflow/test_transformations.py
  • Adds a test for an example workflow using the tf.train.Example message.
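For context on what "serialize, write, and read" involves at the byte level: a tf.train.Example is stored as its serialized protobuf bytes (example.SerializeToString()), and a .tfrecord file is a sequence of length-prefixed, checksummed records. The stdlib-only sketch below illustrates that framing. It is not the transformer's code (in practice tf.io.TFRecordWriter and tf.data.TFRecordDataset do this for you), and the function names are illustrative.

```python
import struct
from typing import Iterator, List


def _make_crc32c_table() -> List[int]:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # Reflected CRC-32C (Castagnoli) polynomial, as used by TFRecord.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a rotated-and-offset CRC rather than the raw value.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def encode_records(payloads: List[bytes]) -> bytes:
    """Frame each payload as: uint64 length, uint32 masked CRC of the length,
    the payload bytes, uint32 masked CRC of the payload (all little-endian)."""
    out = bytearray()
    for payload in payloads:
        length = struct.pack("<Q", len(payload))
        out += length + struct.pack("<I", masked_crc(length))
        out += payload + struct.pack("<I", masked_crc(payload))
    return bytes(out)


def decode_records(buf: bytes) -> Iterator[bytes]:
    """Inverse of encode_records; raises ValueError on a checksum mismatch."""
    pos = 0
    while pos < len(buf):
        length_bytes = buf[pos : pos + 8]
        (length,) = struct.unpack("<Q", length_bytes)
        (length_crc,) = struct.unpack("<I", buf[pos + 8 : pos + 12])
        if length_crc != masked_crc(length_bytes):
            raise ValueError("corrupt length header")
        start = pos + 12
        payload = buf[start : start + length]
        (data_crc,) = struct.unpack("<I", buf[start + length : start + length + 4])
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt record payload")
        yield payload
        pos = start + length + 4
```

For example, encode_records([ex.SerializeToString() for ex in examples]) should produce the same layout an uncompressed tf.io.TFRecordWriter writes; each record costs 16 bytes of framing on top of its payload.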

Tracking Issue

flyteorg/flyte#2571

@ryankarlos
Contributor Author

I haven't added this as a plugin, since the original issue asked for this feature to follow the same format as the PyTorch type transformer.

@ryankarlos ryankarlos force-pushed the typetransformer_tf_model branch 3 times, most recently from 566b3c9 to 8918108 October 18, 2022 18:28
@ryankarlos ryankarlos changed the title Type Transformer for reading and writing from TensorFlowRecord format TypeTransformer for reading and writing from TensorFlowRecord format Oct 18, 2022
@ryankarlos ryankarlos force-pushed the typetransformer_tf_model branch from 8918108 to 12dff47 October 19, 2022 23:28
@dennisobrien
Contributor

The unit test failures seem to be caused by tensorflow not being included.

E ModuleNotFoundError: No module named 'tensorflow'

You should be able to add this to dev-requirements.in.

I'm excited to see more tensorflow support being contributed!

@ryankarlos ryankarlos force-pushed the typetransformer_tf_model branch from 49bcbe5 to 9342db3 October 20, 2022 17:35
@@ -13,3 +13,4 @@ google-cloud-bigquery
 google-cloud-bigquery-storage
 IPython
 torch
+tensorflow<=2.8.1
Contributor Author

I've had to pin the TensorFlow version here, as later versions throw an error when resolving protobuf dependencies, I guess because of the pinned protobuf version in requirements.txt (I'd also rather not change requirements.in to resolve this).

Screenshot 2022-10-20 at 18 36 35

Contributor Author

@ryankarlos ryankarlos Oct 20, 2022

Ignore this: I've pinned grpcio-status<1.49.0 instead, based on a suggestion from @pingsutw in another PR, which fixed it!

Contributor

I recently created this PR #1248 that adds version constraints to grpcio and grpcio-status in requirements.in. You should be able to pull that change in now that it has been merged and avoid the constraint in dev-requirements.in.

Bug description here: flyteorg/flyte#3006

Contributor Author

@ryankarlos ryankarlos Oct 22, 2022

I still get the same issue as above when pulling in from master. The grpcio and grpcio-status versions in requirements.in are:

grpcio<=1.47.0
grpcio-status<=1.47.0

I think it may need to be pinned to grpcio-status<1.49.0, as @pingsutw suggested (at least that worked for me), but I'm not sure.
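The practical difference between the two pins: grpcio-status<=1.47.0 excludes every 1.48.x release, while grpcio-status<1.49.0 admits them. The simplified comparison below makes that concrete. It handles only purely numeric versions, whereas pip follows PEP 440 (for example via the packaging library's SpecifierSet), so treat it as an illustration rather than a resolver.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a purely numeric version such as '1.48.1' into (1, 48, 1)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, op: str, bound: str) -> bool:
    """Check one '<' or '<=' specifier against a numeric version."""
    v, b = parse_release(version), parse_release(bound)
    # Pad with zeros so that '1.48' compares like '1.48.0'.
    width = max(len(v), len(b))
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    if op == "<=":
        return v <= b
    if op == "<":
        return v < b
    raise ValueError(f"unsupported operator: {op!r}")
```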

@ryankarlos
Contributor Author

> The unit test failures seem to be caused by tensorflow not being included.
>
> E ModuleNotFoundError: No module named 'tensorflow'
>
> You should be able to add this to dev-requirements.in.
>
> I'm excited to see more tensorflow support being contributed!

@dennisobrien thanks, I've pushed the changes now. I've also created PR #1242 for Keras model support!

@ryankarlos ryankarlos mentioned this pull request Oct 20, 2022
@codecov

codecov bot commented Oct 20, 2022

Codecov Report

Merging #1240 (23c8bea) into master (f616cd4) will increase coverage by 0.24%.
The diff coverage is 73.09%.

@@            Coverage Diff             @@
##           master    #1240      +/-   ##
==========================================
+ Coverage   68.83%   69.08%   +0.24%     
==========================================
  Files         291      295       +4     
  Lines       26683    26922     +239     
  Branches     2140     2531     +391     
==========================================
+ Hits        18368    18598     +230     
- Misses       7817     7829      +12     
+ Partials      498      495       -3     
Impacted Files | Coverage Δ
flytekit/extras/tensorflow/__init__.py | 0.00% <0.00%> (ø)
flytekit/types/directory/__init__.py | 0.00% <0.00%> (ø)
flytekit/types/file/__init__.py | 17.07% <0.00%> (-0.88%) ⬇️
flytekit/extras/tensorflow/record.py | 47.12% <47.12%> (ø)
...tekit/unit/extras/tensorflow/record/test_record.py | 100.00% <100.00%> (ø)
...t/extras/tensorflow/record/test_transformations.py | 100.00% <100.00%> (ø)
flytekit/interfaces/random.py | 20.00% <0.00%> (-5.00%) ⬇️
flytekit/configuration/internal.py | 16.43% <0.00%> (-2.03%) ⬇️
flytekit/types/directory/types.py | 55.73% <0.00%> (-0.47%) ⬇️
flytekit/types/file/file.py | 60.00% <0.00%> (-0.42%) ⬇️
... and 9 more
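As a sanity check on the coverage diff, its totals are consistent with coverage = hits / (hits + misses + partials). That formula is an assumption about how Codecov computes the ratio, and so is the truncation to two decimals, which is inferred from these numbers (rounding would give 68.84% for master rather than the reported 68.83%).

```python
def truncated_pct(hits: int, misses: int, partials: int) -> str:
    """Coverage ratio hits / (hits + misses + partials), truncated to 2 decimals."""
    lines = hits + misses + partials
    # Integer arithmetic avoids float rounding; // floors, i.e. truncates here.
    hundredths = hits * 10_000 // lines
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


master = truncated_pct(hits=18368, misses=7817, partials=498)  # 26683 lines
pr_head = truncated_pct(hits=18598, misses=7829, partials=495)  # 26922 lines
```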


dev-requirements.txt (outdated, resolved)
flytekit/extras/tensorflow/records.py (outdated, resolved)
@ryankarlos ryankarlos force-pushed the typetransformer_tf_model branch 2 times, most recently from 52625f6 to f6fc331 October 24, 2022 11:27
@ryankarlos
Contributor Author

@pingsutw I've pushed the requested changes.

@ryankarlos ryankarlos requested a review from pingsutw October 24, 2022 14:35
@cosmicBboy
Contributor

cosmicBboy commented Oct 26, 2022

Writing feedback here for posterity.

Draft Proposal

  1. Create a TFRecordFile type that extends FlyteFile with a "tfrecord" format (FlyteFile["tfrecord"]) for serializing/deserializing TFRecords, which automatically handles tf.train.Example task outputs.
  2. Extend FlyteDirectory to a TFRecordsDirectory type, which automatically handles List[tf.train.Example] outputs by serializing them as TFRecords and storing them as a multi-part blob.

Why not just a type transformer for tf.train.Example?

Because when we create integrations to other frameworks/libraries, we should facilitate serialization to recommended, stable file formats and deserialize to Python objects that:

  1. are most useful to the users of the framework (in this case Tensorflow)
  2. conform to practical usage patterns.

Since tf.train.Example is a protobuf message that can't actually be used for model training and needs to be converted into a TFRecord (which is subsequently loaded into a tf.data.Dataset by the user), supporting tf.train.Example as a type transformer may lead to confusion, whereas a TFRecordFile that automatically handles tf.train.Example outputs (and of course can handle filepaths like regular FlyteFile types) is clearer in intent:

@task
def produce_record(...) -> TFRecordFile:
    return tf.train.Example(...)

Furthermore, the key assumption in this proposal is that not many people actually output a single tf.train.Example in a task, but rather a collection of Examples.

@task
def produce_records(...) -> TFRecordsDirectory:
    return [tf.train.Example(...) for _ in range(100)]

Here, TFRecordsDirectory would automatically serialize the list of Examples into a FlyteDirectory of TFRecords, which can then be passed to a downstream task:

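@task
def consume_records(tf_records: TFRecordsDirectory):
    # os.listdir returns bare file names, so join each one with the directory path
    filenames = [os.path.join(tf_records, f) for f in os.listdir(tf_records)]
    return tf.data.TFRecordDataset(filenames, ...)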
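The directory round trip in this proposal (a list of records written out as a multi-part directory, then listed and read back downstream) can be sketched with the standard library alone. This is a hypothetical illustration, not the TFRecordsDirectory implementation: the part-file naming, the round-robin sharding, and the simplified framing (a bare 8-byte length prefix, without the CRC32C checksums real TFRecord files carry) are all assumptions.

```python
import os
import struct
from typing import Iterator, List

# Hypothetical naming scheme for the parts of a multi-part directory.
PART_NAME = "part-{:05d}.rec"


def write_sharded(records: List[bytes], directory: str, num_shards: int) -> List[str]:
    """Distribute serialized records round-robin across num_shards part files."""
    os.makedirs(directory, exist_ok=True)
    paths = [os.path.join(directory, PART_NAME.format(i)) for i in range(num_shards)]
    handles = [open(p, "wb") for p in paths]
    try:
        for i, payload in enumerate(records):
            # Simplified framing: 8-byte little-endian length, then the payload.
            handles[i % num_shards].write(struct.pack("<Q", len(payload)) + payload)
    finally:
        for handle in handles:
            handle.close()
    return paths


def read_sharded(directory: str) -> Iterator[bytes]:
    """Yield every record from every part file, one file after another."""
    # os.listdir returns bare names, so join each one with the directory.
    for name in sorted(n for n in os.listdir(directory) if n.endswith(".rec")):
        with open(os.path.join(directory, name), "rb") as f:
            while header := f.read(8):
                (length,) = struct.unpack("<Q", header)
                yield f.read(length)
```

Because parts are read in sorted order, records come back grouped by shard rather than in their original order; a single part file keeps the order if that matters.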

Questions

  • Do we need a type to handle a single tf.train.Example? I'd say no 🙃 but happy to discuss more
  • Do we actually need TFRecordFile to serialize single records as outputs to tasks?
  • Do we need a type transformer for tf.data.Dataset? How much value would something like this provide?
@task
def produce_records(...) -> TFRecordsDirectory:
    return [tf.train.Example(...) for _ in range(100)]

@task
def consume_records(
    dataset: Annotated[
        tf.data.TFRecordDataset,
        # configure kwargs to the constructor
        # https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset
        TFRecordDatasetConfig(...)
    ]
):
    ... # use the dataset directly
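One way a transformer could turn the Annotated[...] metadata in this proposal into constructor kwargs is sketched below using only the standard library. TFRecordDatasetConfig here is hypothetical: its fields mirror some optional keyword arguments of tf.data.TFRecordDataset (compression_type, buffer_size, num_parallel_reads), and the returned dict would be passed along as tf.data.TFRecordDataset(filenames, **kwargs). Since Annotated accepts any type, the sketch runs without TensorFlow.

```python
from dataclasses import dataclass, fields
from typing import Annotated, Any, Dict, Optional, get_args, get_origin


@dataclass(frozen=True)
class TFRecordDatasetConfig:
    """Hypothetical config; fields mirror optional tf.data.TFRecordDataset kwargs."""

    compression_type: Optional[str] = None  # e.g. "GZIP" or "ZLIB"
    buffer_size: Optional[int] = None
    num_parallel_reads: Optional[int] = None


def dataset_kwargs(python_type: Any) -> Dict[str, Any]:
    """Return constructor kwargs from Annotated[..., TFRecordDatasetConfig], else {}."""
    if get_origin(python_type) is not Annotated:
        return {}
    # get_args(Annotated[T, m1, m2]) == (T, m1, m2); skip T itself.
    for meta in get_args(python_type)[1:]:
        if isinstance(meta, TFRecordDatasetConfig):
            # Forward only values the user set, leaving TensorFlow's defaults alone.
            return {
                f.name: getattr(meta, f.name)
                for f in fields(meta)
                if getattr(meta, f.name) is not None
            }
    return {}
```

Returning {} when no config is attached means an unannotated tf.data.TFRecordDataset parameter would simply get TensorFlow's defaults.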

@samhita-alla
Contributor

@cosmicBboy, thanks for writing this up! I like the idea behind TFRecordFile and TFRecordsDirectory. The directory format might be more useful, but I think we also need to support storing a single tf.train.Example or tf.data.Dataset.

Concerning your questions:

  • I agree; we don't need a tf.train.Example TypeTransformer
  • I think so
  • Am I right in assuming that dataset here corresponds to TFRecordFile or TFRecordsDirectory? If so, besides kwargs, we might also need to let users call methods, e.g., see how get_dataset fetches the data from a TFRecordDataset. But I don't think it's possible to streamline this into a type; so a better alternative will be to enable users to provide kwargs and let them apply additional methods or parsers if needed within a task, and I think this could facilitate extraction of the data from a TFRecordDataset to some extent.

As for the code structure, will this go into flytekit/extras directory?

@cosmicBboy
Contributor

> But I don't think it's possible to streamline this into a type; so a better alternative will be to enable users to provide kwargs and let them apply additional methods or parsers if needed within a task

Right, I'm thinking that for the tf.data.TFRecordDataset annotated type, we'd just handle initializing the object with tf.data.TFRecordDataset(filenames, **kwargs) and then pass it into the task; the user is responsible for any other transformations in the function body:

@task
def consume_records(
    dataset: Annotated[
        tf.data.TFRecordDataset,
        # configure kwargs to the constructor
        # https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset
        TFRecordDatasetConfig(...)
    ]
):
    dataset = (
        dataset
        .map(parse_tfrecord_fn, num_parallel_calls=AUTOTUNE)
        .map(prepare_sample, num_parallel_calls=AUTOTUNE)
        .shuffle(batch_size * 10)
        .batch(batch_size)
        .prefetch(AUTOTUNE)
    )

What do you think? If this looks good, I can update the proposal.

> As for the code structure, will this go into flytekit/extras directory?

Yep! As long as we follow the same conventions as the PyTorch extra, I think we should make this part of the main flytekit API.

@samhita-alla
Contributor

@cosmicBboy looks good to me! @ryankarlos please read through the comments.

@ryankarlos
Contributor Author

Test failures on CI are unrelated to the tests in this PR.

Screenshot 2022-11-29 at 01 47 04

@samhita-alla
Contributor

Can you import Annotated from typing_extensions? That should fix the failures.
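typing.Annotated was only added in Python 3.9, so importing it from typing fails on older interpreters, while typing_extensions backports it. The PR simply imports it from typing_extensions; a version-gated import is another common pattern, sketched here with a hypothetical scale function:

```python
import sys

# typing.Annotated exists only on Python 3.9+; typing_extensions backports it.
if sys.version_info >= (3, 9):
    from typing import Annotated
else:
    from typing_extensions import Annotated


def scale(x: Annotated[int, "units=ms"]) -> int:
    """Annotated metadata is ignored at runtime; x is just an int."""
    return x * 2


# The metadata is kept on the annotation object and can be read back.
METADATA = scale.__annotations__["x"].__metadata__
```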

from flytekit.types.directory import TFRecordsDirectory
from flytekit.types.file import TFRecordFile

T = TypeVar("T")
Contributor

Are we using this anywhere?

return uri, metadata


def to_tf_record_dataset_from_dir(
Contributor

Can we merge this and the file functions into the to_python_value methods? These functions don't seem to be reused anywhere else, right? So I think it's okay to keep the code within the transformers.

@ryankarlos ryankarlos requested review from samhita-alla and removed request for pingsutw December 1, 2022 15:14
Comment on lines 175 to 176
files = os.scandir(uri)
filenames = [os.path.join(local_dir, f.name) for f in files]
Contributor

I think we need to get the file names from the local directory, not the remote path. In the local case it works, but when run on a Flyte backend, uri will be a remote URI.
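A minimal sketch of the fix the reviewer describes (which later landed as the "get filenames from local dir instead of remote" commit), assuming local_dir already holds the downloaded copy of the directory; how it gets downloaded is not shown, and the helper name local_record_paths is hypothetical:

```python
import os
from typing import List


def local_record_paths(local_dir: str) -> List[str]:
    """List record files in the downloaded local directory (never the remote uri)."""
    # Scan and join against the same directory so names and paths always agree;
    # sorting gives a deterministic read order.
    with os.scandir(local_dir) as entries:
        return sorted(os.path.join(local_dir, e.name) for e in entries if e.is_file())
```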

@samhita-alla
Contributor

Amazing work, @ryankarlos! A few more comments. Sorry about incrementally reviewing the PR. :/

@ryankarlos
Contributor Author

> Amazing work, @ryankarlos! A few more comments. Sorry about incrementally reviewing the PR. :/

Thank you! No, that's fine; you've spotted a lot of my errors, which is good!

@pingsutw pingsutw merged commit 467a137 into flyteorg:master Dec 6, 2022
eapolinario pushed a commit that referenced this pull request Feb 22, 2023
…1240)

* first commit

Signed-off-by: Ryan Nazareth <[email protected]>

* add tensorflow example tf record transformer

Signed-off-by: Ryan Nazareth <[email protected]>

* refactor

Signed-off-by: Ryan Nazareth <[email protected]>

* correct tfexample description

Signed-off-by: Ryan Nazareth <[email protected]>

* fix test_native.py

Signed-off-by: Ryan Nazareth <[email protected]>

* add tensorflow docs and reqs

Signed-off-by: Ryan Nazareth <[email protected]>

* add tensorflow docs and reqs1

Signed-off-by: Ryan Nazareth <[email protected]>

* tensorflow import in init

Signed-off-by: Ryan Nazareth <[email protected]>

* fix failing tests

Signed-off-by: Ryan Nazareth <[email protected]>

* add tensorflow pinned version to reqs

Signed-off-by: Ryan Nazareth <[email protected]>

* pin grpcio-status to remove protobuf error

Signed-off-by: Ryan Nazareth <[email protected]>

* add suggested changes

Signed-off-by: Ryan Nazareth <[email protected]>

* redesign transformer

Signed-off-by: Ryan Nazareth <[email protected]>

* remove old script

Signed-off-by: Ryan Nazareth <[email protected]>

* fix type reference for TFREcordDataset

Signed-off-by: Ryan Nazareth <[email protected]>

* refactor

Signed-off-by: Ryan Nazareth <[email protected]>

* refactor

Signed-off-by: Ryan Nazareth <[email protected]>

* spacing and uppercase

Signed-off-by: Ryan Nazareth <[email protected]>

* redesign with tfdir and tfrecordfile subclass

Signed-off-by: Ryan Nazareth <[email protected]>

* fix conflicts and typos

Signed-off-by: Ryan Nazareth <[email protected]>

* address majority of comments

Signed-off-by: Ryan Nazareth <[email protected]>

* refactor

Signed-off-by: Ryan Nazareth <[email protected]>

* fix test with flytefile and metadata annotated

Signed-off-by: Ryan Nazareth <[email protected]>

* fix check for example records in directory

Signed-off-by: Ryan Nazareth <[email protected]>

* refactor and correct typing

Signed-off-by: Ryan Nazareth <[email protected]>

* lint

Signed-off-by: Ryan Nazareth <[email protected]>

* import annotated from typing_extensions

Signed-off-by: Ryan Nazareth <[email protected]>

* tweak to tests to test case when Config not passed in as type

Signed-off-by: Ryan Nazareth <[email protected]>

* add suggested changes

Signed-off-by: Ryan Nazareth <[email protected]>

* add task for tfrecord dir with no config in test

Signed-off-by: Ryan Nazareth <[email protected]>

* get filenames from local dir instead of remote

Signed-off-by: Ryan Nazareth <[email protected]>

Signed-off-by: Ryan Nazareth <[email protected]>
eapolinario added a commit that referenced this pull request Feb 23, 2023
* Force flyteidl==1.2.9

Signed-off-by: Eduardo Apolinario <[email protected]>

* Sanitize query template input in sqlite task (#1359)

Signed-off-by: Eduardo Apolinario <[email protected]>

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* TypeTransformer for reading and writing from TensorFlowRecord format (#1240)

* update ray plugin dependency (#1361)

Signed-off-by: Kevin Su <[email protected]>

Signed-off-by: Kevin Su <[email protected]>

* Set default format of structured dataset to empty (#1159)

* Set default format of structured dataset to empty

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* last error (#1364)

Signed-off-by: Yee Hing Tong <[email protected]>

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>

* Adds CLI reference for pyflyte (#1362)

* Adds pyflyte CLI reference guide

Signed-off-by: Samhita Alla <[email protected]>

* bump python version

Signed-off-by: Samhita Alla <[email protected]>

* bump python version

Signed-off-by: Samhita Alla <[email protected]>

* resolve docs error

Signed-off-by: Samhita Alla <[email protected]>

* set nested to none

Signed-off-by: Samhita Alla <[email protected]>

* remove flyteidl version constraint

Signed-off-by: Samhita Alla <[email protected]>

* update requirements

Signed-off-by: Samhita Alla <[email protected]>

Signed-off-by: Samhita Alla <[email protected]>

* Signaling (#1133)

Signed-off-by: Yee Hing Tong <[email protected]>

* Adding created and updated at to ExecutionClosure model (#1371)

Signed-off-by: Yee Hing Tong <[email protected]>

* Add Databricks config to Spark Job (#1358)

Signed-off-by: Kevin Su <[email protected]>

* Add overwrite_cache option the to calls of remote and local executions (#1375)

Signed-off-by: H. Furkan Vural <[email protected]>

Implemented cache overwrite feature is added on flytekit as well for the completeness. In order to support the cache eviction RFC, an overwrite parameter was added, indicating the data store should replace an existing artifact instead of creating a new one on local calls.

* Remove project/domain from being overridden with execution values in serialized context (#1378)

Signed-off-by: Yee Hing Tong <[email protected]>

* Use TaskSpec instead of TaskTemplate for fetch_task and avoid network when loading module (#1348)

Signed-off-by: Ketan Umare <[email protected]>

* Register Databricks config (#1379)

* Register databricks plugin

Signed-off-by: Kevin Su <[email protected]>

* Update databricks plugin

Signed-off-by: Kevin Su <[email protected]>

* register databricks

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>

* PodSpec should not require primary_container name (#1380)

For Pod tasks, if the primary_container_name is not specified, it should default.

Signed-off-by: Ketan Umare <[email protected]>

* fix(pyflyte): change -d to -D for --destination-dir as -d is already for --domain (#1381)

Co-authored-by: Eduardo Apolinario <[email protected]>

* Handle Optional[FlyteFile] in Dataclass type transformer (#1393)

* Add support for Optional to dataclass transformer

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add one more test

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add one more test

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix serialization of optional flyte types

Signed-off-by: Eduardo Apolinario <[email protected]>

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* add FastSerializationSettings to docs (#1386)

Signed-off-by: Niels Bantilan <[email protected]>

Signed-off-by: Niels Bantilan <[email protected]>
Co-authored-by: Kevin Su <[email protected]>

* Added more pod tests and an example pod task (#1382)

* Added more pod tests and an example pod task

Signed-off-by: Ketan Umare <[email protected]>

* fixing test and name

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

Signed-off-by: Ketan Umare <[email protected]>

* Convert default dict to json string in pyflyte run (#1399)

Signed-off-by: Kevin Su <[email protected]>

Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* docs: update register help, non-fast version is supported (#1402)

Signed-off-by: Patrick Brogan <[email protected]>

* Update log level for structured dataset (#1394)

Signed-off-by: Kevin Su <[email protected]>

* Add Niels to code owners (#1404)

Signed-off-by: Kevin Su <[email protected]>

* Signal use (#1398)

Signed-off-by: Yee Hing Tong <[email protected]>

* User Documentation Proposal (#1200)

Signed-off-by: Kevin Su <[email protected]>

* Add support MLFlow plugin (#1274)

* MLFlow plugin in progress

Signed-off-by: Ketan Umare <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* update test

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* update readme

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* dwip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* change experiment name

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* Add mlflow to index.rst

Signed-off-by: Kevin Su <[email protected]>

* use experiment name that user provided

Signed-off-by: Kevin Su <[email protected]>

* update doc-requirements.txt

Signed-off-by: Kevin Su <[email protected]>

* Add backend plugin deployment

Signed-off-by: Kevin Su <[email protected]>

* generate doc for method

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* update docstring

Signed-off-by: Niels Bantilan <[email protected]>

* update docstring

Signed-off-by: Niels Bantilan <[email protected]>

* Update tracking.py

Signed-off-by: Niels Bantilan <[email protected]>

Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Niels Bantilan <[email protected]>

* fix remote API reference (#1405)

Signed-off-by: Niels Bantilan <[email protected]>

Signed-off-by: Niels Bantilan <[email protected]>

* Read structured dataset from a folder  (#1406)

* Read polars dataframe in a folder

Signed-off-by: Kevin Su <[email protected]>

* Read polars dataframe in a folder

Signed-off-by: Kevin Su <[email protected]>

* Load huggingface and spark plugin implicitly

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* remove _pyspark alias

Signed-off-by: Yee Hing Tong <[email protected]>

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>

* Update default config to work out-of-the-box with flytectl demo (#1384)

Signed-off-by: Niels Bantilan <[email protected]>

* Add dask plugin #patch (#1366)

* Add dummy task type to test backend plugin

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Add docs page

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Add dask models

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Add function to convert resources

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Add tests to `dask` task

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Remove namespace

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update setup.py

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Add dask to `plugin/README.md`

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Add README.md for `dask`

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Top level export of `JopPodSpec` and `DaskCluster`

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update docs for images

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update README.md

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update models after `flyteidl` change

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update task after `flyteidl` change

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Raise error when less than 1 worker

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update flyteidl to >= 1.3.2

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update doc requirements

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update doc-requirements.txt

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Re-lock dependencies on linux

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Update dask API docs

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Fix documentation links

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Default optional model constructor arguments to `None`

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Refactor `convert_resources_to_resource_model` to `core.resources`

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Use `convert_resources_to_resource_model` in `core.node`

Signed-off-by: Bernhard Stadlbauer <[email protected]>

* Incorporate review feedback

Signed-off-by: Eduardo Apolinario <[email protected]>

* Lint

Signed-off-by: Eduardo Apolinario <[email protected]>

Signed-off-by: Bernhard Stadlbauer <[email protected]>
Signed-off-by: Bernhard Stadlbauer <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Add support for overriding task configurations (#1410)

Signed-off-by: Kevin Su <[email protected]>

* Warning if git is not installed (#1414)

* warning if git is not installed

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

Signed-off-by: Kevin Su <[email protected]>

* Flip the settings for channel and logger (#1415)

Signed-off-by: Yee Hing Tong <[email protected]>

* Preserving Exception in the LazyEntity fetch (#1412)

* Preserving Exception in the LazyEntity fetch

Signed-off-by: Ketan Umare <[email protected]>

* updated lint error

Signed-off-by: Ketan Umare <[email protected]>

* more tests

Signed-off-by: Ketan Umare <[email protected]>

Signed-off-by: Ketan Umare <[email protected]>

* [Docs] SynchronousFlyteClient API reference #3095 (#1416)

Signed-off-by: Peeter Piegaze <[email protected]>

Signed-off-by: Peeter Piegaze <[email protected]>
Co-authored-by: Peeter Piegaze <[email protected]>
Co-authored-by: Haytham Abuelfutuh <[email protected]>

* Return error code on fail (#1408)

* AWS batch return error code once it fails

Signed-off-by: Kevin Su <[email protected]>

* AWS batch return error code once it fails

Signed-off-by: Kevin Su <[email protected]>

* update tests

Signed-off-by: Kevin Su <[email protected]>

* Update tests

Signed-off-by: Kevin Su <[email protected]>

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>

* wrapping flyte entity in a task node in call to flyte node constructor, not sure if integration tests are actually running (#1422)

Signed-off-by: Yee Hing Tong <[email protected]>

Signed-off-by: Yee Hing Tong <[email protected]>

* Sqlalchemy multiline query (#1421)

* SQLAlchemyTask should handle multiline strings for query template

Signed-off-by: Niels Bantilan <[email protected]>

* sqlalchemy supports multi-line query

Signed-off-by: Niels Bantilan <[email protected]>

* update base sql task

Signed-off-by: Niels Bantilan <[email protected]>

* remove space

Signed-off-by: Niels Bantilan <[email protected]>

* fix snowflake tests

Signed-off-by: Niels Bantilan <[email protected]>

* fix lint

Signed-off-by: Niels Bantilan <[email protected]>

* fix test

Signed-off-by: Niels Bantilan <[email protected]>

Signed-off-by: Niels Bantilan <[email protected]>

* Sklearn type transformer should be automatically loaded with import flytekit (#1423)

* add flytekit.extras.sklearn to main __init__ import

Signed-off-by: Niels Bantilan <[email protected]>

* fix docs

Signed-off-by: Niels Bantilan <[email protected]>

* add temporary docs/requirements.txt for onnx plugins

Signed-off-by: Niels Bantilan <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>

* Bump isort to 5.12.0 (#1427)

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Fixes guess type bug in UnionTransformer (#1426)

Signed-off-by: Ketan Umare <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Add `pod_template` and `pod_template_name` arguments for `PythonAutoContainerTask`, its downstream tasks, and `@task`. (#1425)

* Add `pod_template` and `pod_template_name` arguments for `PythonAutoContainerTask`, its downstream tasks, and `@task`

Signed-off-by: byhsu <[email protected]>

* clean

Signed-off-by: byhsu <[email protected]>

* fix test

Signed-off-by: byhsu <[email protected]>

* Fix taskmetadata

Signed-off-by: byhsu <[email protected]>

* add kubernetes in setup.py

Signed-off-by: byhsu <[email protected]>

* address comments

Signed-off-by: byhsu <[email protected]>

* Regenerate requirements using python 3.7

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: byhsu <[email protected]>

* keep container validation

Signed-off-by: byhsu <[email protected]>

* bump idl version

Signed-off-by: byhsu <[email protected]>

* Regenerate requirements using python 3.7

Signed-off-by: Eduardo Apolinario <[email protected]>

* Regenerate doc-requirements.txt

Signed-off-by: Eduardo Apolinario <[email protected]>

* fix

Signed-off-by: byhsu <[email protected]>

---------

Signed-off-by: byhsu <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: byhsu <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Auto Backfill workflow (#1420)

* Fix primitive decoder when evaluating Promise (#1432)

Signed-off-by: Samhita Alla <[email protected]>

* set maximum python version to 3.10 (#1433)

* set maximum python version to 3.10

Signed-off-by: Niels Bantilan <[email protected]>

* remove unneeded python version check

Signed-off-by: Niels Bantilan <[email protected]>

* fix lint

Signed-off-by: Niels Bantilan <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>

* Revert "Remove project/domain from being overridden with execution values in serialized context (#1378)" (#1460)

* Revert "Remove project/domain from being overridden with execution values in serialized context (#1378)"

This reverts commit b3bfef5.

* Import os

Signed-off-by: Eduardo Apolinario <[email protected]>

* Lint

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Support checkpointing in local mode from cached tasks (#1457)

* support checkpointing in local mode from cached tasks

* clear cache before tests

---------

Co-authored-by: Stef Nelson-Lindall <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Deprecate FlyteSchema (#1418)

* Deprecate FlyteSchema

Signed-off-by: Kevin Su <[email protected]>

* Remove version

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Use scarf images (#1434)

* Use scarf images

Signed-off-by: Eduardo Apolinario <[email protected]>

* Use scarf names in tests.

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* add undocumented objects/functions to flytekit api ref (#1502)

* add reference_launch_plan to flytekit api ref

Signed-off-by: Niels Bantilan <[email protected]>

* import in init, add docstrings

Signed-off-by: Niels Bantilan <[email protected]>

* add more to references

Signed-off-by: Niels Bantilan <[email protected]>

* fix lint

Signed-off-by: Niels Bantilan <[email protected]>

* update

Signed-off-by: Niels Bantilan <[email protected]>

* fix up docstrings

Signed-off-by: Niels Bantilan <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>

* Use non-root user in default flytekit image (#1417)

Signed-off-by: Kevin Su <[email protected]>

* Fix PyTorch transformer (#1510)

Signed-off-by: Samhita Alla <[email protected]>

* Fix mypy errors (#1313)

* wip

Signed-off-by: Kevin Su <[email protected]>

* Fix mypy errors

Signed-off-by: Kevin Su <[email protected]>

* Fix mypy errors

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* fix tests

Signed-off-by: Kevin Su <[email protected]>

* fix tests

Signed-off-by: Kevin Su <[email protected]>

* fix test

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* Update type

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* update dev-requirements.txt

Signed-off-by: Kevin Su <[email protected]>

* Address comment

Signed-off-by: Kevin Su <[email protected]>

* upgrade torch

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>

* Compile the workflow only at compile time (#1311)

* wip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* wip

Signed-off-by: Kevin Su <[email protected]>

* add tests

Signed-off-by: Kevin Su <[email protected]>

* add tests

Signed-off-by: Kevin Su <[email protected]>

* support dynamic task

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* lazy compile

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* add tests

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* update test

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>

* Get the origin type when serializing dataclass (#1508)

* Get the origin type when serializing dataclass

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* update test

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Niels Bantilan <[email protected]>

* Fix bad merge

Signed-off-by: Eduardo Apolinario <[email protected]>

* Delay initialization of SynchronousFlyteClient in FlyteRemote (#1514)

* Delay initialization of SynchronousFlyteClient in FlyteRemote

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix spark plugin flyteremote test.

Signed-off-by: Eduardo Apolinario <[email protected]>

* Lint

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>

* Set flytekit and flyteidl bounds in plugins tests

Signed-off-by: Eduardo Apolinario <[email protected]>

* Revert "Fix mypy errors (#1313)"

This reverts commit 3798450.

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix requirements in dask and ray plugins

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix papermill tests requirements

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix doc-requirements

Signed-off-by: Eduardo Apolinario <[email protected]>

* dask plugin requirements

Signed-off-by: Eduardo Apolinario <[email protected]>

* Revert "Add dask plugin #patch (#1366)"

This reverts commit 41a9c7a.

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Ryan Nazareth <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Patrick Brogan <[email protected]>
Signed-off-by: Bernhard Stadlbauer <[email protected]>
Signed-off-by: Bernhard Stadlbauer <[email protected]>
Signed-off-by: Peeter Piegaze <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: byhsu <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Ryan Nazareth <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: Yee Hing Tong <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
Co-authored-by: H. Furkan Vural <[email protected]>
Co-authored-by: Ketan Umare <[email protected]>
Co-authored-by: mcloney-ddm <[email protected]>
Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: pbrogan12 <[email protected]>
Co-authored-by: bstadlbauer <[email protected]>
Co-authored-by: Peeter Piegaze <[email protected]>
Co-authored-by: Peeter Piegaze <[email protected]>
Co-authored-by: Haytham Abuelfutuh <[email protected]>
Co-authored-by: ByronHsu <[email protected]>
Co-authored-by: byhsu <[email protected]>
Co-authored-by: Stef Lindall <[email protected]>
Co-authored-by: Stef Nelson-Lindall <[email protected]>