Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC into RFC #5653

Merged
merged 648 commits into from
Aug 13, 2024
Merged

RFC into RFC #5653

merged 648 commits into from
Aug 13, 2024

Conversation

katrogan
Copy link
Contributor

What changes were proposed in this pull request?

Updates draft RFC here: #5103

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Tom-Newton and others added 30 commits April 3, 2024 20:06
* Remove shard key in admin-launcher

Signed-off-by: Thomas Newton <[email protected]>

* Don't mutate existing state

Signed-off-by: Thomas Newton <[email protected]>

* Don't mutate state

Signed-off-by: Thomas Newton <[email protected]>

* Add a test

Signed-off-by: Thomas Newton <[email protected]>

---------

Signed-off-by: Thomas Newton <[email protected]>
…4986)

* Add tracking for active node and task execution counts in propeller

Signed-off-by: Shardool <[email protected]>

* Update unit tests for task and node execution counts

Signed-off-by: Shardool <[email protected]>

* Fix linter errors

Signed-off-by: Shardool <[email protected]>

* fix linter errors

Signed-off-by: Paul Dittamo <[email protected]>

---------

Signed-off-by: Shardool <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
Co-authored-by: Paul Dittamo <[email protected]>
…ors (#5161)

* include container statuses for all container exit errors

Signed-off-by: Paul Dittamo <[email protected]>

* add unit test

Signed-off-by: Paul Dittamo <[email protected]>

---------

Signed-off-by: Paul Dittamo <[email protected]>
* docs(flyte-core): add missing key `adminServer` in authentication guide

Signed-off-by: Julian Einhaus <[email protected]>

* docs(flyte-core): fix indentation for flyte-core with AzureAD authorization guide

Signed-off-by: Julian Einhaus <[email protected]>

---------

Signed-off-by: Julian Einhaus <[email protected]>
* update arraynode proto parallelism field to varint compatible int64

Signed-off-by: Paul Dittamo <[email protected]>

* have array nodes utilize workflow parallelism

Signed-off-by: Paul Dittamo <[email protected]>

* return if available parallelism is 0

Signed-off-by: Paul Dittamo <[email protected]>

* unit test

Signed-off-by: Paul Dittamo <[email protected]>

---------

Signed-off-by: Paul Dittamo <[email protected]>
* Don't use `defer` for streak length reporting

Signed-off-by: Thomas Newton <[email protected]>

* Make it work with defer

Signed-off-by: Thomas Newton <[email protected]>

* Fix lint

Signed-off-by: Thomas Newton <[email protected]>

---------

Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
* Testing agents in the development environment

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

* rename

Signed-off-by: Future-Outlier <[email protected]>

* blank

Signed-off-by: Future-Outlier <[email protected]>

* rerun build docs ci

Signed-off-by: Future-Outlier <[email protected]>

* update pingsu's advice

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Kevin Su <[email protected]>

* Update pingsu's advice

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: Kevin Su <[email protected]>

* deploying agents in the sandbox

Signed-off-by: Future-Outlier <[email protected]>

* rename

Signed-off-by: Future-Outlier <[email protected]>

* nit

Signed-off-by: Future-Outlier <[email protected]>

* Implementing Agent Metadata Service

Signed-off-by: Future-Outlier <[email protected]>

* reorganize and copyedit new content

Signed-off-by: nikki everett <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: nikki everett <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Co-authored-by: nikki everett <[email protected]>
Signed-off-by: Ketan Umare <[email protected]>
Co-authored-by: Ketan Umare <[email protected]>
Signed-off-by: Fabio M. Graetz, Ph.D. <[email protected]>
Signed-off-by: Fabio Grätz <[email protected]>
* add cache client read and write otel tracing

Signed-off-by: Paul Dittamo <[email protected]>

* lint

Signed-off-by: Paul Dittamo <[email protected]>

---------

Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: nikki everett <[email protected]>
* Add reference to prom operator install guide

Signed-off-by: davidmirror-ops <[email protected]>

* Adds info about the three base dashboards

Signed-off-by: davidmirror-ops <[email protected]>

* Adds instructions to enable SMs

Signed-off-by: davidmirror-ops <[email protected]>

* Incorporate reviews

Signed-off-by: davidmirror-ops <[email protected]>

* Minor fixes

Signed-off-by: davidmirror-ops <[email protected]>

* Improve format for steps

Signed-off-by: davidmirror-ops <[email protected]>

---------

Signed-off-by: davidmirror-ops <[email protected]>
* chore: remove obsolete flyte config files

* nit

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
* enable parallelism to be set to nil for array node

Signed-off-by: Paul Dittamo <[email protected]>

* unit test

Signed-off-by: Paul Dittamo <[email protected]>

---------

Signed-off-by: Paul Dittamo <[email protected]>
… template (#5215)

* update with link to dockerfile template

Signed-off-by: nikki everett <[email protected]>

* fix 404 error

Signed-off-by: nikki everett <[email protected]>

---------

Signed-off-by: nikki everett <[email protected]>
* copy changes over from flytesnacks#1553

Signed-off-by: nikki everett <[email protected]>

* fix formatting

Signed-off-by: nikki everett <[email protected]>

* fix 404 error

Signed-off-by: nikki everett <[email protected]>

---------

Signed-off-by: nikki everett <[email protected]>
Signed-off-by: Chi-Sheng Liu <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Co-authored-by: Kevin Su <[email protected]>
Sovietaced and others added 21 commits August 1, 2024 22:19
* Refactor panic handling to middleware

Signed-off-by: Jason Parraga <[email protected]>

* Remove registration of old panicCounter

Signed-off-by: Jason Parraga <[email protected]>

* Add test coverage

Signed-off-by: Jason Parraga <[email protected]>

---------

Signed-off-by: Jason Parraga <[email protected]>
* TEST build

Signed-off-by: Future-Outlier <[email protected]>

* remove emphasize-lines

Signed-off-by: Future-Outlier <[email protected]>

* test build

Signed-off-by: Future-Outlier <[email protected]>

* revert

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
* FlytePropeller Compiler Avoid Crash when Type not found

Signed-off-by: Future-Outlier <[email protected]>

* Update pingsu's error message advices

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw  <[email protected]>

* fix lint

Signed-off-by: Future-Outlier <[email protected]>

* Trigger CI

Signed-off-by: Future-Outlier <[email protected]>

* Trigger CI

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pingsutw <[email protected]>
* first version

Signed-off-by: Future-Outlier <[email protected]>

* update

Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: Future-Outlier <[email protected]>
* add send arg

Signed-off-by: Yee Hing Tong <[email protected]>

* Add acction to remove cache in gh runner

Signed-off-by: Eduardo Apolinario <[email protected]>

* Use correct checked out path

Signed-off-by: Eduardo Apolinario <[email protected]>

* Path in strings

Signed-off-by: Eduardo Apolinario <[email protected]>

* Checkout repo in root

Signed-off-by: Eduardo Apolinario <[email protected]>

* Use the correct path to new action

Signed-off-by: Eduardo Apolinario <[email protected]>

* Do not use gh var in path to clear-action-cache

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove wrong invocation of clear-action-cache

Signed-off-by: Eduardo Apolinario <[email protected]>

* GITHUB_WORKSPACE is implicit in the checkout action

Signed-off-by: Eduardo Apolinario <[email protected]>

* Refer to local `flyte` directory

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
* Make flyteidl releases go through a manual gh workflow

Signed-off-by: Eduardo Apolinario <[email protected]>

* Make flytectl releases go through a manual gh workflow

Signed-off-by: Eduardo Apolinario <[email protected]>

* Rewrite the documentation for `version` and clarify wording in RELEASE.md

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
* fix CHANGELOG-v0.2.0.md

Signed-off-by: Christina <[email protected]>

* fix CHANGELOG-v1.0.2-b1.md

Signed-off-by: Christina <[email protected]>

* fix CHANGELOG-v1.1.0.md

Signed-off-by: Christina <[email protected]>

* fix CHANGELOG-v1.3.0.md

Signed-off-by: Christina <[email protected]>

---------

Signed-off-by: Christina <[email protected]>
* Fetch all tags in flyteidl-release.yml

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix sed expression for npm job

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
* update

Signed-off-by: Desi Hsu <[email protected]>

* dco

Signed-off-by: Desi Hsu <[email protected]>

* dco

Signed-off-by: Desi Hsu <[email protected]>

* typo

Signed-off-by: Desi Hsu <[email protected]>

---------

Signed-off-by: Desi Hsu <[email protected]>
Signed-off-by: Katrina Rogan <[email protected]>
#5649)

* Don't error when attempting to trigger schedules for inactive projects

Signed-off-by: Katrina Rogan <[email protected]>

* regen

Signed-off-by: Katrina Rogan <[email protected]>

---------

Signed-off-by: Katrina Rogan <[email protected]>
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the updates!

// If this literal is offloaded, this field will contain metadata including the offload location.
string uri = 6;
// Includes information about the size of the literal.
uint64 size_bytes = 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is size important again? I mean there's other metadata as well (etag information). The assumption here is that size is super important so we want to be able to show that without making a head call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general consensus seemed to be this is useful, I guess it's nice for clients who want to decide whether to pull massive datasets?


When writing outputs in the [remote_file_output_writer](https://github.com/flyteorg/flyte/blob/2ca31119d6b9258661a71f38e450f93b6692402c/flyteplugins/go/tasks/pluginmachinery/ioutils/remote_file_output_writer.go#L56-L84) the source code should detect whether the literal size exceeds the configured minimum and
- if the task is using a newer SDK version that supports reading offloaded literals, offload the literal to the configured storage backend and update the literal with the offload URI and size.
- if the task is using an older SDK version that doesn't support offloaded literals, fail the task with an error message indicating that the task output is too large and the user should update their SDK version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a line to say downstream tasks also have to upgraded? that is, if you have a reference/remote task downstream that consumes the map task output, but it hasn't been updated, then it'll fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks


For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be done transparently to the user. How will propeller know to fail though if propeller hasn't been updated?
As a follow-up, we can also implement literal offloading in the SDK for conventional python tasks. Flytekit should also know how to offload the data. This should be done transparently to the user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we cover that here? or is this only for map tasks for now? for the general case, we were going to go with the solution of propeller setting an environment variable that turns on offloading on the flytekit side?

for erroring if things get too big, i don't know that there's a solution. We should just add a size limit asap in flytekit right @eapolinario? Some env var based setting with a 10MB default. If a literal is more than 10MBs then error. considering we don't know when we'll get to the general case, by the time we do, most users might've already upgraded.

Copy link
Contributor Author

@katrogan katrogan Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're leaning towards not tackling the implementation bits for this proposal but I think it's okay to cover future work here?

Updated to include the bits about failing fast here for too-large literals, thank you!

@katrogan
Copy link
Contributor Author

thanks for the review @wild-endeavor, updated - mind taking another look?

Copy link

codecov bot commented Aug 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 36.17%. Comparing base (b6bba77) to head (e89f434).

Files with missing lines Patch % Lines
datacatalog/cmd/entrypoints/serve.go 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (b6bba77) and HEAD (e89f434). Click for more details.

HEAD has 13 uploads less than BASE
Flag BASE (b6bba77) HEAD (e89f434)
6 0
unittests 7 0
Additional details and impacted files
@@                    Coverage Diff                     @@
##           rfc/offloaded-literal    #5653       +/-   ##
==========================================================
- Coverage                  58.99%   36.17%   -22.83%     
==========================================================
  Files                        645     1302      +657     
  Lines                      55670   109484    +53814     
==========================================================
+ Hits                       32844    39606     +6762     
- Misses                     20230    65740    +45510     
- Partials                    2596     4138     +1542     
Flag Coverage Δ
unittests ?
unittests-datacatalog 51.37% <0.00%> (?)
unittests-flyteadmin 55.33% <ø> (?)
unittests-flytecopilot 12.17% <ø> (?)
unittests-flytectl 62.28% <ø> (?)
unittests-flyteidl 7.08% <ø> (?)
unittests-flyteplugins 53.31% <ø> (?)
unittests-flytepropeller 41.74% <ø> (?)
unittests-flytestdlib 55.35% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@katrogan katrogan merged commit 59192b7 into rfc/offloaded-literal Aug 13, 2024
47 of 50 checks passed
@katrogan katrogan deleted the rfc/katrina-offloaded-literal branch August 13, 2024 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.