[Docs] [flytekit] pyflyte register - everything #2640

Closed
2 tasks done
wild-endeavor opened this issue Jun 29, 2022 · 2 comments
Labels
documentation Improvements or additions to documentation untriaged This issues has not yet been looked at by the Maintainers

Comments

@wild-endeavor
Contributor

wild-endeavor commented Jun 29, 2022

Description

pyflyte register has been merged, but there are no examples or documentation. We need to figure out how to fit it into the story and explain what it is, why it's different, and when to use it.

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@wild-endeavor wild-endeavor added documentation Improvements or additions to documentation untriaged This issues has not yet been looked at by the Maintainers labels Jun 29, 2022
@wild-endeavor
Contributor Author

pyflyte package and flytectl register together form the registration process. The former is responsible for parsing and compiling users' Python code into Flyte protobuf objects and the latter is responsible for shipping those objects over the network to the Flyte control plane. In the process of shipping them over, flytectl is also able to set certain attributes like the K8s service account or IAM role to use.
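As a rough sketch of the two-step flow described above, the snippet below assembles one command per step as an argument list. The package name, project, domain, version, and service account values are hypothetical placeholders, and the flags reflect my understanding of the two CLIs rather than anything stated in this thread, so check them against `pyflyte package --help` and `flytectl register files --help` before relying on them.

```python
import shlex
from typing import List, Optional


def package_command(pkg: str, image: str) -> List[str]:
    """Step 1: compile user code into serialized Flyte objects (no network call)."""
    return ["pyflyte", "--pkgs", pkg, "package", "--image", image]


def register_command(
    archive: str,
    project: str,
    domain: str,
    version: str,
    service_account: Optional[str] = None,
) -> List[str]:
    """Step 2: ship the packaged objects to the Flyte control plane."""
    cmd = [
        "flytectl", "register", "files",
        "--project", project,
        "--domain", domain,
        "--archive", archive,
        "--version", version,
    ]
    # Run-time attributes such as the K8s service account are set at this step.
    # The flag name is an assumption; verify it for your flytectl version.
    if service_account is not None:
        cmd += ["--k8sServiceAccount", service_account]
    return cmd


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    steps = [
        package_command("flyte.workflows", "ghcr.io/flyteorg/flytecookbook:core-latest"),
        register_command("flyte-package.tgz", "flytesnacks", "development", "v1", "demo"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Keeping the two steps separate is what lets someone else pick the run-time options later, at registration time.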

pyflyte run is a convenience command that combines these two steps, plus launching a given workflow, into one command. See xyz.

It has some limitations, however: it only operates on a single file, and all relevant Flyte entities must be defined in that file. This is limiting, but run was designed as a quick-and-dirty iteration tool, particularly useful when first getting started or when testing something small, not as a fully featured, production-scale mode of operation.

pyflyte register is meant to bridge the gap between the two modes. It offers the functionality of the package command with most of the convenience of pyflyte run. Use it like this:

pyflyte --config ~/.flyte/dev-uniondemo.yaml register \
    --image ghcr.io/flyteorg/flytecookbook:core-latest \
    --image trainer=ghcr.io/flyteorg/flytecookbook:core-latest \
    --image predictor=ghcr.io/flyteorg/flytecookbook:core-latest \
    --raw-data-prefix s3://development-service-flyte/reltsts \
    flyte_basics

Both run and register currently support fast registration only (non-fast registration is coming for register in the future).

Note that neither register nor run will work on Python namespace packages, since both tools traverse the filesystem to find the first folder that doesn't have an __init__.py file and interpret that folder as the root of the project. Both register and run use this root as the basis for naming the Flyte entities.
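To illustrate why namespace packages trip up this root detection, here is a minimal sketch of the traversal as described above. It is not the flytekit implementation; `find_project_root`, `module_name`, and the directory layouts are hypothetical names made up for this example.

```python
import tempfile
from pathlib import Path


def find_project_root(source_file: Path) -> Path:
    """Walk upward until reaching a folder that has no __init__.py."""
    current = source_file.resolve().parent
    while (current / "__init__.py").exists():
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return current


def module_name(source_file: Path) -> str:
    """Dotted name of the file relative to the detected project root."""
    root = find_project_root(source_file)
    relative = source_file.resolve().relative_to(root).with_suffix("")
    return ".".join(relative.parts)


def _touch(path: Path) -> Path:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return path


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        # Regular layout: pkg/ has an __init__.py, its parent does not.
        _touch(base / "regular" / "pkg" / "__init__.py")
        regular = _touch(base / "regular" / "pkg" / "wf.py")
        # Namespace layout: ns/ intentionally has no __init__.py.
        _touch(base / "namespace" / "ns" / "pkg" / "__init__.py")
        namespaced = _touch(base / "namespace" / "ns" / "pkg" / "wf.py")
        return {
            "regular": module_name(regular),
            "namespace": module_name(namespaced),
        }


if __name__ == "__main__":
    print(demo())
```

In the namespace layout the walk stops at ns/, so the computed name is pkg.wf even though the module would be imported as ns.pkg.wf. That mismatch is the failure mode the note describes.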

@wild-endeavor
Contributor Author

Answers to some more questions:

  • Limitations of pyflyte run? Is it just that it can only operate on single files?
    For the most part, yes. It's actually more restrictive than that, but I don't know how far into the weeds we want to get. If you run a script that has ten tasks, but the workflow you run only calls two of them, the other eight don't actually get registered right now. The other difference is that while we should be making a "non-fast" version of register, I think run will always be fast, meaning we'll always be zipping up the script and sending it to admin. (A GH issue for non-fast register is coming once I have time to write it.)

  • At a macroscopic level, is pyflyte register = pyflyte run minus “launching the execution”?
    Yes, that's one difference, but the main difference is that register is meant to work on an entire repo rather than on a single file, which makes it much closer to package.

  • Also, what’s the preferred flow now?
    I'm not sure how best to say this... but both. Users should use both run and register/package. run is useful when first starting out, and it remains useful when you're testing small bits... I personally use it all the time, though I might be testing small pieces of user code more often than others. If you just have something small you want to try out, and you're iterating pretty actively on it, run is more convenient.

  • Should we not promote the use of pyflyte package and flytectl register anymore?
    The more interesting difference is actually this one: the difference between register and package. register is basically package with smarter naming semantics, and it folds the network call into the same step. If you want to make a "portable" zip file of all your tasks and workflows and then give it to someone else to register, then package makes more sense. You can't use register if you don't yet know what run-time options you want (IAM role, service account, etc.). But if you're working in your own single environment, then register is perfectly fine to use.

  • Where can the detailed pyflyte register and pyflyte run documentation be found? Is it the API & CLI reference?
    Umm, no, you're writing it :) Let me know if the code comments are bad; I can help shore those up.
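To make the first answer above concrete (run only registers what the launched workflow actually calls), here is a toy model of that behavior. It is only an illustration with made-up names (`my_wf`, `task_1` ... `task_10`, `reachable`), not how flytekit tracks entities internally.

```python
from typing import Dict, List, Set


def reachable(entry: str, calls: Dict[str, List[str]]) -> Set[str]:
    """Collect the entry point plus everything it calls, directly or indirectly."""
    seen: Set[str] = set()
    stack = [entry]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(calls.get(name, []))
    return seen


# A script defining ten tasks, with a workflow that calls only two of them.
ALL_TASKS = [f"task_{i}" for i in range(1, 11)]
CALLS = {"my_wf": ["task_1", "task_2"]}

registered = reachable("my_wf", CALLS)
skipped = sorted(set(ALL_TASKS) - registered)

if __name__ == "__main__":
    print("registered:", sorted(registered))
    print("not registered:", skipped)
```

Under this model, the eight tasks the workflow never calls are left out, which matches the limitation described for run.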
