[Core feature] Convert List[Any] to a single pickle file #3207
Comments
Does this scale well? What if the list contains millions of items, or each list item is large? Would it not be better to take batches of list items and upload each batch as a separate pickle file? There could be a setting for the desired upper file size of each pickle file.
Yup, good idea. This is one of the options. We can parse the Annotated metadata to get the number of items saved in each pickle file.
@task
def t1() -> Annotated[List[Any], 100]:
    ...
The default behavior could write all the data to one pickle file.
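To make the batching idea above concrete, here is a minimal sketch in plain Python. It is not Flyte's implementation: `batch_size_from_hint` and `pickle_in_batches` are hypothetical helper names, and the sketch assumes the first integer in the `Annotated` metadata is the batch size, with no metadata meaning "one pickle for the whole list".

```python
import pickle
from typing import Annotated, Any, List, Optional, get_args, get_origin


def batch_size_from_hint(hint: Any) -> Optional[int]:
    """Return the first int in the Annotated metadata, or None if there is none.

    Hypothetical helper: the convention that an int means "items per pickle"
    is an assumption taken from the comment above, not an existing Flyte rule.
    """
    if get_origin(hint) is Annotated:
        # get_args returns (wrapped_type, *metadata); skip the wrapped type.
        for meta in get_args(hint)[1:]:
            if isinstance(meta, int) and not isinstance(meta, bool):
                return meta
    return None


def pickle_in_batches(items: List[Any], batch_size: Optional[int]) -> List[bytes]:
    """Serialize items into one blob per batch (a single blob if batch_size is None)."""
    if batch_size is None:
        return [pickle.dumps(items)]
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        pickle.dumps(items[start:start + batch_size])
        for start in range(0, len(items), batch_size)
    ]


# 250 items with a batch size of 100 produce three blobs (100, 100, 50 items).
blobs = pickle_in_batches(list(range(250)), batch_size_from_hint(Annotated[List[Any], 100]))
```

Each blob would then be uploaded as its own file. A size-based limit, as suggested earlier in the thread, would need to measure the serialized bytes rather than count items.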
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏
Motivation: Why do you think this is important?
Currently, Flyte creates N pickle files (where N is the size of the list) if the output type is List[Any]. This slows down serialization: it takes more than 15 minutes to upload the pickles to S3 when the list has 1000 items. People don't care about how we serialize List[Any]. We can just convert the entire list into a single pickle file, which reduces the time required for serialization.
Goal: What should the final outcome look like, ideally?
It will make serialization faster.
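The difference between the two strategies can be sketched with local files standing in for S3 objects. This is an illustration under stated assumptions, not Flyte's code: `write_per_item`, `write_single`, and `read_single` are hypothetical names, and a real upload would pay a network round trip per file, which is where the per-item approach loses time.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, List


def write_per_item(items: List[Any], out_dir: Path) -> List[Path]:
    """Per-item strategy: one pickle file for every list element."""
    paths = []
    for index, item in enumerate(items):
        path = out_dir / f"item_{index}.pkl"
        path.write_bytes(pickle.dumps(item))
        paths.append(path)
    return paths


def write_single(items: List[Any], out_dir: Path) -> Path:
    """Proposed strategy: the whole list goes into a single pickle file."""
    path = out_dir / "items.pkl"
    path.write_bytes(pickle.dumps(items))
    return path


def read_single(path: Path) -> List[Any]:
    """Load the list back from the single pickle file."""
    return pickle.loads(path.read_bytes())


items = [{"id": i} for i in range(1000)]
with tempfile.TemporaryDirectory() as tmp:
    per_item_dir = Path(tmp) / "per_item"
    single_dir = Path(tmp) / "single"
    per_item_dir.mkdir()
    single_dir.mkdir()

    per_item_count = len(write_per_item(items, per_item_dir))  # one file per element
    single_path = write_single(items, single_dir)              # one file in total
    round_trip = read_single(single_path)
```

The trade-off is that a single file must be downloaded and deserialized in full even when only one element is needed.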
Describe alternatives you've considered
Propose: Link/Inline OR Additional context
Slack Thread