[Core feature] LiteralBlob and StructuredDataset metadata #5461

Open
2 tasks done
kumare3 opened this issue Jun 8, 2024 · 1 comment
Labels
backlogged (For internal use. Reserved for contributor team workflow.), enhancement (New feature or request)

Comments

@kumare3
Contributor

kumare3 commented Jun 8, 2024

Motivation: Why do you think this is important?

At runtime, Flyte knows various metadata about a file or the underlying dataset, such as its size.
It would be valuable to catalog this metadata and show it in the UI.
The metadata should be associated with the runtime objects and should be optional.

Goal: What should the final outcome look like, ideally?

Example:

  1. For FlyteFile (Blob.Single)
     • Size of the file
     • Zipped (boolean)
     • (we already store the format in the type)
  2. For FlyteDirectory (Blob.Multi)
     • Total size of the dataset
     • Zipped (boolean)
     • An index of all the files stored in the directory
     • (we already store the format in the type)
  3. For StructuredDataset
     • Number of partitions
     • Zipped
     • Format
     • An index of all the files / partitions
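Most of the metadata listed above can be computed from the local data around upload time. The following is a minimal sketch in Python, not flytekit code: `RuntimeBlobMetadata` and the helper functions are hypothetical names, "zipped" is approximated by checking for a ZIP archive or the gzip magic bytes, and a structured dataset is assumed to have one partition per file.

```python
import os
import zipfile
from dataclasses import dataclass
from typing import List, Optional

GZIP_MAGIC = b"\x1f\x8b"


@dataclass
class RuntimeBlobMetadata:
    """Hypothetical runtime metadata record; the extra fields are optional."""
    size_bytes: int
    zipped: bool
    index: Optional[List[str]] = None  # relative file paths; directories / datasets only
    num_partitions: Optional[int] = None  # structured datasets only


def _is_zipped(path: str) -> bool:
    # Heuristic: a ZIP archive, or a file starting with the gzip magic bytes.
    if zipfile.is_zipfile(path):
        return True
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def file_metadata(path: str) -> RuntimeBlobMetadata:
    """Blob.Single: size of the file and whether it is compressed."""
    return RuntimeBlobMetadata(size_bytes=os.path.getsize(path), zipped=_is_zipped(path))


def directory_metadata(path: str) -> RuntimeBlobMetadata:
    """Blob.Multi: total size plus an index of every file under `path`."""
    index: List[str] = []
    total = 0
    zipped_flags: List[bool] = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            index.append(os.path.relpath(full, path))
            total += os.path.getsize(full)
            zipped_flags.append(_is_zipped(full))
    # Assumption: a directory counts as zipped only if every file in it is compressed.
    zipped = bool(zipped_flags) and all(zipped_flags)
    return RuntimeBlobMetadata(size_bytes=total, zipped=zipped, index=sorted(index))


def structured_dataset_metadata(path: str) -> RuntimeBlobMetadata:
    """StructuredDataset: assumes one partition per file (e.g. part-00000.parquet)."""
    meta = directory_metadata(path)
    meta.num_partitions = len(meta.index or [])
    return meta
```

Keeping every field beyond size and zipped as `Optional` mirrors the requirement that this metadata stay optional.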

Note
We could also use this metadata to record lifecycle information, for example an expiration, if one is set.
This could help determine cache hits based on data TTLs. For example, each file, dataset, etc. could have a max-age set, making it valid until a specific date.
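One way the TTL idea could work, sketched in Python under the assumption of hypothetical `created_at` and `max_age` fields (nothing like this exists in flyteidl today): the data is valid until `created_at + max_age`, and a cached result is reusable only while every input it references is still valid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class LifecycleMetadata:
    """Hypothetical lifecycle fields attached to a file or dataset."""
    created_at: datetime
    max_age: Optional[timedelta] = None  # None means the data never expires

    @property
    def valid_until(self) -> Optional[datetime]:
        return None if self.max_age is None else self.created_at + self.max_age


def is_valid(meta: LifecycleMetadata, now: Optional[datetime] = None) -> bool:
    """True while the data has not passed its valid-until date."""
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = meta.valid_until
    return expiry is None or now < expiry


def cached_result_valid(inputs: Iterable[LifecycleMetadata], now: Optional[datetime] = None) -> bool:
    """A cached output is reusable only if none of the data it references has expired."""
    return all(is_valid(m, now) for m in inputs)
```

For example, data created on 2024-06-08 with a `max_age` of seven days is valid until 2024-06-15, so a cached result that reads it would be reused on 2024-06-10 but recomputed on 2024-06-16.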

Describe alternatives you've considered

Store it separately. However, every file would then become a directory, the format would be Python-only, and the UI would have to implement this again.

Propose: Link/Inline OR Additional context

Add Blob metadata to

and StructuredDataset metadata to

StructuredDatasetType structured_dataset_type = 1;

Also, should the FileAccessProvider interface ideally return this metadata after each upload?
https://github.com/flyteorg/flytekit/blob/070027d8781d9b9f830723a9a4cae5daa6359a33/flytekit/core/data_persistence.py#L311
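A sketch of the shape such an interface could take, in Python. `upload_with_metadata`, `UploadResult`, and the `upload` callable are hypothetical stand-ins, not the real FileAccessProvider signature; the point is only that metadata computed from the local file travels back to the caller together with the remote URI.

```python
import os
import zipfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadResult:
    """Hypothetical return value: where the data went plus what was learned about it."""
    uri: str
    size_bytes: int
    zipped: bool


def upload_with_metadata(local_path: str, remote_uri: str,
                         upload: Callable[[str, str], object]) -> UploadResult:
    """Collect metadata from the local file, upload it, and return both together.

    `upload(local_path, remote_uri)` is a placeholder for the real transfer call.
    """
    size = os.path.getsize(local_path)
    with open(local_path, "rb") as f:
        gzip_magic = f.read(2) == b"\x1f\x8b"
    # Heuristic "zipped" flag: gzip magic bytes or a ZIP archive.
    zipped = gzip_magic or zipfile.is_zipfile(local_path)
    upload(local_path, remote_uri)
    return UploadResult(uri=remote_uri, size_bytes=size, zipped=zipped)
```

Computing the metadata from the local copy avoids an extra round trip to object storage after the upload finishes.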

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
kumare3 added the enhancement (New feature or request) label, and added and then removed the untriaged (This issue has not yet been looked at by the Maintainers) label, Jun 8, 2024

runllm bot commented Jun 8, 2024

To address the issue of associating metadata with LiteralBlob and StructuredDataset at runtime, Flyte can leverage its existing metadata handling capabilities. According to the Flyte documentation, Flyte already manages metadata for task inputs and outputs, storing references to large objects in backend storage.

For FlyteFile and Blob.Single, metadata such as file size and whether the file is zipped can be added. Similarly, for Directory or Blob.Multi, metadata can include the total dataset size, zipped status, and an index of all files. For StructuredDataset, metadata can include the number of partitions, zipped status, format, and an index of all files/partitions.

Would you like more detail on how to implement this in Flyte?


This is an AI-generated response.


eapolinario added the backlogged (For internal use. Reserved for contributor team workflow.) label, Jun 13, 2024
Projects
None yet
Development

No branches or pull requests

2 participants