[Feature][Flytekit Schema type extension] Vaex Dataframe plugin #701

Closed
1 of 13 tasks
Tracked by #2917
kumare3 opened this issue Feb 8, 2021 · 1 comment · Fixed by flyteorg/flytekit#1230
Assignees
Labels
flytekit FlyteKit Python related issue good first issue Good for newcomers hacktoberfest plugins Plugins related labels (backend or frontend)

Comments

@kumare3
Contributor

kumare3 commented Feb 8, 2021

Motivation: Why do you think this is important?
Flytekit should support Vaex as a pandas alternative for FlyteSchema objects.
https://github.com/vaexio/vaex

Vaex performs well on a single machine, which is sufficient for most datasets. Spark and Dask add considerable complexity and are overkill for datasets of only a few gigabytes. Adding Vaex, with automatic serialization and deserialization between consecutive tasks using Arrow/HDF5, would enable interoperability between Pandas, Spark, and Vaex.

Goal: What should the final outcome look like, ideally?
Users should be able to retrieve Vaex DataFrames from a FlyteSchema:

import vaex
from flytekit.types.schema import FlyteSchema

def foo(f: FlyteSchema):
    df = f.open(type=vaex.DataFrame)
    ...

Vaex DataFrames should also be supported directly as a type:

def foo(f: vaex.DataFrame) -> vaex.DataFrame:
    pass

The plugin should mostly look like the default Pandas DataFrame Transformer and Reader that ship with Flytekit:
https://github.com/flyteorg/flytekit/blob/master/flytekit/types/schema/types_pandas.py#L88-L144

Alternatively, it could follow the Spark plugin's support for Spark DataFrames:
https://github.com/flyteorg/flytekit/blob/f0b0a7ed854950a3341df710d1f378ef3ed838ab/plugins/flytekit-spark/flytekitplugins/spark/schema.py#L13-L81

Describe alternatives you've considered
NA

Flyte component

  • Overall
  • Flyte Setup and Installation scripts
  • Flyte Documentation
  • Flyte communication (slack/email etc)
  • FlytePropeller
  • FlyteIDL (Flyte specification language)
  • Flytekit (Python SDK)
  • FlyteAdmin (Control Plane service)
  • FlytePlugins
  • DataCatalog
  • FlyteStdlib (common libraries)
  • FlyteConsole (UI)
  • Other

GitHub repo(s)
flytekit

@kumare3 kumare3 added enhancement New feature or request good first issue Good for newcomers untriaged This issues has not yet been looked at by the Maintainers labels Feb 8, 2021
@kumare3 kumare3 added plugins Plugins related labels (backend or frontend) flytekit FlyteKit Python related issue and removed enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Feb 9, 2021
@kumare3 kumare3 changed the title from [Feature][Flytekit Plugin] Vaex Dataframe plugin to [Feature][Flytekit Schema type extension] Vaex Dataframe plugin Feb 26, 2021
@yindia yindia self-assigned this Apr 9, 2021
@yindia yindia removed their assignment May 30, 2021
palchicz pushed a commit to palchicz/flyte that referenced this issue Dec 23, 2021
@ryankarlos

ryankarlos commented Oct 10, 2022

@samhita-alla I've added PR flyteorg/flytekit#1230 for this issue. Could this be assigned to me, please?
Also, could you please add the Hacktoberfest label to my PR as well? Thanks!

eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 20, 2022

4 participants