[BUG] Structured Dataset compatibility between plugins #3189

esadler-hbo · 2022-12-24T15:53:56Z

Describe the bug

I was running some tasks in a notebook where I passed the result of a Spark task as a StructuredDataset and then tried to load it into a polars DataFrame and a Hugging Face dataset.

Both plugins failed with the following error: No such file or directory: /var/folders/wq/3hjh3ms916b6dj56zx0f_x000000gq/T/flyte-69d2tww2/sandbox/local_flytekit/95bac8efeb64a8d10d34c73b66df7051/00000. Loading the same dataset with pandas, however, worked.

It seems that the polars and Hugging Face transformers append 00000 to the dataset path when reading, while Spark does not write a file with that name (it writes a directory of part files).
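To illustrate the mismatch described above, here is a minimal sketch of a reader that accepts both layouts: a single file, or a Spark-style directory of part files plus metadata such as _SUCCESS and hidden .crc checksums. The helper name resolve_parquet_files is hypothetical and is not a flytekit API; it is not the implementation merged for this issue, only an assumption-based example of "read from a folder" instead of hardcoding a 00000 file name.

```python
import os
from typing import List


def resolve_parquet_files(path: str) -> List[str]:
    """Return the data files behind `path` (hypothetical helper, not flytekit).

    A Spark-written dataset is a directory of part files plus metadata
    (e.g. `_SUCCESS`, hidden `.crc` checksums), whereas a reader that
    hardcodes `<path>/00000` only works for a single-file layout.
    """
    if os.path.isfile(path):
        # Single-file layout: nothing to resolve.
        return [path]
    if not os.path.isdir(path):
        raise FileNotFoundError(path)
    files = []
    for name in sorted(os.listdir(path)):
        # Skip Spark/Hadoop metadata such as _SUCCESS and .part-00000.crc.
        if name.startswith("_") or name.startswith("."):
            continue
        full = os.path.join(path, name)
        if os.path.isfile(full):
            files.append(full)
    return files
```

Each returned path could then be handed to the target dataframe library's own Parquet reader and the pieces concatenated, so no particular file name inside the directory is assumed.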

Expected behavior

I would expect a StructuredDataset produced by a Spark task to be readable with the dataframe libraries of all plugins.

Additional context to reproduce

import flytekit
from flytekit import task, StructuredDataset
from flytekitplugins.spark.task import Spark
import datasets
import polars as pl
import pandas as pd

@task(task_config=Spark())
def spark_task(path: str) -> StructuredDataset:
    # Read the Parquet input with the Spark session provided by the plugin
    sess = flytekit.current_context().spark_session
    df = sess.read.parquet(path)
    return StructuredDataset(dataframe=df)

# Running the task locally returns a StructuredDataset backed by Spark's output
df = spark_task(path="./ratings_100k.parquet")

# Fails: No such file or directory: .../00000
try:
    df.open(pl.DataFrame).all().head()
except Exception as e:
    print(e)

# Fails with the same error
try:
    df.open(datasets.Dataset).all()
except Exception as e:
    print(e)

# Works
df.open(pd.DataFrame).all().head()

Screenshots

Are you sure this issue hasn't been raised already?

Yes

Have you read the Code of Conduct?

Yes


welcome · 2022-12-24T15:53:58Z

Thank you for opening your first issue here! 🛠

nightscape · 2023-05-09T07:12:40Z

@esadler-hbo this seems to be resolved by flyteorg/flytekit#1406.
Can you verify?

pingsutw · 2023-12-22T20:28:31Z

yes, we've fixed it

esadler-hbo added the bug and untriaged labels Dec 24, 2022

pingsutw mentioned this issue Jan 6, 2023

Read structured dataset from a folder flyteorg/flytekit#1406

Merged


pingsutw added the flytekit label and removed the untriaged label Dec 22, 2023

pingsutw self-assigned this Dec 22, 2023

pingsutw closed this as completed Dec 22, 2023
