Describe the bug
I was running some tasks in a notebook where I was passing the result of a Spark task as a StructuredDataset and then trying to load it into a Polars dataframe and a Hugging Face dataset. For both plugins this resulted in the following error: No such file or directory: /var/folders/wq/3hjh3ms916b6dj56zx0f_x000000gq/T/flyte-69d2tww2/sandbox/local_flytekit/95bac8efeb64a8d10d34c73b66df7051/00000. It did, however, work for pandas. It seems that the Polars and Hugging Face transformers append 00000 to the path, while the Spark transformer does not.
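To make the path mismatch concrete, here is an illustrative sketch (not actual flytekit code) of what appears to happen locally: the Spark encoder writes a parquet directory of part files, while the Polars and Hugging Face decoders look for a single file named 00000 inside that directory. The directory contents below are simulated.

import os
import tempfile

# Illustration only: simulate what the Spark encoder writes locally
# (a parquet directory of part files), not actual flytekit internals.
sandbox_dir = tempfile.mkdtemp(prefix="flyte-sd-")
for name in ("_SUCCESS", "part-00000-abc.snappy.parquet"):
    open(os.path.join(sandbox_dir, name), "w").close()

print(os.listdir(sandbox_dir))
# The Polars/Hugging Face decoders appear to open <path>/00000, which was
# never written, hence "No such file or directory: .../00000".
print(os.path.exists(os.path.join(sandbox_dir, "00000")))  # False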
Expected behavior

I would expect to be able to use a StructuredDataset produced by Spark with the dataframe libraries from all plugins.

Additional context to reproduce
import flytekit
from flytekit import task, StructuredDataset
from flytekitplugins.spark.task import Spark

import datasets
import pandas as pd
import polars as pl


@task(task_config=Spark())
def spark_task(path: str) -> StructuredDataset:
    # Read a parquet file with the Spark session provided by the plugin
    # and hand it back as a StructuredDataset.
    sess = flytekit.current_context().spark_session
    df = sess.read.parquet(path)
    return StructuredDataset(dataframe=df)


df = spark_task(path="./ratings_100k.parquet")

# Fails: No such file or directory: .../00000
try:
    df.open(pl.DataFrame).all().head()
except Exception as e:
    print(e)

# Fails with the same error.
try:
    df.open(datasets.Dataset).all().head()
except Exception as e:
    print(e)

# Works.
df.open(pd.DataFrame).all().head()
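In the meantime, a possible workaround (a sketch based only on the fact that the pandas path works above; pl.from_pandas and datasets.Dataset.from_pandas are standard Polars and Hugging Face APIs, not flytekit ones) is to open the StructuredDataset as pandas and convert from there:

# Workaround sketch: go through pandas, which the repro above shows works,
# then convert to the target libraries outside of flytekit's decoders.
pdf = df.open(pd.DataFrame).all()

pl_df = pl.from_pandas(pdf)                # Polars DataFrame
hf_ds = datasets.Dataset.from_pandas(pdf)  # Hugging Face Dataset

print(pl_df.head())
print(hf_ds)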
Screenshots
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?