Error while writing Pandas DataFrame to Delta Lake (S3) #2051

Closed
vinamrgrover opened this issue Jan 7, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@vinamrgrover

write_deltalake isn't working as expected

I encountered an error when calling write_deltalake:

ValueError: you must provide schema if data is iterable

The same call worked with a PyArrow Table, but it fails with a pandas DataFrame.

How to reproduce it:

# To reproduce, write a pandas DataFrame to an S3 bucket.
# `df`, `storage_options`, and the S3 URL are placeholders to fill in.
from deltalake import write_deltalake

write_deltalake(
    <your_s3_url>,
    data=df,
    storage_options=storage_options,
    overwrite_schema=True,
    mode='overwrite',
)

vinamrgrover added the bug label Jan 7, 2024

@ion-elgreco
Collaborator

ion-elgreco commented Jan 7, 2024

@vinamrgrover please provide the deltalake version you're using, and share a minimal reproducible example.
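
One way to collect the version information requested here is sketched below. It uses only the standard library; the helper name `installed_version` and the choice of packages to check are assumptions for illustration, not part of deltalake.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages that are relevant to this issue.
for name in ("deltalake", "pandas", "pyarrow"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this inside the same virtual environment that raises the error shows whether pandas is actually installed there, which matters if the writer treats pandas as an optional dependency.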

@vinamrgrover
Author

Version: 0.15.0

I already shared a reproducible example above.

@ion-elgreco
Collaborator

ion-elgreco commented Jan 7, 2024

@vinamrgrover a reproducible example should also include a sample dataframe.

@ion-elgreco
Collaborator

You are likely passing something that is not a pandas DataFrame. I can write a pandas DataFrame with write_deltalake:

import pandas as pd
from deltalake import write_deltalake
df = pd.DataFrame({'foo':['test']})
write_deltalake('test_table_PATH', data=df, mode='overwrite', overwrite_schema=True)

@vinamrgrover
Author

Please don't mark this as completed. I am certain that I passed a pandas DataFrame to write_deltalake. I think the issue is with the _has_pandas variable in the writer.py module. Try executing this in a new virtual environment.
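
To illustrate the suspicion about `_has_pandas`, here is a minimal sketch of how an optional-import flag could lead to this exact error. It is not the deltalake source: `FakeDataFrame`, `classify_input`, and the `has_pandas` parameter are hypothetical stand-ins, and the dispatch order is an assumption. The point is only that a DataFrame is itself iterable (iterating yields column names), so if the DataFrame branch is skipped it can fall through to a generic iterable branch that requires a schema.

```python
from collections.abc import Iterable


class FakeDataFrame:
    """Hypothetical stand-in for pandas.DataFrame: iterating yields column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __iter__(self):
        return iter(self.columns)


def classify_input(data, schema=None, has_pandas=True):
    """Hypothetical dispatch; an assumption, not the real writer logic."""
    # The DataFrame branch is only taken when the optional-import flag is set.
    if has_pandas and isinstance(data, FakeDataFrame):
        return "dataframe"
    # Otherwise the frame is just another iterable, which needs an explicit schema.
    if isinstance(data, Iterable):
        if schema is None:
            raise ValueError("you must provide schema if data is iterable")
        return "iterable"
    raise TypeError(f"unsupported input type: {type(data).__name__}")


df = FakeDataFrame(["foo"])
print(classify_input(df, has_pandas=True))  # takes the DataFrame branch

try:
    classify_input(df, has_pandas=False)  # flag unset: falls through
except ValueError as exc:
    print(f"ValueError: {exc}")
```

If something like this is what happens, the error would depend on whether pandas can be imported in the environment where deltalake runs, which would explain why the same call succeeds on one machine and fails on another.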

2 participants