delta_rs doesn't seem to respect the row group size #2309
I would like delta_rs to write Parquet files with bigger row groups, ideally 8 million rows per group, but so far it does not seem to work. What am I missing?
Two things: I see that max_rows_per_group is incorrectly passed through to write_batch_size on the way from Python to Rust; I'll make a fix to remove that. You should use the WriterProperties class and pass that to write_deltalake; it contains max_row_group_size.
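For reference, a minimal sketch of that pattern, assuming an in-memory pyarrow table and a local table path (both made up for illustration):

```python
import pyarrow as pa
from deltalake import WriterProperties, write_deltalake

# Illustrative data; any Arrow-compatible table works.
table = pa.table({"foo": list(range(1_000_000))})

# WriterProperties carries the parquet settings, including max_row_group_size.
wp = WriterProperties(max_row_group_size=8_000_000)

# Depending on the deltalake version, you may also need engine="rust".
write_deltalake("some_table", table, mode="append", writer_properties=wp)
```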
@ion-elgreco thanks, but I am more interested in min_row_group_size; I want the minimum to be 8 million rows.
@djouallah I don't see a way to set a minimum in the parquet crate
I am using this and it does not seem to be working?
@djouallah works for me:

```python
from deltalake import WriterProperties
import polars as pl
import pyarrow.parquet as pq
import os

df = pl.DataFrame({
    "foo": list(range(10_000_000))
})

wp = WriterProperties(max_row_group_size=8_000_000)
df.write_delta("test_table", mode="append", delta_write_options={"writer_properties": wp, "engine": "rust"})

file = list(filter(lambda x: '.parquet' in x, os.listdir("test_table")))[0]
metadata = pq.read_metadata(os.path.join("test_table", file))
for i in range(metadata.num_row_groups):
    print(metadata.row_group(i))
```

result:
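The printed row group metadata is not reproduced above, but a hedged follow-up check, continuing from the snippet and assuming the 10,000,000-row input, would look like this; the exact split can depend on how the writer flushes batches:

```python
import os
import pyarrow.parquet as pq

# Continuing from the example above: read back the row group sizes.
# With 10,000,000 input rows and max_row_group_size=8_000_000, roughly
# two groups are expected (e.g. 8,000,000 + 2,000,000 rows).
file = next(f for f in os.listdir("test_table") if f.endswith(".parquet"))
metadata = pq.read_metadata(os.path.join("test_table", file))
sizes = [metadata.row_group(i).num_rows for i in range(metadata.num_row_groups)]
print(sizes)
assert all(n <= 8_000_000 for n in sizes)
```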
Thanks, it would be nice if it were documented :)
@djouallah it is mentioned in the parameter docs: "Optional[WriterProperties] writer properties to the Rust parquet writer." But we can also add this in some usage docs; do you want to perhaps open a PR for that? :)
# Description

Was passing the wrong param - closes #2309