Unable to query by partition column #1445
Comments
Thanks for the report! I put up a draft PR demonstrating the issue as a test case. |
I also noticed that with no partition, in order to get date filters to work you have to do |
I played around with this. I think the issue is that DataFusion / Arrow cannot cast a Date32 into a
When I prevent Date32 from being dictionary encoded here my tests don't error out.
#[tokio::test]
async fn test_date() -> Result<(), Box<dyn Error>> {
    let ctx = SessionContext::new();
    let schema: Schema = serde_json::from_value(json!({
        "type": "struct",
        "fields": [
            {"name": "id", "type": "string", "nullable": true, "metadata": {}},
            {"name": "my_date", "type": "date", "nullable": true, "metadata": {}},
        ]
    }))
    .unwrap();
    let table = DeltaOps::new_in_memory()
        .create()
        .with_save_mode(SaveMode::ErrorIfExists)
        .with_columns(schema.get_fields().clone())
        .with_partition_columns(["my_date"])
        .await
        .unwrap();
    assert_eq!(table.version(), 0);
    let data = ctx.sql("select 1 as id, now() as my_date").await?.collect().await?;
    let table = DeltaOps(table)
        .write(data)
        .await
        .unwrap();
    ctx.register_table("test", Arc::new(table)).unwrap();
    println!("test1");
    let data = ctx.sql("select * from test where my_date <= arrow_cast(now(), 'Date32')").await?;
    data.show().await?;
    println!("test2");
    let data = ctx.sql("select * from test where my_date > '2023-06-07'").await?;
    data.show().await?;
    println!("test3");
    ctx.table("test").await?
        .filter(col("my_date").gt(lit("2023-06-05")))?
        .show().await?;
    Ok(())
} |
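For readers following the test above: the second and third filters compare a date column against a string literal. An Arrow Date32 value is a 32-bit count of days since 1970-01-01, so a literal such as '2023-06-07' has to be converted to that integer form before the comparison can match anything. The sketch below uses only the standard library to show the conversion; `days_from_civil` and `date32_from_str` are hypothetical helper names for illustration, not functions from arrow-rs, DataFusion, or delta-rs.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date,
/// i.e. the integer a Date32 column would hold for that day.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day falls at the end of the year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year Gregorian cycle
    let yoe = y.rem_euclid(400); // year within the cycle, 0..=399
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day within the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the cycle
    era * 146_097 + doe - 719_468 // shift so that 1970-01-01 maps to 0
}

/// Parse a `YYYY-MM-DD` literal into a Date32-style day count.
/// Only range-checks month and day; it does not reject e.g. February 30.
fn date32_from_str(s: &str) -> Option<i32> {
    let mut parts = s.split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    i32::try_from(days_from_civil(year, month, day)).ok()
}

fn main() {
    // The literal used in the "test2" query of the test case.
    println!("{:?}", date32_from_str("2023-06-07")); // Some(19515)
}
```

This is only an illustration of the representation; whether a given query engine inserts that conversion automatically depends on its type-coercion rules.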
Looking at a few of my tables, using the type SchemaField::new(
    "date".to_string(),
    SchemaDataType::primitive("string".to_string()),
    false,
    HashMap::new(),
); |
That's very odd. Arrow should be able to cast any type into its dictionary form. If it's the only quick fix we can find, we can remove the dictionary encoding on partition columns. But we should add it back when we can, since it's much more performant for those columns to be dictionary encoded. |
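As background on why dictionary encoding suits partition columns: every row in a partition shares the same value, so storing each distinct value once and referring to it by a small integer key saves space. The following is a minimal standard-library sketch of the idea, assuming 16-bit keys and Date32-style `i32` day counts; `DictEncoded` and `dictionary_encode` are hypothetical names for illustration and do not reflect how arrow-rs implements its dictionary arrays.

```rust
use std::collections::HashMap;

/// A dictionary-encoded column: `keys[i]` indexes into `values`.
struct DictEncoded {
    keys: Vec<u16>,
    values: Vec<i32>,
}

impl DictEncoded {
    /// Expand the keys back into the original column.
    fn decode(&self) -> Vec<i32> {
        self.keys.iter().map(|&k| self.values[k as usize]).collect()
    }
}

/// Encode a column of day counts, storing each distinct value once.
/// Returns `None` if there are more distinct values than a u16 key can address.
fn dictionary_encode(column: &[i32]) -> Option<DictEncoded> {
    let mut index: HashMap<i32, u16> = HashMap::new();
    let mut values = Vec::new();
    let mut keys = Vec::with_capacity(column.len());
    for &v in column {
        let key = match index.get(&v) {
            Some(&k) => k,
            None => {
                // First time this value is seen: append it to the dictionary.
                let k = u16::try_from(values.len()).ok()?;
                values.push(v);
                index.insert(v, k);
                k
            }
        };
        keys.push(key);
    }
    Some(DictEncoded { keys, values })
}

fn main() {
    // Four rows, two distinct dates: the dictionary holds only two values.
    let column = [19515, 19515, 19516, 19515];
    let encoded = dictionary_encode(&column).expect("fits in u16 keys");
    println!("keys = {:?}, values = {:?}", encoded.keys, encoded.values);
    assert_eq!(encoded.decode(), column);
}
```

For a partition column the dictionary typically holds a single value per file, which is where the savings come from; for columns with many distinct small values the keys can cost as much as the data itself.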
This is the query I used and the resulting error:
let data = ctx.sql("select * from test where my_date <= arrow_cast(now(), 'Dictionary(UInt16, Date32)')").await?;
Error:
Error: Context("Optimizer rule 'simplify_expressions' failed", ArrowError(CastError("Unsupported output type for dictionary packing: Date32")))
which traces to here |
Reported this upstream: apache/arrow-rs#4390. Sounds like we should remove the dictionary encoding for now, then. We can try bringing it back once that upstream issue is fixed. |
FWIW |
Ah okay. If Arrow won't support that well, then let's not plan on using dictionary columns for small values. |
Thanks for the quick responses, guys! FWIW this fixed it (ae7d2d2). Not sure if it's the right fix for the repo at large, but it at least unblocked me. |
I can confirm that @ChewingGlass's fix works for our projects as well. I saw a PR was started but is hanging, happy to push a combined PR (ae7d2d2 and #1447). I tried building locally without success (deltalake-python / pyo3 don't seem to like my M2 Max arch), but working through that. |
Just made a PR #1481 |
Environment
Max
Delta-rs version: latest (deltalake = { git = "https://github.com/delta-io/delta-rs", branch = "main", features = ["s3", "datafusion"] })
Binding:
Environment: OsX
Bug
What happened:
I created a table with a "date" partition column. If I run any query that either selects date or filters by date, I get:
Removing the partition from the table and querying no longer results in an error, but querying on the date field does not work as expected (any query on the field returns no data).
What you expected to happen:
I expect the queries to run and filter by date
How to reproduce it:
See above