-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: bump to datafusion 39, arrow 52, pyo3 0.21 #2581
Conversation
Co-authored-by: R. Tyler Croy <[email protected]>
This is now ready for review. All python and rust tests are passing locally on my machine (docs seem to have issues building, though). This only enables the gil-refs feature in the python bindings, it does not migrate to the new API (will be done in a follow-up PR) |
@@ -276,33 +272,6 @@ impl<'a> Display for SqlFormat<'a> { | |||
write!(f, "{expr} IN ({})", expr_vec_fmt!(list)) | |||
} | |||
} | |||
Expr::GetIndexedField(GetIndexedField { expr, field }) => match field { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those functions don't exist anymore, and based on my understanding from apache/datafusion#10568, those rewrites happen earlier now so the functionality stays the same, it's just done by DF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if we can catch the new functions and keep the old display formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, I'm not entirely sure this is possible. Those operations have been converted into ScalarFunctions, so the concept of "taking an index of a field" is now "a scalar function that operates on a field and an index". Had to modify a bunch of tests to accommodate.
delta-rs/crates/core/src/delta_datafusion/expr.rs
Lines 656 to 659 in af5997c
simple!( | |
col("_struct").field("a").eq(lit(20_i64)), | |
"get_field(_struct, 'a') = 20".to_string() | |
), |
It could be possible manually match on scalar functions and convert them into the old style, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it would make the roundtrippable format closer to SQL, but I guess with the timestamp parsing I already kind of stopped doing that 😆
Nice work! @abhiaagarwal |
Thank you @ion-elgreco! DF 39 should be released later this week, and the upstream delta-kernel needs to be resolved as well. I'll open a follow-up PR for moving to the new pyo3 lifetimes, and I'm also interested in implementing an async-native API using |
@abhiaagarwal yeah leaving it unmerged for now. Also checking with rest on slack if they are fine with keeping the git reference of delta-kernel-rs for time being. Great, you can ping me when you got the new bound APIs working! Just two things regarding pyo3-asyncio. It seems the original maintainer is not very active anymore, so it's still locked to pyo3 0.20 unfortunately. The other thing is, where do you need it for? :) as most of the async stuff is internal |
@ion-elgreco yeah, but it appears that the maintainer of pyo3 is interested in taking stewardship over it, so maybe that'll be resolved soon. I'm mostly interested to see if there are any perf improvements. Most of the code I use with the python deltalake package in already is async native (ie. async main), having async methods won't "infect" the rest of our codebase. We use it as a pseudo backend with FastAPI, for reference. There appears to be work being done in the main pyo3 repo that'll allow the gil to be released in async methods, so we could have the best of both worlds. |
@abhiaagarwal can you bump delta-kernel-rs to 0.1.1, it includes your changes now. And also bump the python cargo toml to 0.18.1? |
@ion-elgreco sure, I'll do it by EOD. Also will bump DF since they released 39 |
Cargo.toml
Outdated
@@ -26,33 +26,33 @@ debug = true | |||
debug = "line-tables-only" | |||
|
|||
[workspace.dependencies] | |||
delta_kernel = { version = "0.1" } | |||
delta_kernel = { git = "https://github.com/abhiaagarwal/delta-kernel-rs", branch = "arrow_52" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should now be updated to 0.1.1
Bumped to published versions, also added a new test that seemingly confirms that the issues related to decimals in #1778 et al on both engines seem to be fixed |
Yes I was waiting for some time for the arrow bump since they added Scientific notation support for decimals :) Thanks for adding the test! 🙂 |
Description
Updates the arrow and datafusion dependencies to 52 and 39(-rc1) respectively. This is necessary for updating pyo3.
While most changes with trivial, some required big rewrites. Namely, the logic for the Updates operation had to be rewritten (and simplified) to accommodate some new sanity checks inside datafusion: (apache/datafusion#10088).
Depends on delta-kernel having its arrow and object-store version bumped as well. This PR doesn't include any major changes for pyo3, I'll open a separate PR depending on this PR.
Related Issue(s)
Documentation