ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function #8688
@andygrove @nevi-me I would love to hear your feedback on this. This addition has been useful to us internally.
I went carefully through this, and it is really good. Thanks a lot @velvia.
I left some comments that IMO make the PR even more generic.
Would it be possible to update the section SQL Support on the README with this?
pub static SUPPORTED_NULLIF_TYPES: &'static [DataType] = &[
    DataType::Boolean,
    DataType::UInt8,
    DataType::UInt16,
    DataType::UInt32,
    DataType::UInt64,
    DataType::Int8,
    DataType::Int16,
    DataType::Int32,
    DataType::Int64,
    DataType::Float32,
    DataType::Float64,
];
Better to make these trait bounds with a good comment about how these are selected. I didn't understand:
/// The order of these types correspond to the order on which coercion applies
Can you explain it?
Better to make these trait bounds with a good comment about how these are selected
AFAIK these cannot be trait bounds because logical and physical planning is dynamically typed.
In this case, this is enumerating all valid types that can be (dynamically) passed to the function. If someone tries to call this function with e.g. a ListArray, the logical planner will error with a description that this function does not support that type.
The order here matters because when a function is planned to be called with a type X that is not supported by the function, the physical planner will try to losslessly cast that type to a valid type for that function, and it does so in the order of this array. In general these should be ordered from fastest to slowest (in the eyes of the implementation), so that the cast chooses the type with the fastest implementation.
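To make the ordering argument concrete, here is a minimal sketch of "pick the first supported type the input can be losslessly cast to". It is not DataFusion's actual coercion code: `ToyType`, `can_cast_losslessly`, and `coerce` are hypothetical names, and the cast table is a simplified assumption.

```rust
/// Toy stand-in for Arrow's `DataType` (hypothetical; only a few variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int8,
    Int16,
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
}

/// Assumed lossless-cast table: widening casts only.
fn can_cast_losslessly(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    matches!(
        (from, to),
        (Int8, Int16) | (Int8, Int32) | (Int8, Int64) | (Int8, Float32) | (Int8, Float64)
            | (Int16, Int32) | (Int16, Int64) | (Int16, Float32) | (Int16, Float64)
            | (Int32, Int64) | (Int32, Float64)
            | (Float32, Float64)
    )
}

/// If `actual` is already supported, keep it. Otherwise return the first
/// entry of `supported`, in list order, that `actual` can be cast to.
/// `None` models the planner rejecting the call.
fn coerce(actual: ToyType, supported: &[ToyType]) -> Option<ToyType> {
    if supported.contains(&actual) {
        return Some(actual);
    }
    supported
        .iter()
        .copied()
        .find(|&candidate| can_cast_losslessly(actual, candidate))
}
```

With `supported = [Int32, Int64, Float64]`, an `Int8` input is coerced to `Int32`; reorder the list to `[Float64, Int32]` and the same input is coerced to `Float64` instead, which is why the list order decides which implementation runs.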
Oh, I see. In our private project at work, I have used type algebra definitions to avoid lists like these. For now, this can stay as it is, but later I can open a type algebra PR to convert all of these to castability.
That is interesting. I would be interested in knowing what the issue with the current implementation is and why type algebra definitions should be used instead. Could you first introduce a proposal with the design, e.g. in a Google Doc, before the PR? In DataFusion we have been doing that for larger changes to avoid committing to an implementation before some general agreement.
Oh, definitely will do. Give me some time to wrap my head around it.
Thanks for your patience, @velvia.
I went through this more carefully, and I think that we have two issues:
- the null count of the returned array may not match the number of bits
- there may be some issue with arrays with offset != 0.
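Both concerns can be illustrated with a toy model. This is a sketch under stated assumptions, not the Arrow kernel: `ToyArray` and `null_out` are hypothetical, and a `Vec<bool>` stands in for the packed validity bitmap. The point is that every read goes through `offset + i`, and the null count is derived from the same logical slots the output exposes.

```rust
/// Toy array view (hypothetical). `validity[i] == true` means physical
/// slot `i` holds a value; logical slot `i` maps to physical `offset + i`.
struct ToyArray {
    values: Vec<i32>,
    validity: Vec<bool>,
    offset: usize,
    len: usize,
}

impl ToyArray {
    fn is_valid(&self, i: usize) -> bool {
        self.validity[self.offset + i]
    }

    fn value(&self, i: usize) -> i32 {
        self.values[self.offset + i]
    }

    /// Counts nulls over the logical slots only, so it always agrees
    /// with the validity bits a reader of this view can see.
    fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }
}

/// Returns a new array (offset 0) in which every logical slot where
/// `mask` is true has been nulled out.
fn null_out(array: &ToyArray, mask: &[bool]) -> ToyArray {
    assert_eq!(mask.len(), array.len, "mask must cover the logical slots");
    let values = (0..array.len).map(|i| array.value(i)).collect();
    let validity = (0..array.len)
        .map(|i| array.is_valid(i) && !mask[i])
        .collect();
    ToyArray { values, validity, offset: 0, len: array.len }
}
```

A test along these lines (slice an array so that offset is nonzero, apply the kernel, then compare the reported null count against the bits) is one way to catch either problem.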
I left some comments in the code trying to describe what they are and how I tested them.
d6b39ad to 2f40764
Co-authored-by: Jorge Leitao <[email protected]>
4cf478c to 174db8e
@jorgecarleitao and others:
Thanks a lot for the effort. LGTM
It is fine to remain for primitive types for now.
How does it get merged from here? Sorry, I'm unclear on the process. Thanks!
@velvia, it just needs time from a committer: typically over the weekend I run through all PRs in order of appearance and merge them if ready. Thanks a lot for the patience.
@jorgecarleitao I've just gone through this, and don't have anything else to add. Can I merge it?
@nevi-me, I went through it already. Thanks a lot!
Thanks everyone!
This fixes the CI. Was introduced in #8688, I guess because the CI there ran before the merge.
Closes #8791 from Dandandan/fix_clone
Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>
This PR implements the NULLIF() SQL function in DataFusion. It is implemented as a BuiltInScalarFunction, with a boolean kernel at the core which creates a new array with a modified null bitmap from the original array, based on the result of a boolean expression. When an input data item is equal to the right side in NULLIF(), then the item's nullity becomes set in the output array.
Closes apache#8688 from velvia/evan/rust-datafusion-nullif-func
Lead-authored-by: Evan Chan <[email protected]>
Co-authored-by: Evan Chan <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>
This PR implements the NULLIF() SQL function in DataFusion. It is implemented as a BuiltInScalarFunction, with a boolean kernel at the core which creates a new array with a modified null bitmap from the original array, based on the result of a boolean expression. When an input data item is equal to the right side in NULLIF(), then the item's nullity becomes set in the output array.
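For readers unfamiliar with the function, the element-wise semantics described above can be sketched as follows. This is a plain-Rust illustration, not the kernel in this PR: the hypothetical `nullif` helper models nullable columns as slices of `Option<T>` rather than Arrow arrays with a null bitmap.

```rust
/// Element-wise NULLIF over two equal-length nullable columns.
/// A slot becomes null when both sides are non-null and equal;
/// otherwise the left value (or left null) passes through unchanged.
fn nullif<T: PartialEq + Copy>(left: &[Option<T>], right: &[Option<T>]) -> Vec<Option<T>> {
    assert_eq!(left.len(), right.len(), "columns must have equal length");
    left.iter()
        .zip(right)
        .map(|(&l, &r)| match (l, r) {
            // Equal, non-null operands: the output slot is nulled.
            (Some(a), Some(b)) if a == b => None,
            // Unequal values, or a null on either side: keep the left slot.
            _ => l,
        })
        .collect()
}
```

Note that a null on the right side never nulls the output, since the comparison with null is not true in SQL; only the left side's own nullity carries over.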