-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support decimal for min
and max
aggregate
#1407
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @liukun4515 -- I think this PR is great as it adds tests for the new feature; I had some code / style suggestions, but I also think it would be fine to address them in another PR (or in the future)
Please just let us know what you want to do
} | ||
|
||
#[tokio::test] | ||
async fn aggregate_decimal_max() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -129,11 +131,49 @@ macro_rules! typed_min_max_batch { | |||
}}; | |||
} | |||
|
|||
// TODO implement this in arrow-rs with simd | |||
// https://github.com/apache/arrow-rs/issues/1010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not reviewed the code in the datafusion aggregate functions for a while, so I am not familiar with how much they do / don't use the arrow compute kernels, but I think the more we can leverage / reuse those kernels (and their SIMD specializations, if they exist), the better
} | ||
ScalarValue::Decimal128(Some(result), *$PRECISION, *$SCALE) | ||
} else { | ||
let mut result = 0_i128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more idomatic to use let mut rust: Option<i128> = 0;
(aka use an Option
rather than explicit flag)
Then instead of code like
if !has_value && array.is_valid(i) {
has_value = true;
result = array.value(i);
}
if array.is_valid(i) {
result = result.$OP(array.value(i));
}
You could write code like the following (which I think saves at least one check of is_valid()
):
if array.is_valid(i) {
let value = array.value(i);
result = result.$OP(result.unwrap_or(value)
}
This is just a style suggestion, it is not needed I don't think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to resolve it in the follow-up pull request.
You can merge it if it looks good to you.
@alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recheck your suggestion and find some bugs.
For example, if all value in the array is less than zero, it may be [-1,-3,-3] and we want to get the max value of them.
initially, we set the result as Some(0)
and follow your suggestion code, we will get the 0
as the max value for the result.
I think we should just change to that
let mut result = 0_i128; ->>> let mut result : i128 = 0;
In addition the unwrap_or
is
pub fn unwrap_or(self, default: T) -> T {
match self {
Some(x) => x,
None => default,
}
}
// Statically-typed version of min/max(array) -> ScalarValue for non-string types. | ||
// this is a macro to support both operations (min and max). | ||
macro_rules! min_max_batch { | ||
($VALUES:expr, $OP:ident) => {{ | ||
match $VALUES.data_type() { | ||
DataType::Decimal(precision, scale) => { | ||
typed_min_max_batch_decimal128!($VALUES, precision, scale, $OP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you added DecimalArray
support to arrow::compute::kernels::min
and arrow::compute::kernels::max
you might be able to use typed_min_max_batch!
here. Work for the future perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when I finish the task of aggregating with decimal data type, I will begin to do basic operations in arrow-rs for decimal data type.
typed_min_max_decimal!(lhsv, rhsv, lhsp, lhss, Decimal128, $OP) | ||
} else { | ||
return Err(DataFusionError::Internal(format!( | ||
"MIN/MAX is not expected to receive scalars of incompatible types {:?}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@alamb it's can be merged. |
Thanks again @liukun4515 ❤️ |
min
and max
aggregate
Which issue does this PR close?
part of #122
TODO
In the feature
support SIMD in the arrow-rs apache/arrow-rs#1010
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?