Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support decimal for min and max aggregate #1407

Merged
merged 4 commits into from
Dec 10, 2021

Conversation

liukun4515
Copy link
Contributor

@liukun4515 liukun4515 commented Dec 7, 2021

Which issue does this PR close?

part of #122

TODO

In the feature
support SIMD in the arrow-rs apache/arrow-rs#1010

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Dec 7, 2021
@liukun4515 liukun4515 changed the title support decimal for min/max agg [WIP]support decimal for min/max agg Dec 7, 2021
@liukun4515 liukun4515 changed the title [WIP]support decimal for min/max agg support decimal for min/max agg Dec 7, 2021
@liukun4515
Copy link
Contributor Author

@alamb @houqp PTAL
This is first pr for aggregate function supported decimal

@houqp houqp added the enhancement New feature or request label Dec 9, 2021
@houqp houqp requested a review from alamb December 9, 2021 05:56
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @liukun4515 -- I think this PR is great as it adds tests for the new feature; I had some code / style suggestions, but I also think it would be fine to address them in another PR (or in the future)

Please just let us know what you want to do

}

#[tokio::test]
async fn aggregate_decimal_max() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -129,11 +131,49 @@ macro_rules! typed_min_max_batch {
}};
}

// TODO implement this in arrow-rs with simd
// https://github.com/apache/arrow-rs/issues/1010
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not reviewed the code in the datafusion aggregate functions for a while, so I am not familiar with how much they do / don't use the arrow compute kernels, but I think the more we can leverage / reuse those kernels (and their SIMD specializations, if they exist), the better

}
ScalarValue::Decimal128(Some(result), *$PRECISION, *$SCALE)
} else {
let mut result = 0_i128;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more idomatic to use let mut rust: Option<i128> = 0; (aka use an Option rather than explicit flag)

Then instead of code like

                    if !has_value && array.is_valid(i) {
                        has_value = true;
                        result = array.value(i);
                    }
                    if array.is_valid(i) {
                        result = result.$OP(array.value(i));
                    }

You could write code like the following (which I think saves at least one check of is_valid()):

                    if array.is_valid(i) {
                        let value = array.value(i);
                        result = result.$OP(result.unwrap_or(value)
                    }

This is just a style suggestion, it is not needed I don't think

Copy link
Contributor Author

@liukun4515 liukun4515 Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to resolve it in the follow-up pull request.
You can merge it if it looks good to you.
@alamb

Copy link
Contributor Author

@liukun4515 liukun4515 Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recheck your suggestion and find some bugs.
For example, if all value in the array is less than zero, it may be [-1,-3,-3] and we want to get the max value of them.
initially, we set the result as Some(0) and follow your suggestion code, we will get the 0 as the max value for the result.

I think we should just change to that

let mut result = 0_i128; ->>> let mut result : i128 = 0;

In addition the unwrap_or is

    pub fn unwrap_or(self, default: T) -> T {
        match self {
            Some(x) => x,
            None => default,
        }
    }

// Statically-typed version of min/max(array) -> ScalarValue for non-string types.
// this is a macro to support both operations (min and max).
macro_rules! min_max_batch {
($VALUES:expr, $OP:ident) => {{
match $VALUES.data_type() {
DataType::Decimal(precision, scale) => {
typed_min_max_batch_decimal128!($VALUES, precision, scale, $OP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you added DecimalArray support to arrow::compute::kernels::min and arrow::compute::kernels::max you might be able to use typed_min_max_batch! here. Work for the future perhaps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when I finish the task of aggregating with decimal data type, I will begin to do basic operations in arrow-rs for decimal data type.

typed_min_max_decimal!(lhsv, rhsv, lhsp, lhss, Decimal128, $OP)
} else {
return Err(DataFusionError::Internal(format!(
"MIN/MAX is not expected to receive scalars of incompatible types {:?}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@liukun4515
Copy link
Contributor Author

@alamb it's can be merged.
Thanks.

@alamb alamb merged commit e89da30 into apache:master Dec 10, 2021
@alamb
Copy link
Contributor

alamb commented Dec 10, 2021

Thanks again @liukun4515 ❤️

@alamb alamb changed the title support decimal for min/max agg Support decimal for min and max aggregate Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants