Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use f64::total_cmp instead of OrderedFloat #4133

Merged
merged 9 commits into from
Nov 10, 2022

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Nov 7, 2022

Which issue does this PR close?

Closes #4051 .

Rationale for this change

See #4051

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Nov 7, 2022
@comphead
Copy link
Contributor Author

comphead commented Nov 7, 2022

@tustvold I have replaced almost all entries OrderedFloat to f64. Still thinking how to use you hasher to remove OrderedFloat from Hash.
As your trait implement HashValue and ScalarValue requires std::cmp::Hash

@tustvold
Copy link
Contributor

tustvold commented Nov 8, 2022

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...

@comphead
Copy link
Contributor Author

comphead commented Nov 9, 2022

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...

Hi @tustvold I have implemented hasher through std::hash but the impl is the same as in hash_utils. HashValue trait is not the same as Hash, afaik. Let me know if the hasher should be done in other way

let v2 = v2.map(OrderedFloat);
v1.partial_cmp(&v2)
}
(Float32(v1), Float32(v2)) => v1.partial_cmp(v2),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Float32(v1), Float32(v2)) => v1.partial_cmp(v2),
(Float32(v1), Float32(v2)) => v1.total_cmp(v2),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1 is an Option<f32>, it supports partial_cmp, not total_cmp. let me know if you ok if I unwrap it to floats, the same way as done for Decimals.

Copy link
Contributor

@tustvold tustvold Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will need to match the option, I keep forgetting that ScalarValue has typed nulls for some reason 😆

partial_cmp on Option, will call through to partial_cmp on f32, which is not the same as total_cmp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Yeah, I checked that Float64(NULL) == Float64(NULL) now.

let v = v.map(OrderedFloat);
v.hash(state)
}
Float32(v) => v.map(Fl).hash(state),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can just call HashValue on v?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v is Option<f32> is supports Hash, but we have to wrap f32 into some wrapper supporting hash. Fl in this case, I didn't find how to implement Hash directly on f32/f64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I think there is a way to clean this up, but we can do that in a follow on PR

let v1 = v1.map(OrderedFloat);
let v2 = v2.map(OrderedFloat);
v1.eq(&v2)
// Handle NaN == NaN as true manually like in OrderedFloat
Copy link
Contributor

@tustvold tustvold Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the hash implementation, this should also use total_cmp. Otherwise two "equal" values according to PartialEq, e.g. +0 and -0, will have different hashes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if

                match (v1, v2) {
                    (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
                    _ => v1.eq(v2),
                }

.map(f64::from)
.map(|v| OrderedFloat::from(v as f64))
.collect();
let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();
let values: Vec<_> = (1..=1_000).map(f64::from).collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for _ in 0..400_000 {
values.push(OrderedFloat::from(1_000_000_f64));
}
let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();
let mut values: Vec<_> = (1..=600_000).map(f64::from).collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.map(f64::from)
.map(|v| OrderedFloat::from(v as f64))
.collect();
let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();
let values: Vec<_> = (1..=1_000_000).map(f64::from).collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@comphead comphead marked this pull request as ready for review November 9, 2022 23:47
let v2 = v2.map(OrderedFloat);
v1.partial_cmp(&v2)
}
(Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will now return None when comparing nulls, which isn't consistent with the other types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Fixed.

@github-actions github-actions bot added the core Core DataFusion crate label Nov 10, 2022
@comphead comphead requested a review from tustvold November 10, 2022 18:23
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍

@tustvold tustvold merged commit 5883e43 into apache:master Nov 10, 2022
@ursabot
Copy link

ursabot commented Nov 10, 2022

Benchmark runs are scheduled for baseline = 509c82c and contender = 5883e43. 5883e43 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use f64::total_cmp instead of OrderedFloat
3 participants