Use f64::total_cmp instead of OrderedFloat #4133

comphead · 2022-11-07T23:06:17Z

Which issue does this PR close?

Closes #4051.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

comphead · 2022-11-07T23:07:46Z

@tustvold I have replaced almost all occurrences of OrderedFloat with f64. I am still thinking about how to use your hasher to remove OrderedFloat from the Hash impl, since your code implements the HashValue trait while ScalarValue requires std::hash::Hash.

tustvold · 2022-11-08T01:22:23Z

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...
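The recommendation above could look roughly like the following. This is a minimal sketch, not the code merged in the PR: the name `TotalF64` and the helper `hash_of` are hypothetical, and it hashes the raw bit pattern through `std::hash` rather than through DataFusion's `hash_utils`.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical newtype around f64; the name is an assumption, not from the PR.
#[derive(Debug, Clone, Copy)]
pub struct TotalF64(pub f64);

impl PartialEq for TotalF64 {
    fn eq(&self, other: &Self) -> bool {
        // total_cmp is Equal only for identical bit patterns, so a NaN equals
        // itself and +0.0 differs from -0.0.
        self.0.total_cmp(&other.0) == Ordering::Equal
    }
}

impl Eq for TotalF64 {}

impl Hash for TotalF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hashing the bits keeps Hash consistent with the Eq impl above.
        self.0.to_bits().hash(state);
    }
}

impl PartialOrd for TotalF64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for TotalF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

/// Hypothetical helper: hash any value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Because equality and hashing both derive from the bit pattern, two values that compare equal always hash identically, which is the property `Hash` requires.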

comphead · 2022-11-09T01:13:38Z

> @comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...

Hi @tustvold, I have implemented the hasher through std::hash, but the impl is the same as in hash_utils. The HashValue trait is not the same as Hash, afaik. Let me know if the hasher should be done another way.

tustvold · 2022-11-09T05:17:10Z

datafusion/common/src/scalar.rs

-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),


Suggested change

-            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),
+            (Float32(v1), Float32(v2)) => v1.total_cmp(v2),

v1 is an Option<f32>, which supports partial_cmp but not total_cmp. Let me know if you are OK with me unwrapping it to floats, the same way it is done for Decimals.

Yes, we will need to match the option, I keep forgetting that ScalarValue has typed nulls for some reason 😆

partial_cmp on an Option will call through to partial_cmp on f32, which is not the same as total_cmp.

Done. Yeah, I checked that Float64(NULL) == Float64(NULL) now.
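To make the difference discussed in this thread concrete, here is a small sketch contrasting the two approaches. The function names `via_partial_cmp` and `via_total_cmp` are hypothetical, and falling back to Option's own ordering for nulls is an assumption rather than a description of the merged code.

```rust
use std::cmp::Ordering;

/// Option's own comparison: for two Some values it defers to f32::partial_cmp,
/// which yields None as soon as a NaN is involved.
pub fn via_partial_cmp(v1: Option<f32>, v2: Option<f32>) -> Option<Ordering> {
    v1.partial_cmp(&v2)
}

/// Match on the Option first, then compare the unwrapped floats with total_cmp.
pub fn via_total_cmp(v1: Option<f32>, v2: Option<f32>) -> Option<Ordering> {
    match (v1, v2) {
        (Some(f1), Some(f2)) => Some(f1.total_cmp(&f2)),
        // At least one side is null: Option's ordering sorts None before Some.
        _ => v1.partial_cmp(&v2),
    }
}
```

With this sketch, comparing `Some(f32::NAN)` against `Some(1.0)` gives `None` through `via_partial_cmp` but a definite ordering through `via_total_cmp`, and `+0.0` versus `-0.0` is `Equal` in the first and `Greater` in the second.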

tustvold · 2022-11-09T05:18:45Z

datafusion/common/src/scalar.rs

-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
+            Float32(v) => v.map(Fl).hash(state),


I think this can just call HashValue on v?

v is an Option<f32>. Option supports Hash, but only when the inner type does, so we have to wrap the f32 in a wrapper that implements Hash (Fl in this case). I didn't find a way to implement Hash directly on f32/f64.

Fair, I think there is a way to clean this up, but we can do that in a follow-on PR.
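One possible way to hash a nullable float without a wrapper type is to map it to its bit pattern first, since `Option<u32>` already implements `Hash`. This is only a sketch of that idea; it is not necessarily the cleanup meant above, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a nullable f32 through its bit pattern (hypothetical helper).
pub fn hash_nullable_f32<H: Hasher>(v: Option<f32>, state: &mut H) {
    // f32::to_bits returns a u32, and Option<u32> implements Hash.
    v.map(f32::to_bits).hash(state);
}

/// Convenience wrapper for trying the helper out with the default hasher.
pub fn hash_to_u64(v: Option<f32>) -> u64 {
    let mut hasher = DefaultHasher::new();
    hash_nullable_f32(v, &mut hasher);
    hasher.finish()
}
```

Hashing the bits agrees with an equality based on total_cmp, because total_cmp reports `Equal` only for identical bit patterns.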

tustvold · 2022-11-09T05:19:31Z

datafusion/common/src/scalar.rs

-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat


To be consistent with the hash implementation, this should also use total_cmp. Otherwise two "equal" values according to PartialEq, e.g. +0 and -0, will have different hashes

What if

match (v1, v2) {
    (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
    _ => v1.eq(v2),
}
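The consistency concern can be checked with a short sketch. It wraps the suggested match in a hypothetical function `nullable_eq` and contrasts it with plain IEEE 754 equality (`ieee_eq`, also a hypothetical name).

```rust
/// Equality for nullable floats using total_cmp when both values are present.
pub fn nullable_eq(v1: &Option<f64>, v2: &Option<f64>) -> bool {
    match (v1, v2) {
        (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
        // At least one side is null: equal only if both are None.
        _ => v1.eq(v2),
    }
}

/// Plain PartialEq on Option<f64>, shown for contrast: +0.0 == -0.0 is true
/// and NaN == NaN is false.
pub fn ieee_eq(v1: &Option<f64>, v2: &Option<f64>) -> bool {
    v1 == v2
}
```

Under `ieee_eq`, `+0.0` and `-0.0` are equal even though their bit patterns differ, so a bit-based hash would disagree with equality. Under `nullable_eq` they are unequal and NaN equals itself, which matches a hash computed from the bits.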

tustvold · 2022-11-09T05:20:43Z

datafusion/physical-expr/src/aggregate/tdigest.rs

-                .map(f64::from)
-                .map(|v| OrderedFloat::from(v as f64))
-                .collect();
+            let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();


Suggested change

-            let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();
+            let values: Vec<_> = (1..=1_000).map(f64::from).collect();

tustvold · 2022-11-09T05:20:53Z

datafusion/physical-expr/src/aggregate/tdigest.rs

-        for _ in 0..400_000 {
-            values.push(OrderedFloat::from(1_000_000_f64));
-        }
+        let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();


Suggested change

-        let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();
+        let mut values: Vec<_> = (1..=600_000).map(f64::from).collect();

tustvold · 2022-11-09T05:21:01Z

datafusion/physical-expr/src/aggregate/tdigest.rs

-            .map(f64::from)
-            .map(|v| OrderedFloat::from(v as f64))
-            .collect();
+        let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();


Suggested change

-        let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();
+        let values: Vec<_> = (1..=1_000_000).map(f64::from).collect();

tustvold · 2022-11-10T01:11:42Z

datafusion/common/src/scalar.rs

-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)),


I think this will now return None when comparing nulls, which isn't consistent with the other types

Right! Fixed.
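The problem raised here can be illustrated with a reduced sketch. The two-variant `Scalar` enum below is a hypothetical stand-in for ScalarValue, and deferring to Option's ordering for nulls is an assumption about what "consistent with the other types" means, not a copy of the actual fix.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for ScalarValue with typed nulls, for illustration only.
#[derive(Debug, Clone, Copy)]
pub enum Scalar {
    Int32(Option<i32>),
    Float32(Option<f32>),
}

pub fn compare(a: &Scalar, b: &Scalar) -> Option<Ordering> {
    use Scalar::*;
    match (a, b) {
        (Int32(v1), Int32(v2)) => v1.partial_cmp(v2),
        (Float32(v1), Float32(v2)) => match (v1, v2) {
            (Some(f1), Some(f2)) => Some(f1.total_cmp(f2)),
            // Nulls: defer to Option's ordering so Float32 behaves like Int32.
            // Matching only Float32(Some(_)) pairs in the outer match would let
            // null floats fall through to the catch-all and return None.
            _ => v1.partial_cmp(v2),
        },
        // Different variants are not comparable in this sketch.
        _ => None,
    }
}
```

With the nested match, `Float32(None)` compared with `Float32(None)` yields `Some(Equal)`, the same result `Int32(None)` gives against `Int32(None)`.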

tustvold

Thank you 👍

ursabot · 2022-11-10T19:43:03Z

Benchmark runs are scheduled for baseline = 509c82c and contender = 5883e43. 5883e43 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

comphead added 2 commits November 7, 2022 14:34

Replace OrderedFloat with f64

a8ae8bd

clippy

387e904

github-actions bot added the physical-expr Physical Expressions label Nov 7, 2022

Adding hasher

d98e66f

tustvold reviewed Nov 9, 2022

datafusion/physical-expr/src/aggregate/tdigest.rs (outdated, resolved)

tustvold reviewed Nov 9, 2022

datafusion/common/src/scalar.rs (outdated, resolved)

tustvold reviewed Nov 9, 2022

datafusion/common/src/scalar.rs (outdated, resolved)

fixed comments

4362c52

tustvold mentioned this pull request Nov 9, 2022

Add compare to ArrowNativeTypeOp apache/arrow-rs#3070

Merged

comphead added 2 commits November 9, 2022 15:31

fixed comments

9060099

fmt

3fef43b

comphead marked this pull request as ready for review November 9, 2022 23:47

tustvold reviewed Nov 10, 2022

comphead added 2 commits November 9, 2022 21:08

comments fixed

0bab5f4

removed ordered_flost from toml

d9872a0

github-actions bot added the core Core DataFusion crate label Nov 10, 2022

changed cargo.lock

cc9cd52

comphead requested a review from tustvold November 10, 2022 18:23

tustvold approved these changes Nov 10, 2022

tustvold merged commit 5883e43 into apache:master Nov 10, 2022
