-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use f64::total_cmp instead of OrderedFloat #4133
Conversation
@tustvold I have replaced almost all entries OrderedFloat to f64. Still thinking how to use you hasher to remove OrderedFloat from Hash. |
@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc... |
Hi @tustvold I have implemented hasher through |
datafusion/common/src/scalar.rs
Outdated
let v2 = v2.map(OrderedFloat); | ||
v1.partial_cmp(&v2) | ||
} | ||
(Float32(v1), Float32(v2)) => v1.partial_cmp(v2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Float32(v1), Float32(v2)) => v1.partial_cmp(v2), | |
(Float32(v1), Float32(v2)) => v1.total_cmp(v2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1 is an Option<f32>
, it supports partial_cmp, not total_cmp. let me know if you ok if I unwrap it to floats, the same way as done for Decimals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we will need to match the option, I keep forgetting that ScalarValue has typed nulls for some reason 😆
partial_cmp on Option, will call through to partial_cmp on f32, which is not the same as total_cmp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Yeah, I checked that Float64(NULL) == Float64(NULL) now.
let v = v.map(OrderedFloat); | ||
v.hash(state) | ||
} | ||
Float32(v) => v.map(Fl).hash(state), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can just call HashValue on v?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v is Option<f32>
is supports Hash, but we have to wrap f32 into some wrapper supporting hash. Fl
in this case, I didn't find how to implement Hash
directly on f32/f64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I think there is a way to clean this up, but we can do that in a follow on PR
datafusion/common/src/scalar.rs
Outdated
let v1 = v1.map(OrderedFloat); | ||
let v2 = v2.map(OrderedFloat); | ||
v1.eq(&v2) | ||
// Handle NaN == NaN as true manually like in OrderedFloat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the hash implementation, this should also use total_cmp. Otherwise two "equal" values according to PartialEq, e.g. +0 and -0, will have different hashes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if
match (v1, v2) {
(Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
_ => v1.eq(v2),
}
.map(f64::from) | ||
.map(|v| OrderedFloat::from(v as f64)) | ||
.collect(); | ||
let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect(); | |
let values: Vec<_> = (1..=1_000).map(f64::from).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for _ in 0..400_000 { | ||
values.push(OrderedFloat::from(1_000_000_f64)); | ||
} | ||
let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect(); | |
let mut values: Vec<_> = (1..=600_000).map(f64::from).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.map(f64::from) | ||
.map(|v| OrderedFloat::from(v as f64)) | ||
.collect(); | ||
let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect(); | |
let values: Vec<_> = (1..=1_000_000).map(f64::from).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
datafusion/common/src/scalar.rs
Outdated
let v2 = v2.map(OrderedFloat); | ||
v1.partial_cmp(&v2) | ||
} | ||
(Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will now return None when comparing nulls, which isn't consistent with the other types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you 👍
Benchmark runs are scheduled for baseline = 509c82c and contender = 5883e43. 5883e43 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4051 .
Rationale for this change
See #4051
What changes are included in this PR?
Are there any user-facing changes?