-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize equality for [T] and comparison for [u8] to use memcmp when possible #32699
Conversation
r? @brson (rust_highfive has picked a reviewer for you, use r? to override) |
Previous specialization PR: #32586 |
Could we avoid the addition of a new |
I have a completely unsubstantiated fear that messing with eq_slice will result in ICEs. We can see what happens if we try. |
|
||
/// Trait implemented for types that can be compared for equality using | ||
/// their bytewise representation | ||
trait BytewiseEquality { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t we be able to implement this ∀T: Copy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example f32 and &i32 are two copy types that don't compare for equality byte by byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this would be helpful for fast comparisons/hashing.
Where T is a type that can be compared for equality bytewise, we can use memcmp. We can also use memcmp for PartialOrd, Ord for [u8] and by extension &str. This is an improvement for example for the comparison [u8] == [u8] that used to emit a loop that compared the slices byte by byte. One worry here could be that this introduces function calls to memcmp in contexts where it should really inline the comparison or even optimize it out, but llvm takes care of recognizing memcmp specifically.
The old test for Ord used no asserts, and appeared to have a wrong test. (!).
I have updated (rebased) the branch. Moving memcmp into slice.rs required specializing PartialOrd, Ord too, so that &str could use that. I'll let travis run the full testsuite on this. |
} | ||
} | ||
|
||
impl_marker_for!(BytewiseEquality, | ||
u8 i8 u16 i16 u32 i32 u64 i64 usize isize char bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is room to eek out more performance here if we specialize for [u16]
, [u32]
and etc. I did some experimentation with trying to speed up memcmp a year or so ago, and most implementations just check a byte at a time until the slices align on a usize. For these larger types, we could just skip some of these alignment checks since we already know they're aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! That sounds like exactly the motivation needed to make memcmp(*const u8, *const u8, usize)
into an llvm intrinsic, so that it uses the alignment information from the pointer (like memcpy already does). http://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluss: Exactly. There was a few questions a number of years ago (1, 2) to add one, but it didn't get any traction.
PS: I think I found when I was talking about this on #rust-internals with you, @bluss, back in 2015 :) I can't say that this would be a big win, it might just shave off a few conditionals, which may or may not really matter in real code. I also found my old benchmarks, which I've uploaded to https://github.com/erickt/rust-memcmp-benches.
⌛ Testing commit 28c4d12 with merge 523ae98... |
|
⛄ The build was interrupted to prioritize another pull request. |
@bors r- (feel free to re-r when the bug is fixed) |
oh, thanks Manishearth. I guess we need doc(hidden) on the traits, even if they are private? |
This should avoid the trait impls showing up in rustdoc.
ok, all other private traits seem to use doc(hidden). resubmitting with that fixed. @bors r=alexcrichton |
📌 Commit a6c27be has been approved by |
Specialize equality for [T] and comparison for [u8] to use memcmp when possible Specialize equality for [T] and comparison for [u8] to use memcmp when possible Where T is a type that can be compared for equality bytewise, we can use memcmp. We can also use memcmp for PartialOrd, Ord for [u8]. Use specialization to call memcmp in PartialEq for slices for certain element types. This PR does not change the user visible API since the implementation uses an intermediate trait. See commit messages for more information. The memcmp signature was changed from `*const i8` to `*const u8` which is in line with how the memcmp function is defined in C (taking const void * arguments, interpreting the values as unsigned bytes for purposes of the comparison).
Specialize equality for [T] and comparison for [u8] to use memcmp when possible Specialize equality for [T] and comparison for [u8] to use memcmp when possible Where T is a type that can be compared for equality bytewise, we can use memcmp. We can also use memcmp for PartialOrd, Ord for [u8]. Use specialization to call memcmp in PartialEq for slices for certain element types. This PR does not change the user visible API since the implementation uses an intermediate trait. See commit messages for more information. The memcmp signature was changed from `*const i8` to `*const u8` which is in line with how the memcmp function is defined in C (taking const void * arguments, interpreting the values as unsigned bytes for purposes of the comparison).
core: check for pointer equality when comparing Eq slices Because `Eq` types must be reflexively equal, an equal-length slice to the same memory location must be equal. This is related to rust-lang#33892 (and rust-lang#32699) answering this comment from that PR: > Great! One more easy question: why does this optimization not apply in the non-BytewiseEquality implementation directly above? Because slices of non-reflexively equal types (like `f64`) are not equal even if it's the same slice. But if the types are `Eq`, we can use this same-address optimization, which this PR implements. Obviously this changes behavior if types violate the reflexivity condition of `Eq`, because their impls of `PartialEq` will no longer be called per-item, but 🤷♂ . It's not clear how often this optimization comes up in the real world outside of the same-`&str` case covered by rust-lang#33892, so **I'm requesting a perf run** (on MacOS today, so can't run `rustc_perf` myself). I'm going ahead and making the PR on the basis of being surprised things didn't already work this way. This is my first time hacking rust itself, so as a perf sanity check I ran `./x.py bench --stage 0 src/lib{std,alloc}`, but the differences were noisy. To make the existing specialization for `BytewiseEquality` explicit, it's now a supertrait of `Eq + Copy`. `Eq` should be sufficient, but `Copy` was included for clarity.
Specialize equality for [T] and comparison for [u8] to use memcmp when possible
Where T is a type that can be compared for equality bytewise, we can use
memcmp. We can also use memcmp for PartialOrd, Ord for [u8].
Use specialization to call memcmp in PartialEq for slices for certain element types. This PR does not change the user visible API since the implementation uses an intermediate trait. See commit messages for more information.
The memcmp signature was changed from
*const i8
to*const u8
which is in line with how the memcmp function is defined in C (taking const void * arguments, interpreting the values as unsigned bytes for purposes of the comparison).