Change slice::to_vec to not use extend_from_slice #79186
(rust_highfive has picked a reviewer for you, use r? to override) |
3297db7 to e09b5a0
The extend_from_slice was previously (presumably) optimized to e.g. perform a single memcpy via specialization on T: Copy; is that property preserved here? Can we add a codegen test (src/test/codegen, see https://www.llvm.org/docs/CommandGuide/FileCheck.html for docs on syntax, or the examples in that dir) to show that we get the expected behavior here, perhaps for the T: Copy case at least?
While there's no codegen test yet for that property, from the godbolt link which uses a [...] It might take a bit to add a test, since compiling on my machine is slow.
e09b5a0 to 97ca104
FWIW, I was able to optimize the &str -> String case better than what this PR produces (edit: at least at an "asm looks better" level) with https://rust.godbolt.org/z/WP7WdK, but I'm not sure if we can e.g. put that directly in a From impl. This PR may still make sense even without that. @m-ou-se -- did you perhaps have context here beyond the Zulip conversation? Trying to figure out if there are other cases besides &str -> String that we care about here.
Ah I think that what you have is certainly better for the case of |
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 97ca104b566e8b60191bd40c1103694f1e8cba4f with merge c163a6ecb293f84742f6802fe80a1b745c3e1b4c... |
☀️ Try build successful - checks-actions |
Queued c163a6ecb293f84742f6802fe80a1b745c3e1b4c with parent 8256379, future comparison URL. |
Finished benchmarking try commit (c163a6ecb293f84742f6802fe80a1b745c3e1b4c): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Interesting, mixed with mostly good results. I've also added an additional specialization, since it appears that, on top of the new version, it should hopefully be better; from prior comments, the reason it slowed down was because of missing |
bd79187 to 7e9f777
It might be a good idea to have another perf run to see if the additional changes helped or regressed anything. |
- let mut vec = Vec::with_capacity(s.len());
+ let mut vec = Vec::new();
vec.extend_from_slice(s);
vec |
Hm, I agree that that is much better than the implementation provided with the drop guard, but I think Simulacrum's impl has fewer instructions when it comes to string types. I also looked at a comparison to the implementation provided with larger types, and it appears just removing https://rust.godbolt.org/z/xP9Wrr |
More assembly instructions don't have to be a bad thing. It's mostly optimizing for speed. It looks like it just unrolled the copy loop better in the |
Mm that makes sense. I wonder if I could get the drop guard version to unroll some, because it seems to do less work outside of the loop, so taking the best of both worlds |
Vec's The specialization: Lines 2287 to 2299 in fe98231
which ultimately should (through some unnecessary indirections) delegate to Lines 1266 to 1274 in fe98231
|
I would think |
Queued da403f94eb6c20c344e2e0a7fb61f0a2c940a930 with parent 20328b5, future comparison URL. |
Finished benchmarking try commit (da403f94eb6c20c344e2e0a7fb61f0a2c940a930): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Thank you! I should've rebased more carefully to keep it clear which was which. |
Please also update the description of this PR, and feel free to keep commits squashed. I'm not sure about the numerous perf runs here either, if there's some comparison being done or whatever I didn't catch that :) The last perf run looks pretty excellent though! |
e5ff958 to e7dd9e9
This also required adding a loop guard in case clone panics
Add specialization for copy
There is a better version for copy, so I've added specialization for that function and hopefully that should speed it up even more.
Switch FromIter<slice::Iter> to use `to_vec`
Test different unrolling version for to_vec
Revert to impl
From benchmarking, it appears this version is faster
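The two code paths named in these commits can be sketched on stable Rust as two separate functions, since specialization itself is unstable. This is only an illustration of the idea and not necessarily the exact code in this PR: the names `to_vec_clone`, `to_vec_copy`, and `DropGuard` are hypothetical. The Clone path writes clones into pre-allocated capacity and uses a guard so that a panicking `clone` still leaves the Vec with a correct length; the Copy path does a single bulk copy.

```rust
use std::mem;
use std::ptr;

/// Clone path (hypothetical name): clone each element into spare capacity.
pub fn to_vec_clone<T: Clone>(s: &[T]) -> Vec<T> {
    // If `clone` panics, this guard records how many slots are initialized
    // so that the partially filled Vec drops exactly those elements.
    struct DropGuard<'a, T> {
        vec: &'a mut Vec<T>,
        num_init: usize,
    }
    impl<'a, T> Drop for DropGuard<'a, T> {
        fn drop(&mut self) {
            // SAFETY: the first `num_init` slots have been written.
            unsafe { self.vec.set_len(self.num_init) };
        }
    }

    let mut vec = Vec::with_capacity(s.len());
    let mut guard = DropGuard { vec: &mut vec, num_init: 0 };
    let slots = guard.vec.spare_capacity_mut();
    // `take(slots.len())` keeps the index provably in bounds.
    for (i, item) in s.iter().enumerate().take(slots.len()) {
        guard.num_init = i;
        slots[i].write(item.clone());
    }
    // No panic happened: skip the guard and set the final length directly.
    mem::forget(guard);
    // SAFETY: all `s.len()` slots were initialized by the loop above.
    unsafe { vec.set_len(s.len()) };
    vec
}

/// Copy path (hypothetical name): one bulk copy, which typically lowers to memcpy.
pub fn to_vec_copy<T: Copy>(s: &[T]) -> Vec<T> {
    let mut vec = Vec::with_capacity(s.len());
    // SAFETY: `vec` is a fresh allocation with room for `s.len()` elements,
    // so it cannot overlap `s`; `T: Copy` makes a bitwise copy valid.
    unsafe {
        ptr::copy_nonoverlapping(s.as_ptr(), vec.as_mut_ptr(), s.len());
        vec.set_len(s.len());
    }
    vec
}
```

Calling `to_vec_copy(&[1u32, 2, 3])` or `to_vec_clone(&[String::from("a")])` should give the same result as `<[T]>::to_vec` on the same slice; the difference is only in how the elements get there.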
e7dd9e9 to a991558
@bors try @rust-timer queue Let's double check there are no regressions from the Copy/Clone switch. |
Awaiting bors try build completion |
⌛ Trying commit a991558 with merge 52c91095e4d4144807832f6fb658bea83f707c50... |
☀️ Try build successful - checks-actions |
Queued 52c91095e4d4144807832f6fb658bea83f707c50 with parent a0d664b, future comparison URL. |
Finished benchmarking try commit (52c91095e4d4144807832f6fb658bea83f707c50): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
@bors r+ Ok, latest perf run looks good to me. |
📌 Commit a991558 has been approved by |
☀️ Test successful - checks-actions |
I saw this Zulip thread, and didn't see any update from it, so I thought I'd try to fix it. This converts `to_vec` to no longer use `extend_from_slice`, but relies on knowing that the allocated capacity is the same size as the input.

Godbolt new v1
Godbolt new v2 w/ drop guard
Godbolt old version

After some amount of iteration, there are now two specializations for `to_vec`: one for `Copy` types, which uses memcpy, and one for `Clone` types, which is the original from this PR.

This is then used inside of `impl<T: Clone> FromIterator<Iter::Slice<T>> for Vec<T>`, which is essentially equivalent to `&[T] -> Vec<T>`, instead of the previous specialization of the `extend` function. This is because `extend` has to reason more about existing capacity by calling `reserve` on an existing vec, and thus produces worse asm.

Downsides: This allocates the exact capacity, so I think if many items are added to this `Vec` afterwards, it might need to reallocate, whereas extending may not. I also noticed the number of faults went up in the benchmarks, but I'm not sure where from exactly.
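The capacity downside can be observed with a small sketch. The helper names `capacity_after_push` and `push_with_headroom` are hypothetical, and the exact capacities depend on the allocator and the Vec growth policy, so the code only reports them rather than assuming specific values.

```rust
/// Returns the capacity before and after pushing one element
/// onto a Vec produced by `to_vec`.
pub fn capacity_after_push(s: &[u32]) -> (usize, usize) {
    let mut v = s.to_vec();
    let before = v.capacity();
    // If `to_vec` allocated exactly `s.len()` slots, this push has to grow the buffer.
    v.push(0);
    (before, v.capacity())
}

/// Same measurement, but with `extra` slots reserved up front,
/// so pushing `extra` elements never needs to reallocate.
pub fn push_with_headroom(s: &[u32], extra: usize) -> (usize, usize) {
    let mut v = Vec::with_capacity(s.len() + extra);
    v.extend_from_slice(s);
    let before = v.capacity();
    for i in 0..extra {
        v.push(i as u32);
    }
    (before, v.capacity())
}
```

Comparing the two pairs for the same input shows whether a later push forced a new allocation: in the headroom case the capacity stays the same, while in the `to_vec` case it may grow on the first push.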