Nearly 50% performance regression when copying/moving small byte arrays between 1.71 and 1.72 #115212
Comments
Here's a demo for |
Oof, that's massively worse. But even with a nice size ( We should probably revert #111999, or limit it to very small arrays (<= 8 bytes / 1 usize? https://godbolt.org/z/6sWdYaeEx) |
The original motivation of that PR is for arrays larger than that, so I'm not so sure. If I strip this down to just the pointer write it's not as bad, but it's quite clear that the problem is just x86-isel: https://godbolt.org/z/qh89c85P8 Is that even good instruction selection? |
Hmm, curiously looking at just the part of the push logic that copies the value, the assembly is entirely reasonable:

```rust
pub fn push(v: &mut Vec<[u8; 24]>, x: [u8; 24]) {
    if v.len() < v.capacity() { v.push(x); }
}
```

https://rust.godbolt.org/z/45fEo314o

```asm
example::push:
        mov     rax, qword ptr [rdi + 16]
        cmp     rax, qword ptr [rdi + 8]
        jae     .LBB0_2
        movups  xmm0, xmmword ptr [rsi]
        mov     rcx, qword ptr [rsi + 16]
        mov     rdx, qword ptr [rdi]
        lea     rsi, [rax + 2*rax]
        mov     qword ptr [rdx + 8*rsi + 16], rcx
        movups  xmmword ptr [rdx + 8*rsi], xmm0
        inc     rax
        mov     qword ptr [rdi + 16], rax
.LBB0_2:
        ret
```

But then in the context of a "real" push, it somehow still blows up: https://rust.godbolt.org/z/zW34r7h1j |
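For anyone who wants to compare codegen side by side, here is a minimal sketch that pairs the guarded `push` above with a hand-written version that does only the pointer write and the length bump. The names `push_guarded` and `push_raw` are hypothetical and are not taken from the linked Compiler Explorer sessions. This is an illustration of the comparison, not the code used in this thread.

```rust
use std::ptr;

/// Guarded push, same shape as the snippet above: it never reallocates.
pub fn push_guarded(v: &mut Vec<[u8; 24]>, x: [u8; 24]) {
    if v.len() < v.capacity() {
        v.push(x);
    }
}

/// Hypothetical hand-written equivalent: only the raw pointer write
/// followed by the length update.
pub fn push_raw(v: &mut Vec<[u8; 24]>, x: [u8; 24]) {
    let len = v.len();
    if len < v.capacity() {
        // SAFETY: `len < capacity`, so the slot at index `len` lies inside
        // the allocation and holds no initialized value. After the write,
        // the first `len + 1` elements are initialized, so `set_len` is valid.
        unsafe {
            ptr::write(v.as_mut_ptr().add(len), x);
            v.set_len(len + 1);
        }
    }
}
```

Pasting both functions into Compiler Explorer with `-C opt-level=3` should make it easy to see whether the two lower to the same copy sequence on a given compiler version.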
Here's a small tweak to try to fix this for 1.73: #115236 |
I've opened #115242 to track the suboptimal @tpelkone Once a new nightly is out, please give it a shot and reopen if you're still seeing the regression. |
On Ubuntu VM running on Ryzen 5950X this is far worse:
2.4x regression. |
@scottmcm Things look much better with |
@tpelkone Thanks for confirming! I'll see if it can make 1.73. |
Summary
Nearly 50% performance regression between versions 1.71 and 1.72 when copying small byte arrays on the `stable-aarch64-apple-darwin` platform (MacBook Pro M1). Used `cargo bisect-rustc` to track down the regression to fd9bf59 #111999
Code
I tried this code:
I expected to see this happen: to run about 6,200,000 ns/iter like version 1.71
Instead, this happened: 9,159,412 ns/iter
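The reporter's benchmark source is not reproduced on this page. As a rough stand-in, the sketch below times pushing small byte arrays into a pre-sized `Vec`. It is an assumption-laden illustration, not the original benchmark: the 24-byte element size is borrowed from the snippet in the comments, and `fill` and `time_fill` are hypothetical names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Pushes `n` 24-byte arrays into a Vec whose capacity is reserved up front,
/// so the loop measures copies rather than reallocation.
pub fn fill(n: usize) -> Vec<[u8; 24]> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        // black_box discourages the optimizer from treating the array
        // as a compile-time constant.
        v.push(black_box([i as u8; 24]));
    }
    v
}

/// Returns the mean wall-clock time of one `fill(n)` call over `iters` runs.
/// Assumption: `iters` is non-zero.
pub fn time_fill(n: usize, iters: u32) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    let start = Instant::now();
    for _ in 0..iters {
        black_box(fill(black_box(n)));
    }
    start.elapsed() / iters
}
```

Calling `time_fill` from a release build on two toolchains (for example via `cargo +1.71.0 run --release` and `cargo +1.72.0 run --release`, assuming both are installed through rustup) gives a coarse comparison. The figures will not match the ns/iter numbers quoted here.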
Version it worked on
It most recently worked on: 1.71
More specifically nightly-2023-06-06-aarch64-apple-darwin
Version with regression
1.72
nightly-2023-06-07-aarch64-apple-darwin
`rustc --version --verbose`:
cargo bisect-rustc
Searched nightlies: from nightly-2023-06-06 to nightly-2023-06-07
regressed nightly: nightly-2023-06-07
searched commit range: e6d4725...b2b34bd
regressed commit: fd9bf59
bisected with cargo-bisect-rustc v0.6.7
Host triple: x86_64-apple-darwin
Reproduce with:
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged