Missed optimization: `_ => 0` generates worse code than `0 => 0, _ => unreachable!()` #118306
Comments
Upstream issue: llvm/llvm-project#73446. @rustbot claim
@rustbot label llvm-fixed-upstream
It looks like this is fixed since 1.75, but I don't know what fixed it: https://godbolt.org/z/eGnWbxbG4
I don't think it has been fixed. It looks like the function is not submitted: https://godbolt.org/z/YdqWq8hbb.
You need to stick
Normally I would do this. Maybe we should mention this somewhere to avoid submitting invalid code to godbolt?
Oops, sorry. I saw that one function was generated and assumed the other one got merged...
Seems like this has been reverted :/, look at the generated assembly with
Can you explain why you think that? I don't see any changes: https://godbolt.org/z/rrb5oKbjb. BTW, I can reland the upstream patch now.
I think that it's been reverted because the assembly output contains this:
faster:
    and edi, 3
    lea rax, [rip + .Lswitch.table.faster]
    mov rax, qword ptr [rax + 8*rdi]
    ret
branchy:
    and edi, 3
    lea rax, [rdi - 1]
    cmp rax, 2
    ja .LBB1_1
    lea rax, [rip + .Lswitch.table.branchy]
    mov rax, qword ptr [rax + 8*rdi - 8]
    ret
The
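Read back as Rust, the two assembly listings above correspond roughly to the sketch below. The table contents and the body of `.LBB1_1` are not shown in the listing, so the values here are assumptions derived from the mapping described in the issue (0, 1, 1, 2), and the names `FASTER_TABLE`, `BRANCHY_TABLE`, `faster_as_compiled`, and `branchy_as_compiled` are made up for illustration.

```rust
// Assumed contents of .Lswitch.table.faster (one entry per value of input & 3).
const FASTER_TABLE: [u64; 4] = [0, 1, 1, 2];
// Assumed contents of .Lswitch.table.branchy (entries for 1, 2, 3 only).
const BRANCHY_TABLE: [u64; 3] = [1, 1, 2];

/// Mirrors the `faster` listing: mask, then a single table load.
pub fn faster_as_compiled(input: u64) -> u64 {
    let idx = (input & 3) as usize; // and edi, 3
    FASTER_TABLE[idx] // mov rax, [table + 8*rdi]
}

/// Mirrors the `branchy` listing: mask, range check, then a table load.
pub fn branchy_as_compiled(input: u64) -> u64 {
    let idx = input & 3; // and edi, 3
    let shifted = idx.wrapping_sub(1); // lea rax, [rdi - 1]
    if shifted > 2 {
        // cmp rax, 2; ja .LBB1_1
        // Assumed body of .LBB1_1: the default arm, returning 0.
        return 0;
    }
    BRANCHY_TABLE[shifted as usize] // mov rax, [table + 8*rdi - 8]
}
```

The extra compare-and-jump in `branchy` exists only to route the value 0 to the default arm, which is exactly the case a four-entry table would cover for free.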
Ah, I think you're saying that this optimization is still in a missing state, right?
Yep, I thought you meant that you implemented the optimization, is it not implemented? 😅
Yes, I have implemented it, but due to the compilation time issue mentioned in llvm/llvm-project#78578, I had to revert the commit. Now I have relanded it: llvm/llvm-project#73446 (comment). @rustbot label +llvm-fixed-upstream
Confirmed fixed by #127513, needs codegen test. |
Add a set of tests for LLVM 19 Close rust-lang#107681. Close rust-lang#118306. Close rust-lang#126585. r? compiler
Add a set of tests for LLVM 19 Close rust-lang#107681. Close rust-lang#118306. Close rust-lang#126585. r? compiler try-job: i686-msvc try-job: arm-android try-job: test-various
Consider the following functions (https://godbolt.org/z/a8r3Tc7TE):
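A minimal reconstruction of the two functions, inferred from the function names in the assembly, the title, and the mapping described in this issue. The `u64` types and the exact arm order are assumptions; the godbolt link holds the authoritative source.

```rust
/// Explicit `0 => 0` arm plus an unreachable default.
/// Assumed signature: the 64-bit table loads in the assembly suggest u64.
pub fn faster(input: u64) -> u64 {
    match input % 4 {
        1 | 2 => 1,
        3 => 2,
        0 => 0,
        // Never reached: `input % 4` is always in 0..=3.
        _ => unreachable!(),
    }
}

/// Same mapping, but the value 0 is handled by the wildcard arm.
pub fn branchy(input: u64) -> u64 {
    match input % 4 {
        1 | 2 => 1,
        3 => 2,
        _ => 0,
    }
}
```

Both functions return 0, 1, 1, 2 for `input % 4` equal to 0, 1, 2, 3, so any difference in the generated code is purely a matter of optimization.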
These functions have identical behavior: they map `input` to `input % 4 - (input % 4 / 2)`. In the former case, LLVM generates a nice lookup table for us, but in the latter, it emits an extra branch. The only difference is that I've used `_ => ...` to avoid needing to write an unreachable-by-optimization branch.

If we look at the generated IR (after `-Cpasses=strip,mem2reg,simplifycfg`):

The problem is clear: LLVM does not seem to realize that it can trivially transform `branchy` to `faster` here, by observing that the default in the `switch` is only taken when `%2 == 0`.

I suspect this is more LLVM bug than Rust bug, but it feels fixable by a MIR peephole optimization? Unclear. The `_ => 0` code I wrote is an attractive nuisance that I imagine other people writing, too, so perhaps there is value to seeing if this optimization can be made before LLVM.

This bug is also present in Clang, in case someone wants to file an LLVM bug: https://godbolt.org/z/x7rec97E7. It's unclear to me if this is the sort of optimization Clang would do in the frontend instead of in LLVM; could go either-or here, tbh.
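The observation that the default is only taken for a single value can be sketched as a small check. This is an illustration of the reasoning only, not the code of any LLVM or MIR pass; `sole_uncovered_value` is a hypothetical helper, and it assumes the range of the switch operand is already known (here `0..4`, because of the `% 4`).

```rust
use std::ops::Range;

/// Hypothetical helper: given the known range of a switch operand and the
/// values handled by explicit cases, return the single value that reaches
/// the default arm, if there is exactly one such value.
pub fn sole_uncovered_value(range: Range<u64>, explicit_cases: &[u64]) -> Option<u64> {
    // Values in the range that no explicit case handles.
    let mut uncovered = range.filter(|v| !explicit_cases.contains(v));
    let first = uncovered.next()?;
    // Exactly one uncovered value means the default could become an
    // explicit case for it, leaving the default unreachable.
    if uncovered.next().is_none() {
        Some(first)
    } else {
        None
    }
}
```

For `branchy`, the operand range is `0..4` and the explicit cases are 1, 2, and 3, so the helper returns `Some(0)`: the wildcard arm can be rewritten as `0 => 0` with an unreachable default, which is the shape of `faster`.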