Allow `transmute`s to produce `OperandValue`s instead of needing `alloca`s
#109843
(rustbot has picked a reviewer for you, use r? to override)
Force-pushed from c49761b to 1e509af.
Looks like I missed some `ScalarPair` cases.
Force-pushed from 1e509af to 1aca038.
Switched to https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/trait.BuilderMethods.html#tymethod.load_operand, which handles all the cases and is better than what I was doing before anyway.
I'm not exactly sure what's happening here, but here are some initial thoughts
Force-pushed from 1aca038 to c3d1b2a.
/// For nearly all types this is the same as the [`backend_type`], however
/// `bool` (and other `0`-or-`1` values) are kept as [`BaseTypeMethods::type_i1`]
/// in registers but as [`BaseTypeMethods::type_i8`] in memory.
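The memory half of that distinction can be observed from ordinary Rust. The following is a minimal sketch, not code from this PR: the function names are invented for the example, and the `i1` register form is a codegen detail that this code cannot show directly.

```rust
use std::mem::{size_of, transmute};

/// In memory a `bool` occupies one full byte, even though it carries one bit.
pub fn bool_size_in_memory() -> usize {
    size_of::<bool>()
}

/// Reinterpret a `bool` as its in-memory byte: `false` is 0 and `true` is 1.
/// (Hypothetical helper, named only for this illustration.)
pub fn bool_to_byte(b: bool) -> u8 {
    // SAFETY: `bool` and `u8` have the same size, and both valid `bool`
    // bit patterns (0 and 1) are valid `u8` values.
    unsafe { transmute::<bool, u8>(b) }
}
```

Calling `bool_to_byte(true)` gives `1`, which is the byte-sized value the backend has to widen to or narrow from when a `bool` moves between memory and an immediate.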
Your mention of uncertainty was a good prompt to write up some of what I learned as I was stumbling around trying to figure out how all this stuff worked 🙂
Force-pushed from c3d1b2a to 7afaa7c.
Looks reasonable to me. r=me modulo the ZST question. (thanks for expanding docs!)
let size_in_bytes = src.layout.size.bytes();
if size_in_bytes == 0 {
    // Nothing to write
Is there a particular reason to remove the ZST check?
Because, as I dug more, it was really only useful for my manual `memcpy`; `OperandValue::store` already does it:
rust/compiler/rustc_codegen_ssa/src/mir/operand.rs
Lines 285 to 289 in be8e5ba
// Avoid generating stores of zero-sized values, because the only way to have a zero-sized
// value is through `undef`, and store itself is useless.
if dest.layout.is_zst() {
    return;
}
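For readers less familiar with zero-sized types, here is a small sketch of why there is nothing to store. It is an illustration in user-level Rust, not compiler code; `Marker`, `unit_to_marker`, and `sizes` are hypothetical names.

```rust
use std::mem::{size_of, transmute};

/// A hypothetical zero-sized marker type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Marker;

/// Transmuting between zero-sized types reinterprets no bytes at all.
pub fn unit_to_marker(unit: ()) -> Marker {
    // SAFETY: both types are zero-sized and have no invalid values,
    // so there is nothing to reinterpret.
    unsafe { transmute::<(), Marker>(unit) }
}

/// Sizes of the source and destination types, in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<()>(), size_of::<Marker>())
}
```

Both sizes are zero, so a store for the result of `unit_to_marker(())` would write no bytes, which is the case the quoted check skips.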
// CHECK: %[[VAL:.+]] = load <4 x float>, {{ptr %x|.+>\* %.+}}, align 4
// CHECK: store <4 x float> %[[VAL:.+]], {{ptr %0|.+>\* %.+}}, align 16
unsafe { std::mem::transmute(x) }
Don't think it's a problem, but it's interesting how `S(x)` produces a `memcpy` while transmute produces load+store, while doing identical work (I think?).
Quite! When I saw this I brought it up with the portable-simd folks, since it seems that LLVM doesn't normalize it in IR in either direction, so they might want to experiment with which is better.
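The two spellings being compared can be sketched as follows. This is an approximation under stated assumptions: the test in the PR appears to use a SIMD vector type (hence `<4 x float>` in the `CHECK` lines), while this stand-in `S` wraps a plain `[f32; 4]` so that it builds on stable Rust, and the generated IR for it may differ from the IR discussed above.

```rust
use std::mem::transmute;

/// Hypothetical stand-in for the wrapper in the test; a plain array is used
/// here instead of a SIMD vector type so the sketch compiles on stable.
#[repr(transparent)]
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct S(pub [f32; 4]);

/// Wrap the value with the tuple-struct constructor.
pub fn wrap_by_constructor(x: [f32; 4]) -> S {
    S(x)
}

/// Wrap the same value by reinterpreting its bytes.
pub fn wrap_by_transmute(x: [f32; 4]) -> S {
    // SAFETY: `S` is `repr(transparent)` over `[f32; 4]`, so the layouts match.
    unsafe { transmute::<[f32; 4], S>(x) }
}
```

Both functions return the same value for the same input; the observation in this thread is only about which instructions the backend emits for each.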
@bors r=WaffleLapkin
📌 Commit 7afaa7ce1861481fdb0af9a6469b5b60e47787c5 has been approved by WaffleLapkin. It is now in the queue for this repository.
⌛ Testing commit 7afaa7ce1861481fdb0af9a6469b5b60e47787c5 with merge 3913a2b4c2bbaf2dd5c5610ecfc6beea1ababd99...
💔 Test failed - checks-actions
Of course, 32-bit catches me again 😭 @bors r-
💔 Test failed - checks-actions
… `alloca`s
LLVM can usually optimize these away, but especially for things like transmutes of newtypes it's silly to generate the `alloca`+`store`+`load` at all when it's actually a nop at LLVM level.
Force-pushed from 462652d to 9aa9a84.
Force-pushed from 53ac230 to 9aa9a84.
...and updated the test to not accidentally rely on standard library optimizations. Passed for @bors r=WaffleLapkin rollup=iffy
☀️ Test successful - checks-actions
Finished benchmarking commit (8d321f7): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results. This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
Wow, more transmutes out there than I'd thought -- 73 relevant binary size improvements in debug, with no (even non-relevant) size regressions. (Nothing relevant for
LLVM can usually optimize these away, but especially for things like transmutes of newtypes it's silly to generate the `alloca`+`store`+`load` at all when it's actually a nop at LLVM level.
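A newtype transmute of the kind described might look like the sketch below. `Meters` and both helper functions are hypothetical names chosen for the example, not anything from the PR or its tests.

```rust
use std::mem::transmute;

/// Hypothetical newtype over `u32`, used only for this illustration.
#[repr(transparent)]
#[derive(Debug, PartialEq)]
pub struct Meters(pub u32);

/// Reinterpret a raw `u32` as the newtype.
pub fn to_newtype(raw: u32) -> Meters {
    // SAFETY: `Meters` is `repr(transparent)` over `u32`, so the two types
    // share size, alignment, and valid bit patterns.
    unsafe { transmute::<u32, Meters>(raw) }
}

/// Reinterpret the newtype as its underlying `u32`.
pub fn from_newtype(m: Meters) -> u32 {
    // SAFETY: same layout argument as above, in the other direction.
    unsafe { transmute::<Meters, u32>(m) }
}
```

Since the wrapper and its field are a single `u32` in both directions, the value itself never changes, which is why routing it through a stack slot adds nothing.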