We should create a `simd_read_array` intrinsic. When `sizeof(Simd<T, N>) > sizeof([T; N])` (which can happen until #319 is fixed), `read_unaligned` is probably UB because it can read bytes beyond the end of the input array: the bytes that correspond to the padding in the `Simd<T, N>`.
We need an intrinsic rather than just using `memcpy` because the intrinsic will generate LLVM's `load` instruction with a vector type (LLVM guarantees that a vector `load` won't read padding if the `load`'s `align` is small enough), whereas `memcpy` may end up using less efficient array-typed loads, which sometimes compile to scalar code.
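To make the size mismatch concrete, here is a minimal sketch on stable Rust. `PaddedVec3` is a hypothetical stand-in for a padded `Simd<f32, 3>`, not the real type, and `from_array_memcpy` is an illustrative helper name; it shows the byte-exact `memcpy` approach that stays in bounds, which is the baseline the proposed intrinsic is meant to improve on.

```rust
use core::mem::{size_of, MaybeUninit};
use core::ptr;

/// Hypothetical stand-in for a padded `Simd<f32, 3>`:
/// 12 bytes of lanes, aligned (and therefore padded) to 16 bytes.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug)]
pub struct PaddedVec3 {
    pub lanes: [f32; 3],
}

/// Builds the vector by copying exactly `size_of::<[f32; 3]>()` bytes,
/// so the 4 bytes past the end of the array are never read.
///
/// By contrast, `ptr::read_unaligned(a.as_ptr().cast::<PaddedVec3>())`
/// would read `size_of::<PaddedVec3>()` bytes from a 12-byte array,
/// which is the out-of-bounds read described above.
pub fn from_array_memcpy(a: &[f32; 3]) -> PaddedVec3 {
    let mut out = MaybeUninit::<PaddedVec3>::uninit();
    // SAFETY: source and destination are both valid for 12 bytes and do not
    // overlap. All lanes are initialized by the copy; the trailing padding
    // may stay uninitialized.
    unsafe {
        ptr::copy_nonoverlapping(
            a.as_ptr().cast::<u8>(),
            out.as_mut_ptr().cast::<u8>(),
            size_of::<[f32; 3]>(),
        );
        out.assume_init()
    }
}
```

This only demonstrates the soundness side. Whether the copy lowers to a single vector `load` or to scalar code is up to the backend, which is the motivation for a dedicated intrinsic.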
define void @_ZN7example15simd_from_array17h5fc2848bbb0cc1d0E(ptr noalias nocapture noundef writeonly sret(<16 x i32>) align 64 dereferenceable(64) %_0, ptr noalias nocapture noundef readonly align 4 dereferenceable(64) %a) unnamed_addr #0 {
  %v.0.copyload = load <16 x i32>, ptr %a, align 4
  store <16 x i32> %v.0.copyload, ptr %_0, align 64
  ret void
}
(Though it'll fall back to a safe `memcpy` for non-power-of-two sizes, because rust-lang/rust#115236 found that non-power-of-two vector loads & stores go very poorly on many platforms.)
https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/splat.20no.20longer.20compiles.20for.20release.20builds/near/352101044