Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't use read_unaligned to read from an array casted to *const Simd #341

Open
programmerjake opened this issue Apr 23, 2023 · 1 comment

Comments

@programmerjake
Copy link
Member

programmerjake commented Apr 23, 2023

We should create a simd_read_array intrinsic because for sizeof(Simd<T, N>) > sizeof([T; N]) (which can happen until #319 is fixed) read_unaligned is probably UB due to being able to read the bytes beyond the end of the input array -- the padding in the Simd<T, N>.

We need an intrinsic rather than just using memcpy because the intrinsic will generate llvm's load instruction with vector type (llvm guarantees vector load won't read padding if the load's align is small enough), whereas memcpy may end up using less efficient array-typed loads which sometimes use scalar code.

https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/splat.20no.20longer.20compiles.20for.20release.20builds/near/352101044

@scottmcm
Copy link
Member

I think that with rust-lang/rust#111999 it might be fine to do this as an array copy? This:

const N: usize = 16;
pub fn simd_from_array(a: [u32; N]) -> Simd<u32, N> {
    let mut v = Simd::default();
    *v.as_mut_array() = a;
    v
}

Generates a vector load/store now: https://rust.godbolt.org/z/hYv4T17TP

define void @_ZN7example15simd_from_array17h5fc2848bbb0cc1d0E(ptr noalias nocapture noundef writeonly sret(<16 x i32>) align 64 dereferenceable(64) %_0, ptr noalias nocapture noundef readonly align 4 dereferenceable(64) %a) unnamed_addr #0 {
  %v.0.copyload = load <16 x i32>, ptr %a, align 4
  store <16 x i32> %v.0.copyload, ptr %_0, align 64
  ret void
}

(Though it'll fallback to a safe memcpy for non-pot sizes because rust-lang/rust#115236 found that non-pot vector loads & stores go very poorly on many platforms.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants