Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm32-unknown-unknown doesn't use simd instructions (at least, for one particular example, but plausibly more) #320

Open
thomcc opened this issue Dec 3, 2022 · 3 comments

Comments

@thomcc
Copy link
Member

thomcc commented Dec 3, 2022

Godbolt link: https://godbolt.org/z/KEsaM8sbh.

This function is fully scalarized to bytes under --target=wasm32-unknown-unknown -Ctarget-features=+simd128:

pub fn all_ascii_simd_std(s: &[u8; 32]) -> bool {
    use core::simd::*;
    const ALL_HI: u8x32 = u8x32::from_array([0x80; 32]);
    const ZERO: u8x32 = u8x32::from_array([0; 32]);
    (u8x32::from_array(*s) & ALL_HI).simd_eq(ZERO).all()
}

I tried:

  • Adding #[target_feature(enable = "simd128")] to it, which does not help.
  • Rewriting it to 2 vectors of native width (e.g. [u8x16; 2]), which also does not help (this is also included in the example).
  • Changing it in a couple other ways.

And none of it helped.

This is unfortunate because WebAssembly's SIMD should be pretty efficient here -- u8x16_bitmask already only looks at the high bit, so it can avoid masking with u8x16_splat(0x80). Ideally, I'd get the same thing as:

#[target_feature(enable = "simd128")]
pub fn all_ascii_simd_wasm(s: &[u8; 32]) -> bool {
    use core::arch::wasm32::*;
    let a = unsafe { v128_load(s.as_ptr().cast()) };
    let b = unsafe { v128_load(s.as_ptr().add(16).cast()) };
    u8x16_bitmask(v128_or(a, b)) == 0
}

Or you know, something vectorized? Or even a fallback to usize SWAR? Sadly, we get nothing like that.

It is concerning that the function it does load the vectors, but then goes out of its way to convert them to bytes that the rest of the operations are performed on. This implies to me that none of the operations here are supported, meaning that core::simd doesn't use simd128 on wasm targets at all... Which might mean it's an easy fix (flip a bool somewhere?) or something to report upstream.

@thomcc
Copy link
Member Author

thomcc commented Dec 3, 2022

FWIW this function optimizes like garbage on everything that isn't x86{,_64}+sse2 and aarch64+neon. Sadly, even the other ISAs that should support this seem to end up scalar too, just... not in a way that implies to me that something might be broken on our end.

This may be related, or it may not.

@programmerjake
Copy link
Member

it generates more efficient (but still not perfect) code if you change it to:

pub fn all_ascii_simd_std(s: &[u8; 32]) -> bool {
    use std::simd::*;
    (u8x32::from_slice(s) & Simd::splat(0x80)).reduce_or() == 0
}

@calebzulawski
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants