Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use simd-lite #39

Merged
merged 8 commits into from
Sep 3, 2019
Merged

Use simd-lite #39

merged 8 commits into from
Sep 3, 2019

Conversation

Licenser
Copy link
Member

Use simd-lite for ARM support.

@sunnygleason
Copy link
Member

@Licenser it's awesome that this works! The developer opinions aren't a deal-breaker, I think we can fix that in future revisions anyway.

src/neon/generator.rs Outdated Show resolved Hide resolved
src/neon/stage1.rs Outdated Show resolved Hide resolved
@sunnygleason
Copy link
Member

@Licenser was just looking at the test failures, I think you might want to use this impl of vtstq. I did an integration earlier against your intrinsics work, and this is what I needed (which is why I was complaining about the private fields):


#[inline]
fn test_u8(a: u8, b: u8) -> u8 {
    if a & b != 0 {
        0xFF
    } else {
        0x00
    }
}

#[inline]
pub unsafe fn vtstq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t {
    uint8x16_t(
        test_u8(a.0, b.0),
        test_u8(a.1, b.1),
        test_u8(a.2, b.2),
        test_u8(a.3, b.3),
        test_u8(a.4, b.4),
        test_u8(a.5, b.5),
        test_u8(a.6, b.6),
        test_u8(a.7, b.7),
        test_u8(a.8, b.8),
        test_u8(a.9, b.9),
        test_u8(a.10, b.10),
        test_u8(a.11, b.11),
        test_u8(a.12, b.12),
        test_u8(a.13, b.13),
        test_u8(a.14, b.14),
        test_u8(a.15, b.15),
    )
}

#[inline]
fn test_s8(a: i8, b: i8) -> i8 {
    if a & b != 0 {
        -1
    } else {
        0x00
    }
}

#[inline]
pub unsafe fn vtstq_s8(a: int8x16_t, b: int8x16_t) -> int8x16_t {
    int8x16_t(
        test_s8(a.0, b.0),
        test_s8(a.1, b.1),
        test_s8(a.2, b.2),
        test_s8(a.3, b.3),
        test_s8(a.4, b.4),
        test_s8(a.5, b.5),
        test_s8(a.6, b.6),
        test_s8(a.7, b.7),
        test_s8(a.8, b.8),
        test_s8(a.9, b.9),
        test_s8(a.10, b.10),
        test_s8(a.11, b.11),
        test_s8(a.12, b.12),
        test_s8(a.13, b.13),
        test_s8(a.14, b.14),
        test_s8(a.15, b.15),
    )
}

@Licenser
Copy link
Member Author

/// Compare bitwise test bits nonzero
#[inline]
#[target_feature(enable = "neon")]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
#[cfg_attr(test, assert_instr(cmtst))]
// even gcc compiles this to ldr: https://clang.godbolt.org/z/1bvH2x
// #[cfg_attr(test, assert_instr(ld1))]
pub unsafe fn vtstq_u8(a: uint8x16_t, b: uint8x16_t) -> uint8x16_t {
    vcgtq_u8(vandq_u8(a, b), vdupq_n_u8(0))
}

Ja I think that should be the same as vcgtq_u8(vandq_u8(a, b), vdupq_n_u8(0))

vandq_u8(a, b) should be a & b (vectorized) and vcgtq_u8(..., vdupq_n_u8(0)) should be != 0 (or >= ) vectorized (I dind't find a not-equal intrinsic but perhaps I just missed it?)

sadly not even clang compiles vtstq_u8 to a own intrinsic: https://godbolt.org/z/lwHshr

@sunnygleason
Copy link
Member

I'm crying 😂

image

@Licenser
Copy link
Member Author

I know it's ugly :( but it's correcter then mem::transmute, it's the load intrinsic that's made to well ... populate a register.

@Licenser
Copy link
Member Author

ideally we want an neon equivalent of _mm256_setr_epi8

@Licenser
Copy link
Member Author

hmm I got an idea! We could use a trait to add a ::new() method how about that?

@sunnygleason
Copy link
Member

hmm I got an idea! We could use a trait to add a ::new() method how about that?

Yes I love that idea! Only problem is can't add from outside the crate?

@Licenser
Copy link
Member Author

If we control the trait we don't need to control the type :D and it compiles to a vld1 instruciton one way or the other so ¯_(ツ)_/¯ doens't hurt to do the as *const dance in there - we take the hit complexity but it's nice for the user.

@Licenser
Copy link
Member Author

ahhh damn we can't have a trait with different arities for functions but we could pass in the slice, that's mildly better I think.

@sunnygleason
Copy link
Member

crazily enough I think mem::transmute made the most sense 😂
right now, I'm up for whatever -- this is close enough

@sunnygleason
Copy link
Member

interesting looks like they do compile down to the same: https://rust.godbolt.org/z/0ORCzh

image

@Licenser
Copy link
Member Author

:D yay! I think the ::new is a lot nicer.

@sunnygleason
Copy link
Member

sorry I left this on the wrong PR, oops!

@Licenser did you find the bug? I think I might have it:

8875cff#diff-e3cc34fc9e09211fc01983167aaf86d7R47

(signed versus unsigned subtraction in utf8check, we had an error in the previous intrinsics)

PS. ignore all the intrinsics changes, that's just me bisecting the problem

@Licenser
Copy link
Member Author

Wow, well done! I think that did it!!! Once this all works we should give it a sweep to see if we can simplify some of those intriniscs but most importantly it seems to actually work :D good bye std::arch!

@sunnygleason
Copy link
Member

@Licenser this looks good to me! Are there big changes you'd want to make before releasing the simd-lite crate,are we getting close?

@Licenser
Copy link
Member Author

Licenser commented Sep 3, 2019

I really really would like to see if we can get rid of the copied code in simd-lite before releasing it wherever possible. And were not we need to make sure it's properly attributed and includes the respective copyrights.

@sunnygleason
Copy link
Member

Merging, beautiful work @Licenser !

@sunnygleason sunnygleason merged commit 4acbe17 into arm Sep 3, 2019
@sunnygleason sunnygleason deleted the simd-lite branch September 3, 2019 20:12
sunnygleason pushed a commit that referenced this pull request Sep 4, 2019
* Put something in the readme so we can have a PR
* Add drone file
* update build status
* unguard for sse4.2 to allow rust to polyfill on older platforms
* Add more simd tests
* RFC: Neon support (pretty much working) (#35)
* feat: neon support
* feat: temp stub replacements for neon intrinsics (pending rust-lang/stdarch#792)
* fix: drone CI rustup nightly
* feat: fix guards, use rust stdlib for bit count operations
* fix: remove double semicolon
* feat: fancy generic generator functions, thanks @Licenser
* Update extq intrinsics
* Use simd-lite (#39)
* Use simd-lite
* Update badge
* Update badge
* Get rid of transmutes
* Use NeonInit trait
* vqsubq_u8 fix
* vqsubq_u8 fix pt. 2
* use reexprted values from simd-lite
* add simd-lite real version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants