Support for _mm_maddubs_epi16 and similar? #366

Open
samuelcolvin opened this issue Sep 24, 2023 · 2 comments
Labels: C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@samuelcolvin

Hi, I'm looking for a way to implement these instructions with portable simd, but can't find any pointers.

Is this possible, and if so, how? If not, is there any willingness to add support?

For more context on what I'm trying to do, see here - basically int parsing by progressively collapsing SIMD arrays.
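
As a rough illustration of what "progressively collapsing" means here, a minimal scalar sketch of the reduction (the helper name is hypothetical, and it assumes the ASCII bytes have already been converted to digit values; each step corresponds to one SIMD multiply-adjacent-pairs-and-add across all lanes):

// Scalar sketch only: each step mirrors one level of the SIMD collapse.
fn parse_8_digits(digits: [u16; 8]) -> u32 {
    // [1,2,3,4,5,6,7,8] -> [12,34,56,78]
    let pairs: [u16; 4] = core::array::from_fn(|i| digits[2 * i] * 10 + digits[2 * i + 1]);
    // [12,34,56,78] -> [1234,5678]
    let quads: [u32; 2] =
        core::array::from_fn(|i| pairs[2 * i] as u32 * 100 + pairs[2 * i + 1] as u32);
    // [1234,5678] -> 12345678
    quads[0] * 10_000 + quads[1]
}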

@samuelcolvin samuelcolvin added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Sep 24, 2023
@calebzulawski
Member

This is certainly a specialty function, only really supported by x86-64. As far as I can tell from reading the documentation, you can mimic it with something like:

// Nightly-only: requires #![feature(portable_simd)] at the crate root.
// SimdInt provides saturating_add (it may live under std::simd::num on newer nightlies).
use std::simd::{i16x16, i16x8, i8x16, simd_swizzle, u8x16, SimdInt};

pub fn maddubs(a: u8x16, b: i8x16) -> i16x8 {
    // Widen both operands to i16 so the u8 * i8 products cannot overflow.
    let a: i16x16 = a.cast();
    let b: i16x16 = b.cast();
    let m: i16x16 = a * b;
    // Horizontally add adjacent products with signed saturation, as pmaddubsw does.
    simd_swizzle!(m, [0, 2, 4, 6, 8, 10, 12, 14])
        .saturating_add(simd_swizzle!(m, [1, 3, 5, 7, 9, 11, 13, 15]))
}

Unfortunately this does not produce great codegen, because LLVM doesn't seem to recognize the pattern as pmaddubsw; some other formulation might result in better codegen. Unless there's a matching instruction on other architectures, I doubt this will ever be supported by std::simd, since it's not particularly portable, but it may be possible to teach LLVM to recognize this pattern as a single instruction.
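
If pmaddubsw itself is what's needed today, a rough sketch of a non-portable escape hatch (the function name is illustrative; it assumes an x86-64 build with SSSE3 enabled and relies on the From conversions between Simd vectors and __m128i that std::simd provides) would be to call the intrinsic directly and keep the portable version above as the fallback:

#[cfg(all(target_arch = "x86_64", target_feature = "ssse3"))]
pub fn maddubs_native(a: u8x16, b: i8x16) -> i16x8 {
    use core::arch::x86_64::{__m128i, _mm_maddubs_epi16};
    // Safety: the cfg above guarantees SSSE3 is available at compile time.
    unsafe { i16x8::from(_mm_maddubs_epi16(__m128i::from(a), __m128i::from(b))) }
}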

@samuelcolvin
Author

Thanks. I do have an implementation of the same logic for aarch64; I'll try to find it on my laptop. To be honest, though, it's more a case of "do the same calculation on a different architecture" than exactly equivalent methods.
