Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for RFC 2948: Portable SIMD #86656

Open
1 of 9 tasks
Tracked by #10
calebzulawski opened this issue Jun 27, 2021 · 35 comments
Open
1 of 9 tasks
Tracked by #10

Tracking Issue for RFC 2948: Portable SIMD #86656

calebzulawski opened this issue Jun 27, 2021 · 35 comments
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC needs-rfc This change is large or controversial enough that it should have an RFC accepted before doing it. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@calebzulawski
Copy link
Member

calebzulawski commented Jun 27, 2021

Feature gate: #![feature(portable_simd)]

This is a tracking issue for the future feature chartered in RFC 2977, with the intent of creating something akin to the design in RFC 2948 (rust-lang/rfcs#2948): a portable SIMD library (std::simd).

Portable SIMD project group: https://github.com/rust-lang/project-portable-simd
Implementation: https://github.com/rust-lang/portable-simd

More discussion can be found in the #project-portable-simd zulip stream.

Steps

Unresolved Questions

  • What will the overall design be?
  • What are the ideal semantics for Masks?
  • Are there any limits or vector sizes we should not support?
  • How should these types interop with types like Saturating, NonZero, etc.?

Implementation History

@calebzulawski calebzulawski added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 27, 2021
@jonas-schievink jonas-schievink added A-SIMD Area: SIMD (Single Instruction Multiple Data) PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) labels Jun 27, 2021
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 13, 2021
This enables programmers to use a safe alternative to the current
`extern "platform-intrinsics"` API for writing portable SIMD code.
This is `#![feature(portable_simd)]` as tracked in rust-lang#86656
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 13, 2021
…crum

pub use core::simd;

A portable abstraction over SIMD has been a major pursuit in recent years for several programming languages. In Rust, `std::arch` offers explicit SIMD acceleration via compiler intrinsics, but it does so at the cost of having to individually maintain each and every single such API, and is almost completely `unsafe` to use.  `core::simd` offers safe abstractions that are resolved to the appropriate SIMD instructions by LLVM during compilation, including scalar instructions if that is all that is available.

`core::simd` is enabled by the `#![portable_simd]` nightly feature tracked in rust-lang#86656 and is introduced here by pulling in the https://github.com/rust-lang/portable-simd repository as a subtree. We built the repository out-of-tree to allow faster compilation and a stochastic test suite backed by the proptest crate to verify that different targets, features, and optimizations produce the same result, so that using this library does not introduce any surprises. As these tests are technically non-deterministic, and thus can introduce overly interesting Heisenbugs if included in the rustc CI, they are visible in the commit history of the subtree but do nothing here. Some tests **are** introduced via the documentation, but these use deterministic asserts.

There are multiple unsolved problems with the library at the current moment, including a want for better documentation, technical issues with LLVM scalarizing and lowering to libm, room for improvement for the APIs, and so far I have not added the necessary plumbing for allowing the more experimental or libm-dependent APIs to be used. However, I thought it would be prudent to open this for review in its current condition, as it is both usable and it is likely I am going to learn something else needs to be fixed when bors tries this out.

The major types are
- `core::simd::Simd<T, N>`
- `core::simd::Mask<T, N>`

There is also the `LaneCount` struct, which, together with the SimdElement and SupportedLaneCount traits, limit the implementation's maximum support to vectors we know will actually compile and provide supporting logic for bitmasks. I'm hoping to simplify at least some of these out of the way as the compiler and library evolve.
@workingjubilee workingjubilee added the needs-rfc This change is large or controversial enough that it should have an RFC accepted before doing it. label Dec 1, 2021
@HannesGitH
Copy link

Feature gate: #![feature(portable_simd)]

i'm sorry if this is the wrong place to ask but im rather new to rust and stumbled upon this issues as my compiler told me to

if i want to use this feature as soon as my compiler supports it can i gate it like:

#[cfg(feature = "portable_simd")]
use std::simd::Simd;

or is that only for feautures regarding my package (set in toml or passed to cargo?) if so what would be the appropriate way to use simd as soon as this issue is resolved?

@Lokathor
Copy link
Contributor

The #![feature(portable_simd)] part goes at the top of a binary or library.

It's a language feature not a cargo feature so it works a little differently.

It's unfortunate that they're both just "feature". Rust is often too terse when it counts.

@HannesGitH
Copy link

HannesGitH commented Feb 21, 2023

ok thanks a lot!

just to make sure this means there is no (easy*) way to use this language feature if my compiler supports it and fall back to a custom implementation otherwise?

*easy as in compile time guards / attribute-like macros or creating a custom wrapper module that either provides rusts simd or my own fallback or something else in that level of skill


for anyone else stumbling upon this:

language features are (unstable) features you can opt-in when using nightly rust (by putting the specified flag in your library root, the whole project will then be compiled with a compiler that uses this feature)

@CarlKCarlK
Copy link

CarlKCarlK commented Nov 22, 2023

@agausmann

  • To enable the experimental feature flag on nightly,
#![rustversion::attr(nightly, feature(portable_simd))]

@safinaskar

Unfortunately, this particular code doesn't work

This worked for me:

#![cfg_attr(feature = "from_slice", feature(portable_simd))]

where "from_slice" is the name of my the-other-kind-of-feature, defined in Cargo.toml, that uses portable_simd.

[features]
from_slice = []

So, I run tests, for example, via cargo test --features=from_slice.

@GlenDC
Copy link
Contributor

GlenDC commented Dec 7, 2023

Is this on the 2024 edition roadmap, or will it be only for after that? I know it’s not related, but gives me a timeline range.

@calebzulawski
Copy link
Member Author

I don't think anyone has a specific timeline, but we still need to draft a new RFC and go through the approval process, which can take some time.

@jhpratt
Copy link
Member

jhpratt commented Mar 8, 2024

Is there a particular reason that Simd does not implement Deref and DerefMut? I don't see any reason the impls would restrict the ability to do anything.

@Lokathor
Copy link
Contributor

Lokathor commented Mar 8, 2024

Like deref into a slice? Usually that's not done because it's a huge performance footgun.

@Firstyear
Copy link
Contributor

It may be good to document what that footgun is and why the choice was made because people will ask this again in future.

@Lokathor
Copy link
Contributor

Lokathor commented Mar 8, 2024

So, to add more detail: the problem is that (depending on SIMD used) you can't in general index to a particular lane of a SIMD register. So if you view the SIMD data as a slice and operate on an element of the slice, what the hardware must do is have the CPU stop the current SIMD processing, write the register to the stack, work on the stack value (however the slice is adjusted), and then load that back into a SIMD register. This is, in general, a performance disaster. As usual, the optimizer might be able to cut out this stall in the pipeline, in some cases, depending on circumstances, etc etc. But you should expect that the SIMD handling is totally stalled when trying to treat the data as a slice.

@jhpratt
Copy link
Member

jhpratt commented Mar 9, 2024

I figured there was a reason, but I'm not familiar with how SIMD works under the hood. Given that indexing is the problem, why implement Index and IndexMut then?

@Lokathor
Copy link
Contributor

Lokathor commented Mar 9, 2024

Oh, uh, well I haven't looked in a while! I guess I'm out of the loop on the current API details.

I'm surprised that Index is in if Deref is out. Either both should be in or both should be out, would be my expectation.

@calebzulawski
Copy link
Member Author

calebzulawski commented Mar 9, 2024

The basic idea is that we want a clear marker of the boundary between SIMD and non-SIMD operations. When using Index (vector[i]) there is an obvious sign that you are no longer using SIMD operations. Likewise with arrays and slices, we implement AsRef and the to_array function because these are explicit. The concern with Deref is that the automatic inclusion of all slice functions makes it harder to tell which operations are SIMD. For example, you may expect is_ascii to be vectorized, but instead it is simply a scalar implementation inherited from slices.

@Lokathor
Copy link
Contributor

Lokathor commented Mar 9, 2024

vector[i] isn't particularly more obvious, I would say.

Maybe we should just always make people convert to an array to index elements?

@calebzulawski
Copy link
Member Author

A while ago we didn't implement Index and we got requests for it, but this is the first time Deref has come up, so I think it's a good compromise. Maybe it's not particularly obvious that Index is the boundary, but Deref is completely invisible without consulting the docs.

@ZagButNoZig
Copy link

ZagButNoZig commented Apr 22, 2024

There are certain types of instructions where the output data type is different from the input data type like: _mm256_maddubs_epi16. I don't think there is a way to do that in portable simd without casting first which is slower? Are there any plans to support these instructions. Similar instructions also exist on arch: vdotq_s32

@abysssol
Copy link

Hi, I was wondering if there had been any discussion or consideration of making a dynamically sized api for vector operations. The current api seems to be analogous to arrays, but perhaps a more elegant and convenient solution would be analogous to slices.

I learned about this idea when researching risc-v's vector extension. Both this article and this one (fully rendered here) are good references on the motivation, from the perspective of an ISA.

While the current api is already much better than traditional simd instructions, it seems to me that the logical conclusion is a runtime sized type; maybe a wrapper around &mut [T], or a type like Vec<T>, or perhaps a modification to Vec<T> that guarantees simd optimization if T is a numeric primitive.

Hopefully this can spark a useful discussion on the best design of simd/vector types and operations. Thank you for your consideration.

@Lokathor
Copy link
Contributor

That could be some additional API that lives along aside the fixed sized SIMD types, but for the main CPU arches a fixed sized simd type is what generally works best with optimizations.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jul 12, 2024
as_simd: fix doc comment to be in line with align_to

In rust-lang#121201, the guarantees about `align_offset` and `align_to` were changed. This PR aims to correct the doc comment of `as_simd` to be in line with the new `align_to`.

Tagging rust-lang#86656 for good measure.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jul 12, 2024
Rollup merge of rust-lang#127422 - greaka:master, r=workingjubilee

as_simd: fix doc comment to be in line with align_to

In rust-lang#121201, the guarantees about `align_offset` and `align_to` were changed. This PR aims to correct the doc comment of `as_simd` to be in line with the new `align_to`.

Tagging rust-lang#86656 for good measure.
@colejohnson66
Copy link

Curious how ARM SVE and RISC-V V are meant to be used in Rust. The fixed-length abstraction is a nice one, and it's what .NET is going with in .NET 9 (Vector<T> for SVE is 128-bit, at least for now), but variable-length vectors are here to stay.

@dead-claudia
Copy link

dead-claudia commented Jul 20, 2024

Curious how ARM SVE and RISC-V V are meant to be used in Rust. The fixed-length abstraction is a nice one, and it's what .NET is going with in .NET 9 (Vector<T> for SVE is 128-bit, at least for now), but variable-length vectors are here to stay.

RISC-V offers extensions like Zvl128b that provide hard guarantees on minimum vector size. It should be possible to leverage this in the interim while RISC-V figures out their P extension (which isn't very far along).

Edit: fix extension name

@Salabar
Copy link

Salabar commented Aug 4, 2024

Would it it make sense to add a family of functions like "load_base_*" that take a slice and an isize index? It would account for buffer underflow as well as overflow. With this you can write things such as convolution with nice clean loops that don't have account for edge cases.

for i in 0..image.len(){
   let mut result = 0.;

  for j in 1..kernel.radius() / N {
    let left = Simd::<N>::load_base_or(image, i - j * N, splat(image[0]);
   //... 
 }
  for j in 0..kernel.radius() / N {
    let right= Simd::<N>::load_base_or(image, i + j * N, splat(image.last());
   //... 
}
  image[i] = result;
}

@DXist
Copy link

DXist commented Aug 11, 2024

Is it possible to move Mask inherent methods into a trait like SimdMask and add this trait as a bound to associated type Mask of other traits, e.g. SimdPartialEq?

This will help to write generic code that works for different primitive types.

Got this idea while writing a fixed index map data structure that is expected to work with unsigned integer keys regardless of the width.

Without the trait bound for Mask associated type I have to wrap my implementation into macros and explicitly apply it to u8, u16, u32, u64 and usize.

@calebzulawski
Copy link
Member Author

I think you should probably be able to do what you want:

fn generic<T>(v: Simd<T, 4>, m: Mask<T::Mask, 4>) -> bool
where
    T: SimdElement + Default,
    Simd<T, 4>: SimdPartialEq<Mask = Mask<T::Mask, 4>>,
{
    (v.simd_eq(Simd::splat(Default::default())) ^ m).all()
}

However, it would be nice if there were an easier way to do this without requiring that extra bound.

@DXist
Copy link

DXist commented Aug 12, 2024

@calebzulawski , thank you!

It worked along with a couple of bounds from num-traits crate.

Maybe an example with generic code will be a useful demo of bounds usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC needs-rfc This change is large or controversial enough that it should have an RFC accepted before doing it. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests