Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement type casts (non-transmute & transmute) #116

Open
programmerjake opened this issue May 5, 2021 · 29 comments
Open

Implement type casts (non-transmute & transmute) #116

programmerjake opened this issue May 5, 2021 · 29 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@programmerjake
Copy link
Member

programmerjake commented May 5, 2021

I wasn't able to find type casts, we need them:

fn f(v: f32x16) -> i8x16 {
    v.cast() // doesn't currently work
}

This is different than transmuting (which is something we also need; we're planning on mostly relying on safe-transmute for this).

@programmerjake programmerjake added the C-feature-request Category: a feature request, i.e. not implemented / a PR label May 5, 2021
@Lokathor
Copy link
Contributor

Lokathor commented May 5, 2021

We currently have to and from arrays, and the rest you can do from there in a pinch.

It didn't seem that there was much hardware support for casts within Intel and ARM so we pushed it to the back of the schedule.

@calebzulawski
Copy link
Member

calebzulawski commented May 5, 2021

The only casts that seem broadly supported is between floats and integers of the same width, which we do have implemented

@workingjubilee
Copy link
Member

workingjubilee commented May 7, 2021

Arrays are more ergonomic if used with the array map method.

@gilescope
Copy link

My case is going between i8x16 <--> i16x8 and the like. I have an impressive amount of transmutes... there's probably a nicer way?

@Lokathor
Copy link
Contributor

Lokathor commented May 8, 2021

So you want two i8 at a time to combine into i16?

or do you want to take in one integer register and produce two registers of data by sign extending each individual i8?

@gilescope
Copy link

The former. I'm parsing ints (quickly) and trying to see if I can use core_simd for the most part to do this:

https://github.com/pickfire/parseint/blob/b75bf5e47a3b2fdc1934bab046e677e033173e18/src/lib.rs#L151

The other thing I might be missing is a nice way to switch to and from __m128i for unsupported intrinsics.

@Lokathor
Copy link
Contributor

Lokathor commented May 8, 2021

I can't look too closely right now, but that transmute_copy is suspicious.

but yeah, you'll probably have some transmutes in there for a while until we fill in the edges of the API.

For a fix that works immediately you could convert this to the safe_arch crate, which has bytemuck support, then you're at least doing pretty much all safe code.

@gilescope
Copy link

Ah I see I can do i8x16::from(__m128i). - Excellent that removes a fair few transmutes.

@gilescope
Copy link

gilescope commented May 10, 2021 via email

@hsivonen
Copy link
Member

the trait `From<SimdU8<16_usize>>` is not implemented for `SimdU16<8_usize>`

Is there a principled reason for this considering that both u8x16 and u16x8 convert to/from __m128i using from()/into()? Or just not done yet? (I want to do a zero-cost endianness-dependent re-interpretation of the register.)

@calebzulawski
Copy link
Member

We haven't really prioritized conversions yet, but I think our general stance has been "this is a job for transmute, and safe transmute will make that easier". Converting between the std::arch types is a little different since __m128i doesn't encode integer type information, but maybe that should just be a transmute as well?

@Lokathor
Copy link
Contributor

Yeah if you deliberately want to change lane count like this while preserving register byte size, just go to the arch type and then from the arch type in your new lane count type using stdsimd. It's a hair verbose, but works totally fine.

@hsivonen
Copy link
Member

We haven't really prioritized conversions yet, but I think our general stance has been "this is a job for transmute, and safe transmute will make that easier".

I hope the timeline for that feature works out well for SIMD purposes.

Yeah if you deliberately want to change lane count like this while preserving register byte size, just go to the arch type and then from the arch type in your new lane count type using stdsimd. It's a hair verbose, but works totally fine.

Going via the arch type isn't portable and adds an unnecessary intermediate step.

@thomcc
Copy link
Member

thomcc commented May 23, 2021

I hope the timeline for that feature works out well for SIMD purposes.

Yeah, it doesn't look like it, honestly. That project seems like it has all but stalled.

Even if not, I also think it might be worth including explicit ways to perform conversions, given how common they are in SIMD code.

@thomcc
Copy link
Member

thomcc commented May 23, 2021

There's some prior art here in fN::{from,to}_bits, uN::{to/from}_ne_bytes, etc.

@Lokathor
Copy link
Contributor

I think that those are just "transmute that we're willing to bless because it's so common". So, safe transmute makes them in some sense pointless. However, I do sympathize that safe transmute will take a while to come around.

I think "just use transmute for now" kinda has to be the answer here though, unless we want to implement our own mini-safe-transmute API within the portable simd API. We could maybe do that, but I don't think we should stabilize that mini api in the long run, and I'm hesitant to suggest thay anyone write down a bunch of code (probably in the form of weird macro stuff) that we know we'll never keep long term.

@BurntSushi
Copy link
Member

BurntSushi commented May 23, 2021

unless we want to implement our own mini-safe-transmute API

Is there something more to this than defining the obvious From impls and implementing them with transmute?

I think for me, the thing that would push something like this over the edge is if this were a common source of unsafe in otherwise safe portable SIMD code. That would be pretty unfortunate and a compelling reason IMO for exposing safe bitcast APIs (just like std does for integers<->bytes). If you are otherwise writing unsafe everywhere, then a transmute or a pointer cast seems like less of a big deal.

I suspect the case is the former, although I'm not certain. If so, then yes, I think these APIs should absolutely be added unless there's some complexity here that I don't understand.

@Lokathor
Copy link
Contributor

Well, the approximate breakdown is:

  • It's probably the former, that otherwise pure safe code would have a transmute in it.
  • We could just put a bunch of From impls but then that would fly in the face of how From currently works with other number types (which might be acceptable overall?).
  • If we don't use From we probably don't want like one method per destination type on each source type. Because that's a lot of methods kinda clogging up the docs and the auto-complete lists and so on. Possible, but kinda a poor solution.
  • Which would mean some sort of alternative trait for just these special transmutes we want to support. And that's where we'd unfortunately have a mini safe transmute API sneaking into the SIMD stuff.
  • The eventual Safe Transmute API would absolutely without a doubt cover everything we put in as well as valid transmutes we maybe forget to put in.

So if the request, as given in the example code in the first post, is a general "cast" operation that "just works" then it feels like something that is unfortunately just slightly awkwardly out of scope. I admit that it's useful, but I don't think it best fits in this sub-project of rust in the long term.

@calebzulawski
Copy link
Member

Is there something more to this than defining the obvious From impls and implementing them with transmute?

The implementation is trivial, the reason we've been avoiding it is because it's not clear what semantics From should carry. Should i16 implement From<[u8; 2]>? Probably not. I think transmute carries the meaning we want, the downside is the unsafe.

I do think we can implement {to,from}_bits for all vectors, which would allow a safe "transmute" with the slight inconvenience of two function calls instead of one.

@BurntSushi
Copy link
Member

I do think we can implement {to,from}_bits for all vectors, which would allow a safe "transmute" with the slight inconvenience of two function calls instead of one.

That might be a nice compromise assuming the codegen is the same. (And I assume it would be.) @hsivonen What do you think?

@hsivonen
Copy link
Member

Indeed, lane reinterpretation is currently a source of unsafe in otherwise-safe code. to_bits/from_bits would work for me.

(Masks should zero-cost convert to integer lanes but not vice versa.)

@workingjubilee
Copy link
Member

I took a quick crack at this for five minutes and am just commenting here so that it's staring me down when I return to it: in order to make this sound, this requires either reworking the way we currently apply our "can actually be implemented" lane limit, or else accepting the extremely awkward return signature of

pub fn to_ne_bytes(self) -> [[u8; N]; LANES];
pub fn from_ne_bytes([[u8; N]; LANES]) -> Self;

Frankly, I don't think that signature is acceptable for all the uses everyone has been wanting this for, but I do have ideas for how to grind the lane limit under my heel and make this right.

@programmerjake
Copy link
Member Author

(Masks should zero-cost convert to integer lanes but not vice versa.)

That only applies to full-width masks, bitmasks will zero-cost convert to a different type (u64 or similar?).

@hsivonen
Copy link
Member

(Masks should zero-cost convert to integer lanes but not vice versa.)

That only applies to full-width masks, bitmasks will zero-cost convert to a different type (u64 or similar?).

I agree. (I meant SIMD register mask types in my previous remark.)

@calebzulawski
Copy link
Member

@workingjubilee I tried my hand at this and I think the way to go is:

trait ToBytes {
    type Bytes;
    #[doc(hidden)]
    fn to_bytes_impl(self) -> Self::Bytes { 
        unsafe { core::mem::transmute(self) }
    }
    #[doc(hidden)]
    fn from_bytes_impl(bytes: Self::Bytes) -> Self {
        unsafe { core::mem::transmute(bytes) }
    }
}

...
    pub fn to_ne_bytes(self) -> Self::Bytes { self.to_bytes_impl() }
    pub fn from_ne_bytes(bytes: Self::Bytes) -> Self { Self::from_bytes_impl(bytes) }
...

If we didn't have LanesAtMost32 and used const_evaluatable_checked this would work fine, but due to the lane limits, implementing this on SimdI64<8> isn't possible (since it needs SimdU8<64> bytes).

@calebzulawski
Copy link
Member

Based on the limitations of the implementation in the linked PR, using transmute directly is usually the best solution for now, but if you absolutely must have no unsafe, you can use byte conversions as long as the type is not more than 32 bytes long. Hopefully in the future that limitation can be lifted, but right now that's blocked on #90.

@programmerjake programmerjake changed the title Implement type casts Implement type casts (non-transmute & transmute) May 24, 2021
@programmerjake
Copy link
Member Author

changed the issue to also include transmutes (which weren't originally intended to be covered by this issue, but whatever...)

@aldanor
Copy link

aldanor commented Dec 3, 2021

By the way - wonder what's the suggested way to convert between lane types of different size but with the same number of lanes? (e.g. u8x16 to u16x16)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR
Projects
None yet
Development

No branches or pull requests

9 participants