Implement repr(packed) for repr(simd) #117116

Merged
merged 4 commits on Dec 11, 2023

Conversation

calebzulawski
Member

This allows creating vectors with non-power-of-2 lengths that do not have padding. See rust-lang/portable-simd#319
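
For illustration, a minimal sketch of the layout difference (nightly-only; assumes the repr_simd feature and the layout this PR implements; type names are illustrative):

#![feature(repr_simd)]

#[repr(simd)]
struct Padded([f32; 3]);   // size rounds up to the next power of two: 16 bytes

#[repr(simd, packed)]
struct Packed([f32; 3]);   // no padding: 12 bytes

fn main() {
    assert_eq!(std::mem::size_of::<Padded>(), 16);
    assert_eq!(std::mem::size_of::<Packed>(), 12);
}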

@rustbot
Collaborator

rustbot commented Oct 24, 2023

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 24, 2023
@scottmcm
Member

Is it possible it would make sense for this just to be what repr(simd) means, without needing packed?

Since all the other uses of it are power-of-two anyway, and thus if you make simd always be the GCD, that might be fine?

let align = dl.vector_align(size);

let (abi, align) = if def.repr().packed() && !e_len.is_power_of_two() {
    // Non-power-of-two vectors have padding up to the next power-of-two.
Member

IIRC LLVM doesn't guarantee vectors don't have more padding than just to the next power of two, e.g. padding <2 x i8> to 128 bits.

Member

https://llvm.org/docs/LangRef.html#vector-type seems to say that LLVM guarantees that vectors have no padding in their in-memory representation? Padding <2 x i8> to 128 bits is something that LLVM does inside the vector registers (e.g. on aarch64), but never in memory.

Member

that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

Member Author

I'm not sure if it's guaranteed, but rustc does make the assumption (somewhere, I forget where)

Member

core::mem::size_of:

More specifically, this is the offset in bytes between successive elements in an array with that item type including alignment padding.

Member Author

Whoops, I meant rustc assumes that non-power-of-two vectors round up to the next power of two.

Member

that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

I thought LLVM types didn't have any sort of intrinsic alignment? And even if they have a preferred alignment that's too big, can't you just always annotate load/store instructions with the smaller alignment?

Member

that's only for load/store, LLVM vector types still have big enough alignment that Rust requires padding in many cases.

I thought LLVM types didn't have any sort of intrinsic alignment? And even if they have a preferred alignment that's too big, can't you just always annotate load/store instructions with the smaller alignment?

they do have an intrinsic ABI alignment, it's used for default alignment of load/store/alloca when not explicitly specified and for stack spilling and a few other things

Member

@programmerjake programmerjake Oct 25, 2023

I'm not sure if it's guaranteed, but rustc does make the assumption (somewhere, I forget where)

just found this (yeah, I know that's not what you meant, but I couldn't resist):
https://lang-team.rust-lang.org/frequently-requested-changes.html#size--stride

Rust assumes that the size of an object is equivalent to the stride of an object
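
To make that concrete, a minimal sketch: the distance between consecutive array elements is exactly size_of::<T>(), so any padding inside a type is repeated for every element of an array.

fn main() {
    let xs: [(u8, u32); 2] = [(0, 0), (0, 0)];
    // Adjacent elements are exactly size_of::<(u8, u32)>() bytes apart,
    // including the struct's internal alignment padding.
    let stride = (&xs[1] as *const _ as usize) - (&xs[0] as *const _ as usize);
    assert_eq!(stride, std::mem::size_of::<(u8, u32)>());
}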

@calebzulawski
Member Author

Is it possible it would make sense for this just to be what repr(simd) means, without needing packed?

Since all the other uses of it are power-of-two anyway, and thus if you make simd always be the GCD, that might be fine?

Maybe eventually, but for now I didn't implement any way to call simd intrinsics on packed repr vectors--they still need to be loaded into full (padded) vectors. std::simd will need to convert the vectors before calling intrinsics.

@calebzulawski
Member Author

Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.

@workingjubilee
Member

Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.

Given that we're specifically working on defining target feature minima so we can do that, I don't think that blocking passing in registers is acceptable.

@calebzulawski
Member Author

Unfortunately I think we simply can't have it both ways. LLVM defines vectors as having padding and we would like them to not have padding. This change doesn't affect all vectors, just those with repr(packed), which specifically removes padding and reduces alignment. A user could always use a non repr(packed) vector to pass by register.

@programmerjake
Member

Note that I also change the ABI, so it's not possible to pass these in vector registers (I think). So if target feature calling conventions ever get worked out, there may still be a benefit to the current implementation as well.

Given that we're specifically working on defining target feature minima specifically so we can do that, I don't think that blocking passing in registers is acceptable.

I don't expect this to block passing in registers, because this is defining the in-memory ABI. the passing-by-value ABI can be totally different, since semantically you have to then store that value somewhere in memory before you can look at the bytes.

@programmerjake
Member

Unfortunately I think we simply can't have it both ways.

I think we can have it both ways -- just pass by value as an LLVM vector and use the array type for in-memory address calculations and allocating memory. Load/store with vector types only read/write the non-padding bytes, so we're fine.

@calebzulawski
Member Author

Unfortunately I think we simply can't have it both ways.

I think we can have it both ways -- just pass by value as an LLVM vector and use the array type for in-memory address calculations and allocating memory. Load/store with vector types only read/write the non-padding bytes, so we're fine.

True, this is definitely possible once we are able to.

@workingjubilee
Member

I don't think that will optimize well at all. LLVM historically reasons about in-memory and in-register data in conflationary ways.

@calebzulawski
Member Author

calebzulawski commented Oct 24, 2023

I don't think there's any optimization involved. It's just a single vector load.

Either way, this PR doesn't change how existing vectors work. repr(packed) is specified to remove padding and it doesn't work correctly on SIMD types, which this change fixes.

I agree that the calling convention could become an issue in the future, but I'd like to deal with that in the future when it comes up, since there's still no consensus on a fix regardless of this PR. Also, fixing repr(packed) for SIMD doesn't require us to use it for std::simd, but I'd like to fix this issue first so we can try it out.

@workingjubilee
Member

"Legalizing to a vector load instead of a series of scalar loads" is an optimization.

@workingjubilee
Member

...Okay, I didn't quite divine from the messages/code changes that repr(packed) doesn't have any functionality currently with repr(simd), and that we don't even emit an error.

@workingjubilee
Member

workingjubilee commented Oct 25, 2023

I would prefer this to come with codegen/assembly tests to demonstrate what it actually looks like when these types are interacted with, and to prove it like... legalizes correctly, where "correctly" in this case is probably "not an LLVM error, I guess?"

@calebzulawski
Member Author

Since repr(simd, packed) can't really be used directly, I added a pretty simple test that just loads a packed vector and performs an operation on it, just to make sure it doesn't crash LLVM or anything like that.

@workingjubilee
Member

workingjubilee commented Nov 14, 2023

I want to approve this but I still can't see what the emitted LLVM IR is, and I can't e.g. increase my confidence by reaching for friends who know way more about LLVM IR and legalization if there are no codegen diffs to show them.

@calebzulawski
Member Author

calebzulawski commented Nov 15, 2023

Not quite as rigorous as a codegen test, but consider this reduced case for illustrative purposes:

#![feature(repr_simd, platform_intrinsics)]

#[repr(simd, packed)]
pub struct Simd<T, const N: usize>([T; N]);

#[repr(simd)]
#[derive(Copy, Clone)]
pub struct FullSimd<T, const N: usize>([T; N]);

extern "platform-intrinsic" {
    fn simd_mul<T>(a: T, b: T) -> T;
}

// non-powers-of-two have padding and need to be expanded to full vectors
fn load<T, const N: usize>(v: Simd<T, N>) -> FullSimd<T, N> {
    unsafe {
        let mut tmp = core::mem::MaybeUninit::<FullSimd<T, N>>::uninit();
        std::ptr::copy_nonoverlapping(&v as *const _, tmp.as_mut_ptr().cast(), 1);
        tmp.assume_init()
    }
}

pub fn square(x: Simd<f32, 3>) -> FullSimd<f32, 3> {
    let x = load(x);
    unsafe { simd_mul(x, x) }
}

With optimization, this simply generates the following:

; foo::square
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable
define void @_ZN3foo6square17h0c2320cc22379be1E(ptr noalias nocapture noundef writeonly sret(<3 x float>) align 16 dereferenceable(16) %_0, ptr noalias nocapture noundef readonly align 4 dereferenceable(12) %x) unnamed_addr #0 {
start:
  %x.val = load <3 x float>, ptr %x, align 4
  %0 = fmul <3 x float> %x.val, %x.val
  store <3 x float> %0, ptr %_0, align 16
  ret void
}

This is nearly the same as using regular vectors, but note that the input %x is passed as a pointer with align 4. Vectors are always passed by reference anyway, but for a regular vector that pointer would be align 16. Likewise, on the first line of the body, this load is align 4 where a regular vector load would be align 16.
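
For context on those alignments, a minimal sketch (values are for x86_64, matching the IR above; vector alignment is target-dependent and the type names are illustrative): the padded vector type is 16-byte aligned, while the packed one only has element alignment, which is the align 16 vs align 4 difference visible in the loads and stores.

#![feature(repr_simd)]

#[repr(simd)]
struct FullSimd3([f32; 3]);     // padded: 16-byte size and alignment here

#[repr(simd, packed)]
struct PackedSimd3([f32; 3]);   // packed: 12-byte size, 4-byte alignment here

fn main() {
    assert_eq!(std::mem::align_of::<FullSimd3>(), 16);
    assert_eq!(std::mem::align_of::<PackedSimd3>(), 4);
}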

@calebzulawski
Member Author

Whoops, accidentally closed while writing my comment :)

@calebzulawski
Member Author

@workingjubilee I added my comment above as a codegen test

@bors
Contributor

bors commented Dec 9, 2023

💔 Test failed - checks-actions

@workingjubilee
Member

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 9, 2023
@calebzulawski
Member Author

@workingjubilee I removed the codegen test since it's so dependent on optimization (little things like attributes keep changing, which makes it impossible to write), and the tests run at so many optimization levels that I can't seem to make a useful test. Considering repr(simd) is unstable in the first place, hopefully you've seen enough codegen examples to be confident merging as-is.

@workingjubilee
Member

Indeed, I mostly wanted an example! Sadness about the test, though. It really shouldn't be so hard...

@bors r+

@bors
Contributor

bors commented Dec 11, 2023

📌 Commit aa00bae has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 11, 2023
@bors
Contributor

bors commented Dec 11, 2023

⌛ Testing commit aa00bae with merge 8b1ba11...

@bors
Contributor

bors commented Dec 11, 2023

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 8b1ba11 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 11, 2023
@bors bors merged commit 8b1ba11 into rust-lang:master Dec 11, 2023
12 checks passed
@rustbot rustbot added this to the 1.76.0 milestone Dec 11, 2023
@rust-timer
Collaborator

Finished benchmarking commit (8b1ba11): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 669.249s -> 669.313s (0.01%)
Artifact size: 314.19 MiB -> 314.17 MiB (-0.01%)

@calebzulawski
Member Author

Thanks :)

@workingjubilee workingjubilee added A-SIMD Area: SIMD (Single Instruction Multiple Data) PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) labels Dec 14, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 2, 2024
…nsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 2, 2024
…rinsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jun 2, 2024
Rollup merge of rust-lang#125311 - calebzulawski:repr-packed-simd-intrinsics, r=workingjubilee

Make repr(packed) vectors work with SIMD intrinsics

In rust-lang#117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout.  This should be the last step in providing a solution to rust-lang/portable-simd#319
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 3, 2024
…, r=calebzulawski

Test codegen for `repr(packed,simd)` -> `repr(simd)`

This adds the codegen test originally requested in rust-lang#117116 but exploiting the collection of features in FileCheck and compiletest to make it more resilient to expectations being broken by optimization levels. Mostly by presetting optimization levels for each revision of the tests.

I do not think the dereferenceable attribute's presence or absence is that important.

r? `@calebzulawski`
#![allow(non_camel_case_types)]

#[repr(simd, packed)]
struct Simd<T, const N: usize>([T; N]);
Member

From how packed usually works, I would expect this to mean that the type has alignment 1. But that doesn't seem to be the case; instead the alignment is the largest possible for the size, or something like that?

What happens with packed(N)?

Would be good to have the interaction of simd and packed documented somewhere.
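
For contrast, a minimal sketch of how packed behaves on an ordinary (non-simd) struct, which is the expectation described above: a plain #[repr(packed)] struct gets alignment 1.

#[repr(packed)]
struct PlainPacked(f32, f32, f32);

fn main() {
    // repr(packed) with no modifier is packed(1): alignment 1, no padding.
    assert_eq!(std::mem::align_of::<PlainPacked>(), 1);
    assert_eq!(std::mem::size_of::<PlainPacked>(), 12);
}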

@RalfJung
Member

RalfJung commented Jun 8, 2024

FWIW codegen has support for using different LLVM types in by-val vs by-ref situations: specifically, bool is i8 in memory but i1 as an SSA local. Maybe something similar could be done for these packed simd types?

    if ty.is_simd() && !matches!(arg.val, OperandValue::Immediate(_)) {
        return_error!(InvalidMonomorphization::SimdArgument { span, name, ty: *ty });
    }
}
Member

This comment seems no longer accurate... maybe that got changed by #125311? Doing simd_mul etc on a packed SIMD type works just fine now. Even in debug builds this generates the code one would hope for.
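
A minimal sketch of that, reusing the platform-intrinsic declaration style from the example earlier in this thread (type name illustrative; on newer nightlies the same intrinsics are exposed through core::intrinsics::simd instead of extern "platform-intrinsic"):

#![feature(repr_simd, platform_intrinsics)]

#[repr(simd, packed)]
#[derive(Copy, Clone)]
pub struct PackedSimd<T, const N: usize>([T; N]);

extern "platform-intrinsic" {
    fn simd_mul<T>(a: T, b: T) -> T;
}

pub fn square(x: PackedSimd<f32, 3>) -> PackedSimd<f32, 3> {
    // After rust-lang#125311 this is expected to compile and codegen directly,
    // without the explicit widening copy used earlier in this PR.
    unsafe { simd_mul(x, x) }
}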

Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) merged-by-bors This PR was explicitly merged by bors. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.