-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix layout of non-power-of-two length vectors #422
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! once the test failures are fixed, feel free to merge
Is this failure a codegen problem? A handful of architectures work (unfortunately I can't replicate right now, I'm on mac where it passes) |
looks like aarch64 cross failed due to OOM... |
I guess testing every length is excessive, I'll reduce it |
it could be an aarch64 backend bug in llvm, since so far only aarch64 linux/mac has OOM-ed. |
iirc I tried to pick lengths that are around powers of 2 and around 3 * powers of 2...so like 15,16,17,23,24,25,31,32,33... |
well, looks like powerpc-unknown-linux-gnu has a non-trivial failure (not OOM): |
if you can minimize the bugs you encountered with aarch64 and powerpc, I think submitting a bug report to LLVM would be good! |
dbe18a4
to
e3dabf5
Compare
The PowerPC errors are genuine. |
I don't know if they're actually incorrect, however, as they are likely to be an endianness problem. |
There are basically 3 different classes of errors here:
All of these should be "fixed" now, since we've removed the bitmask vectors. I am curious what was causing the second error but not sure I'll get the chance to look into it yet |
(I attempted to quote reply, but accidentally edited your comment instead, sorry. replied this time)
we're still getting SIGKILL on aarch64 -- it could be too many tests, it could also be an excessive memory usage bug for non-excessive input code with weird vector lengths in the aarch64 llvm backend. |
I'm seeing that too. I can replicate it if I build for aarch64. It seems to be an infinite loop. Even building with --emit=llvm-ir I can't get it to complete (the tests that fail are cast, u8_ops, and i8_ops).
|
(apparently I can't click in the right spot today, since I edited your comment again)
can you get llvm-ir with |
Looks like only |
I think you meant
try running if the llvm-ir is small enough, it would be great if you would put it in llvm.godbolt.org and share it here, which would let us try and figure it out without having to compile everything locally, plus Compiler Explorer has nice features for showing what changed in which passes (technically you can do that with the command line, but the website is much more friendly). |
So it looks like the problem is only with |
ok, I reduced the problem: https://llvm.godbolt.org/z/1Ehsh97nP |
so, maybe add a workaround for fp to int only on aarch64 that expands element count to the next power of two? idk if that will fix it, the backend bug may also occur for other ops. |
That did fix it, there was also another codegen bug-- |
d513647
to
e8a56e4
Compare
@@ -99,7 +99,7 @@ use crate::simd::{ | |||
// directly constructing an instance of the type (i.e. `let vector = Simd(array)`) should be | |||
// avoided, as it will likely become illegal on `#[repr(simd)]` structs in the future. It also | |||
// causes rustc to emit illegal LLVM IR in some cases. | |||
#[repr(simd)] | |||
#[repr(simd, packed)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan for simd
without packed
? Miri currently ICEs when such a type is used with a simd intrinsic and the size is not a power of 2. If portable-simd doesn't need support for that then do we need to have it at all? Can we just make simd
itself have the behavior that simd, packed
now has?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simd
without packed
is used by stdarch. portable-simd might use it in the future too, though imo probably won't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK stdarch only uses power-of-2 vectors, where packed
makes no difference?
@@ -639,43 +627,30 @@ macro_rules! test_lanes_panic { | |||
core_simd::simd::LaneCount<$lanes>: core_simd::simd::SupportedLaneCount, | |||
$body | |||
|
|||
// test some odd and even non-power-of-2 lengths on miri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't have any even non-power-of-2. Maybe replace 5 by 6?
(Though I am also not sure why odd vs even would be an interesting difference here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you look down further it tests length 3 on miri, the idea is we want to catch bugs caused by repr(simd, packed)
having alignment smaller than repr(simd)
which only happens for non-power-of-2 sizes. even non-power-of-2 sizes cover where the alignment is in between the element alignment and the non-packed alignment. @calebzulawski can you add back in length 6 since that's the smallest length where that occurs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah 3 and 6 would probably be reasonable then. To cut down on CI times I'd remove 5.
Is there an issue for that?
I think the code did account for endianess in the wrong way, see rust-lang/rust#126171. |
6e03d63
to
ce73c96
Compare
98f923e
to
a49f77e
Compare
unsafe { core::intrinsics::simd::$simd_call($lhs, rhs) } | ||
|
||
// aarch64 div fails for arbitrary `v % 0`, mod fails when rhs is MIN, for non-powers-of-two | ||
// these operations aren't vectorized on aarch64 anyway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are LLVM backend bugs, right? simd_div/simd_rem still should work the same on all targets?
That seems worth tracking somewhere, having subtly buggy intrinsics is no good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, theoretically LLVM should be able to generate SIMD code for division/remainder by a constant, by using the exact same fancy math as it would use for scalars (which it unfortunately currently does after scalarization of div ops for non-power-of-2 vectors), so once LLVM's bugs are fixed, I think we should switch back to generating SIMD ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these are definitely backend bugs
Co-authored-by: Ralf Jung <[email protected]>
400e6e8
to
2a3b8ad
Compare
Any ideas why the proptest variable doesn't seem to make it into cross? Or maybe it is, but it's not the number of cases that make the tests slow? |
using the github actions feature that shows a timestamp for each line of output (get to it by clicking the settings gear on that actions job page), it looks like running the debug tests is taking almost all of the time... |
That gave me an idea, turns out you can set the optimization level of all dependencies outside of the workspace. I suspected maybe proptest itself was the slow part. If we're okay with this change, it dramatically improves test times |
Fixes #63, fixes #319