-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vector128.ShuffleUnsafe #88559
Vector128.ShuffleUnsafe #88559
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics Issue DetailsClose #81609 issue.
|
Test failures look related |
if (AdvSimd.Arm64.IsSupported) | ||
{ | ||
(Vector128<byte>, Vector128<byte>) table = (vector.GetLower(), vector.GetUpper()); | ||
return Vector256.Create( | ||
AdvSimd.Arm64.VectorTableLookup(table, indices.GetLower()), | ||
AdvSimd.Arm64.VectorTableLookup(table, indices.GetUpper())); | ||
} | ||
|
||
if (PackedSimd.IsSupported) | ||
{ | ||
return Vector256.Create( | ||
PackedSimd.Shuffle(vector.GetLower(), vector.GetUpper(), indices.GetLower()), | ||
PackedSimd.Shuffle(vector.GetLower(), vector.GetUpper(), indices.GetUpper())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need this additional handling for V256
/V512
. It isn't expected that users are calling these, let alone expecting good performance, when IsHardwareAccelerated
reports false
.
It's just causing the JIT to spend more time doing dead code elimination, managing the inliner budget, etc.
{ | ||
if (Avx2.IsSupported) | ||
{ | ||
return Avx2.Shuffle(vector, indices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These hardware intrinsics have different semantics than Vector256/512 Shuffles.
Avx2.Shuffle is effectively two Vector128.Shuffles on each half. The indices are still [0, 15]. Similar for Avx512.
Vector256.Shuffle on the other hand works with the whole vector and allows you to pick and index [0, 31].
/// <remarks>Unlike Shuffle, this method delegates to the underlying hardware intrinsic without ensuring that <paramref name="indices"/> are normalized to [0, 31]. | ||
/// On hardware with <see cref="Avx2"/> support, indices are treated as modulo 32, and if the high bit is set, the result will be set to 0 for that element. | ||
/// On hardware with <see cref="AdvSimd.Arm64"/> support, this method behaves the same as Shuffle.</remarks> | ||
/// On hardware with <see cref="PackedSimd"/> support, indices are treated as modulo 32. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This remarks is confusing. Shuffle
doesn't ensure the indices are normalized. It simply treats all out of range indices as setting the result to 0
.
Unsafe
simply changes how out of range
indices are handled and allows it to differ based on the target hardware. In the case of xarch
most out of range indices are handled as mod 32. But indices with the MSB set are set to 0
instead.
We can probably exclude the Arm64/Wasm callouts for V256/V512 as they aren't important. They can be combined into a general statement that other platforms will treat this the same as Shuffle
@@ -2384,7 +2384,7 @@ public static Vector128<sbyte> Shuffle(Vector128<sbyte> vector, Vector128<sbyte> | |||
[CompExactlyDependsOn(typeof(AdvSimd))] | |||
[CompExactlyDependsOn(typeof(AdvSimd.Arm64))] | |||
[CompExactlyDependsOn(typeof(PackedSimd))] | |||
internal static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices) | |||
public static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anywhere else this should be consumed in dotnet/runtime as part of making it public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I updated all such places when adding the internal helper as part of #80963
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I know you updated places in corelib. I was wondering if there were places outside of corelib you couldn't update. Sounds like no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two places I saw where we still use the platform-specific instructions directly (BitArray and OptimizedInboxTextEncoder), but both of those are in larger platform-specific blocks of logic, so there was no reason to replace them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plenty of related tests are failing. Example:
Test System.Runtime.Intrinsics.Tests.Vectors.Vector256Tests.Vector256ByteShuffleOneInputWithLocalIndicesTest has failed
Please address the comments posted by other reviewers and fix the tests.
Thank you for your contribution!
This pull request has been automatically marked |
This pull request will now be closed since it had been marked |
Close #81609 issue.