Updating Vector<T> to support opt-in 512-bit widths #97460
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Do we have any places in our own use of runtime/src/libraries/System.Linq/src/System/Linq/Range.SpeedOpt.cs Lines 45 to 48 in 95bf3d9
but we explicitly guard it on Count because we wrote it knowing this might be coming (though we could also update that now to construct the Vector with a larger sequence so it lights up on 512). I'm wondering if we might have any others that would need to be tweaked.
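To illustrate the idea of constructing the initial sequence from the lane count rather than guarding on it, here is a minimal C++ sketch. It is only an analogue of the pattern, not the Range.SpeedOpt.cs code: `FillRange` and the `Lanes` template parameter are hypothetical stand-ins for the C# method and `Vector<int>.Count`.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical analogue of a vectorized range fill. Lanes plays the role of
// Vector<int>.Count; the inner loops model one vector operation each.
template <std::size_t Lanes>
std::vector<int> FillRange(int start, std::size_t count)
{
    // Build the ramp {0, 1, ..., Lanes-1} from the lane count instead of from a
    // fixed 8-element literal, so a wider vector needs no size guard.
    std::array<int, Lanes> ramp{};
    for (std::size_t i = 0; i < Lanes; ++i)
    {
        ramp[i] = static_cast<int>(i);
    }

    std::vector<int> result(count);
    std::size_t pos = 0;
    int base = start;

    // Main loop: write one full "vector" of Lanes elements per iteration.
    for (; pos + Lanes <= count; pos += Lanes, base += static_cast<int>(Lanes))
    {
        for (std::size_t i = 0; i < Lanes; ++i)
        {
            result[pos + i] = base + ramp[i];
        }
    }

    // Scalar tail for the elements that do not fill a whole vector.
    for (; pos < count; ++pos)
    {
        result[pos] = start + static_cast<int>(pos);
    }
    return result;
}
```

Because the ramp is derived from `Lanes`, instantiating the sketch with 4, 8, or 16 lanes produces the same output, which is the property a 512-bit opt-in would rely on.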
I know we have a few places throughout the BCL which guard against it being larger than 256 bits and will use an alternative path if it is. Those will definitely need to be found and updated where relevant, but shouldn't be blocked on this PR (especially since it's opt-in). Some of the patterns like
LGTM. Thanks. The Arm changes will come in handy for the SVE work, right?
@@ -202,99 +202,99 @@ SIMD_AS_HWINTRINSIC_ID(Vector4, WithElement,
// {TYP_BYTE, TYP_UBYTE, TYP_SHORT, TYP_USHORT, TYP_INT, TYP_UINT, TYP_LONG, TYP_ULONG, TYP_FLOAT, TYP_DOUBLE}
// ****************************************************************************************************
// Vector<T> Intrinsics
SIMD_AS_HWINTRINSIC_ID(VectorT128, Abs, 1, {NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs, NI_VectorT128_Abs}, SimdAsHWIntrinsicFlag::None)
Are the changes in this file and simdashwintrinsiclistxarch.h mainly renaming VectorT128 to VectorT?
Right, that's the only change here, which should simplify adding future sizes to Vector<T> as we no longer have to duplicate table entries per size.
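As a rough illustration of why a single size-agnostic entry is easier to extend than one entry per size, here is a simplified C++ sketch. It is not the JIT's actual table or lookup code: `VECTOR_T_OPS`, `VectorTOp`, and `ResolveIntrinsicName` are hypothetical names, and only the Abs and Sum operations are modeled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified model: one row per operation, with no size in the name.
#define VECTOR_T_OPS(X) \
    X(Abs)              \
    X(Sum)

// Generate the operation enum from the single table.
enum class VectorTOp
{
#define X(name) name,
    VECTOR_T_OPS(X)
#undef X
};

// Map a size-agnostic row to a concrete, sized name once the Vector<T> width
// (in bytes) is known. Supporting a new width needs no new table rows.
inline std::string ResolveIntrinsicName(VectorTOp op, std::size_t vectorBytes)
{
    const char* opName = "";
    switch (op)
    {
#define X(name)           \
    case VectorTOp::name: \
        opName = #name;   \
        break;
        VECTOR_T_OPS(X)
#undef X
    }
    return "Vector" + std::to_string(vectorBytes * 8) + "_" + opName;
}
```

For example, `ResolveIntrinsicName(VectorTOp::Sum, 64)` yields `"Vector512_Sum"` and the same row with 16 bytes yields `"Vector128_Sum"`, so the table itself never mentions a width.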
@@ -682,30 +641,11 @@ GenTree* Compiler::impSimdAsHWIntrinsicSpecial(NamedIntrinsic intrinsic,
            break;
        }

        case NI_VectorT128_Sum:
        {
            if (varTypeIsFloating(simdBaseType))
I see that the case for VectorT_Sum has moved below. Do we no longer need the InstructionSet_* checks?
The implementation was updated to no longer use the horizontal operations, since the replacement performs better on modern hardware and was needed for Vector512_Sum anyway.
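The comment above does not spell out the exact sequence the JIT now emits. One common way to sum a vector without horizontal-add instructions is to repeatedly add the upper half of the lanes onto the lower half, which works the same way at any power-of-two width. The following is a minimal scalar C++ model of that reduction, under that assumption; `SumByHalving` is a hypothetical name and each inner loop stands in for one vector add.

```cpp
#include <array>
#include <cstddef>

// Scalar model of a halving reduction: each pass adds the upper half of the
// active lanes onto the lower half, so N lanes need log2(N) passes.
template <typename T, std::size_t N>
T SumByHalving(std::array<T, N> lanes)
{
    static_assert(N != 0 && (N & (N - 1)) == 0, "lane count must be a power of two");

    for (std::size_t width = N / 2; width >= 1; width /= 2)
    {
        // Models "extract upper half, then add to lower half" on real hardware.
        for (std::size_t i = 0; i < width; ++i)
        {
            lanes[i] += lanes[i + width];
        }
    }
    return lanes[0];
}
```

Note that for floating-point types this adds the lanes in a different order than a left-to-right scalar loop, so results can differ in the last bits when the inputs are not exactly representable.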
Diff results for #97460
Throughput diffs
Throughput diffs for linux/x64 ran on linux/x64
Overall (-0.05% to -0.00%)
FullOpts (-0.05% to -0.00%)
Throughput diffs for linux/x64 ran on windows/x64
Overall (-0.05% to +0.02%)
MinOpts (+0.01% to +0.03%)
FullOpts (-0.05% to +0.02%)
Diff results for #97460
Assembly diffs
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 1,915,317 contexts (623,081 MinOpts, 1,292,236 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 174 (0.01%)
Overall (-166 bytes)
FullOpts (-166 bytes)
Throughput diffs
Throughput diffs for linux/x64 ran on linux/x64
Overall (-0.05% to -0.00%)
FullOpts (-0.05% to +0.00%)
Diff results for #97460
Assembly diffs
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,512,082 contexts (977,766 MinOpts, 1,534,316 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 180 (0.01%)
Overall (-166 bytes)
FullOpts (-166 bytes)
Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,373,018 contexts (928,740 MinOpts, 1,444,278 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 183 (0.01%)
Overall (+407 bytes)
FullOpts (+407 bytes)
Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,298,941 contexts (840,452 MinOpts, 1,458,489 FullOpts). MISSED contexts: base: 7 (0.00%), diff: 187 (0.01%)
Overall (-50 bytes)
FullOpts (-50 bytes)
Throughput diffs
Throughput diffs for linux/x64 ran on windows/x64
Overall (-0.05% to +0.02%)
MinOpts (+0.01% to +0.03%)
FullOpts (-0.05% to +0.02%)
Throughput diffs for windows/x64 ran on windows/x64
Overall (-0.05% to +0.02%)
MinOpts (+0.01% to +0.03%)
FullOpts (-0.05% to +0.02%)
Throughput diffs for windows/x86 ran on windows/x86
Overall (-0.05% to +0.00%)
FullOpts (-0.05% to +0.00%)
No description provided.