Replace use of target dependent `TestZ` intrinsic #104488

xtqqczze · 2024-07-05T19:27:04Z

Contributes to #101251.

dotnet-policy-service · 2024-07-06T14:34:21Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

tannergooding · 2024-07-10T15:27:23Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

@@ -1521,22 +1521,15 @@ private static bool VectorContainsNonAsciiChar(Vector128<byte> asciiVector)
        internal static bool VectorContainsNonAsciiChar(Vector128<ushort> utf16Vector)
        {
            // prefer architecture specific intrinsic as they offer better perf
-            if (Sse2.IsSupported)
+#pragma warning disable IntrinsicsInSystemPrivateCoreLibConditionParsing // A negated IsSupported condition isn't parseable by the intrinsics analyzer
+            if (Sse2.IsSupported && !Sse41.IsSupported)


The warning exists because this is going to flag the code as depending on SSE4.1.

We should likely not consider the SSE2 path and just have SSE4.1, as SSE2-only hardware is increasingly irrelevant (more than 17 years old at this point) and even crossgen defaults to an opportunistic SSE4.1 target today.

That is, remove this path entirely and just have `if (AdvSimd.Arm64.IsSupported) { } else { }`.

This will do the right thing for SSE4.1 already, and while it might slightly pessimize an SSE2-only machine, such machines are old and rare enough that this is acceptable IMO. We're talking about what ends up being a difference of an instruction or so in practice.

xtqqczze · 2024-07-17

These changes can be made in a follow-up PR.

tannergooding · 2024-11-21

@xtqqczze, I thought I had responded to this, but apparently missed it.

This isn't "just" about the analyzer; it also pertains to how the trimmer and other tooling work. We're going to need to fix this before the PR can be merged, or it will cause regressions and the PR will be reverted by default.

xtqqczze · 2024-11-23

@tannergooding Is d21415f still problematic?
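For readers following this thread, the shape suggested in the review (drop the SSE2/SSE4.1 branches and keep only an `AdvSimd.Arm64` branch plus a portable fallback) might look roughly like the sketch below. The names `AsciiSketch` and `ContainsNonAsciiChar` are hypothetical, and this is not the code committed in the PR; it only assumes the `Vector128<T>` operators available in recent .NET versions. Per the review comment, the portable branch is expected to do the right thing on SSE4.1 hardware.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class AsciiSketch
{
    // Hypothetical stand-in for VectorContainsNonAsciiChar(Vector128<ushort>).
    public static bool ContainsNonAsciiChar(Vector128<ushort> utf16Vector)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            // Keep the larger char of each adjacent pair, then test the
            // non-ASCII bits (0xFF80) of the four surviving chars at once.
            Vector128<ushort> maxChars = AdvSimd.Arm64.MaxPairwise(utf16Vector, utf16Vector);
            return (maxChars.AsUInt64().ToScalar() & 0xFF80FF80FF80FF80) != 0;
        }
        else
        {
            // Portable form: any bit above 0x7F in any char means non-ASCII.
            Vector128<ushort> nonAsciiBits = utf16Vector & Vector128.Create((ushort)0xFF80);
            return nonAsciiBits != Vector128<ushort>.Zero;
        }
    }
}
```

The portable branch contains no `IsSupported` checks at all, which sidesteps the negated-condition problem that the pragma in the diff was working around.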

xtqqczze · 2024-07-10T16:07:10Z

Depends on #102705.

xtqqczze · 2024-07-10T23:27:13Z

@MihuBot

xtqqczze · 2024-07-11T00:18:49Z

@tannergooding There are unexpected regressions, see MihuBot/runtime-utils#501

`System.Text.Ascii:IsValidCore[ubyte](byref,int):ubyte`

 G_M58774_IG07:
        cmp      esi, 64
        jg       SHORT G_M58774_IG08
        vmovups  ymm0, ymmword ptr [rdi]
-       vpor     ymm0, ymm0, ymmword ptr [rax-0x20]
-       vmovups  ymm1, ymmword ptr [reloc @RWD32]
-       vptest   ymm0, ymm1
+       vmovups  ymm1, ymmword ptr [rax-0x20]
+       vmovups  ymm2, ymmword ptr [reloc @RWD32]
+       vpternlogd ymm0, ymm1, ymm2, -88
+       vptest   ymm0, ymm0
        sete     cl
        movzx    rcx, cl
        jmp      SHORT G_M58774_IG04
-						;; size=35 bbWeight=0.50 PerfScore 10.75
+						;; size=42 bbWeight=0.50 PerfScore 12.00

tannergooding · 2024-07-11T01:01:18Z

That particular case should be mostly resolved with #104517; vpternlog was missing general support for ensuring we select optimal containment.

There are more improvements that could be had as well, but it's a step in the right direction overall.

xtqqczze · 2024-07-11T10:31:07Z

~~Depends on #104517.~~

xtqqczze · 2024-07-12T12:09:28Z

I'm trying a refactoring using ISimdVector in xtqqczze@a410294.

xtqqczze · 2024-07-13T20:00:21Z

@MihuBot

xtqqczze · 2024-07-13T20:40:11Z

> That particular case should be mostly resolved with #104517; vpternlog was missing general support for ensuring we select optimal containment.
>
> There are more improvements that could be had as well, but it's a step in the right direction overall.

~~@tannergooding Yes that case is mostly resolved, see MihuBot/runtime-utils#519~~

`System.Text.Ascii:IsValidCore[ubyte](byref,int):ubyte`

 G_M58774_IG07:
        cmp      esi, 64
        jg       SHORT G_M58774_IG08
        vmovups  ymm0, ymmword ptr [rdi]
-       vpor     ymm0, ymm0, ymmword ptr [rax-0x20]
        vmovups  ymm1, ymmword ptr [reloc @RWD32]
-       vptest   ymm0, ymm1
+       vpternlogd ymm0, ymm1, ymmword ptr [rax-0x20], -56
+       vptest   ymm0, ymm0
        sete     cl
        movzx    rcx, cl
        jmp      SHORT G_M58774_IG04
-						;; size=35 bbWeight=0.50 PerfScore 10.75
+						;; size=38 bbWeight=0.50 PerfScore 10.75

Analysis with llvm-mca shows 22 vs 15 cycles of latency, an increase of 7, see https://analysis.godbolt.org/z/Yj3Ydoees
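As a reading aid for the two `IsValidCore` diffs: the immediates `-88` and `-56` are the signed renderings of `0xA8` and `0xC8`. In the Intel definition of `vpternlog`, each result bit is looked up in the immediate at index `(dest << 2) | (src1 << 1) | src2`. The scalar model below (a hypothetical helper written for illustration, not a runtime API) suggests that `0xA8` encodes `(dest | src1) & src2` and `0xC8` encodes `src1 & (dest | src2)`, so both sequences compute the same `(first | second) & mask` value with the operands in different positions.

```csharp
using System;

public static class TernaryLogicSketch
{
    // Scalar model of vpternlogd on one 32-bit lane: for every bit position,
    // the bits of (dest, src1, src2) form a 3-bit index into imm8.
    public static uint Apply(uint dest, uint src1, uint src2, byte imm8)
    {
        uint result = 0;
        for (int bit = 0; bit < 32; bit++)
        {
            uint a = (dest >> bit) & 1;
            uint b = (src1 >> bit) & 1;
            uint c = (src2 >> bit) & 1;
            int index = (int)((a << 2) | (b << 1) | c);
            result |= ((uint)((imm8 >> index) & 1)) << bit;
        }
        return result;
    }

    // The patterns 0xF0, 0xCC, 0xAA enumerate all eight input combinations,
    // so evaluating a bitwise function on them yields its imm8 encoding.
    public static byte Imm8For(Func<uint, uint, uint, uint> function)
        => (byte)(function(0xF0, 0xCC, 0xAA) & 0xFF);
}
```

For example, `TernaryLogicSketch.Imm8For((a, b, c) => (a | b) & c)` evaluates to `0xA8`, and `TernaryLogicSketch.Imm8For((a, b, c) => b & (a | c))` evaluates to `0xC8`.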

xtqqczze · 2024-07-13T20:44:25Z

@tannergooding After #104517, still seeing some (not new) regressions, see MihuBot/runtime-utils#519

`System.Text.Ascii:GetIndexOfFirstNonAsciiChar_Intrinsified(ulong,ulong):ulong`

 G_M29265_IG06:
        vmovups  xmm0, xmmword ptr [rdi]
        vmovups  xmm2, xmmword ptr [rdi+0x10]
-       vpor     xmm3, xmm0, xmm2
-       vptest   xmm3, xmm1
+       vmovaps  xmm3, xmm0
+       vpternlogd xmm3, xmm1, xmm2, -56
+       vptest   xmm3, xmm3
        jne      G_M29265_IG16
        add      rdi, 32
        cmp      rdi, rcx
        jbe      SHORT G_M29265_IG06
-						;; size=33 bbWeight=4 PerfScore 55.33
+						;; size=40 bbWeight=4 PerfScore 57.00

Analysis with llvm-mca shows 14 vs 14 cycles of latency, no increase, see https://analysis.godbolt.org/z/se74h8nvq
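For context, the source-level change behind diffs like these is presumably of the following kind: a target-dependent `Sse41.TestZ` call replaced by a portable mask-and-compare. The sketch below is an assumption based on the PR title and the `vpor`/`vptest` sequence in the old codegen; the class and method names are hypothetical and the mask value `0x80` is chosen for the byte case only.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class TestZSketch
{
    // High bit of every byte: set only for non-ASCII bytes.
    private static readonly Vector128<byte> NonAsciiMask = Vector128.Create((byte)0x80);

    // Target-dependent form: only valid where SSE4.1 is available.
    public static bool AllAsciiTestZ(Vector128<byte> first, Vector128<byte> second)
    {
        if (!Sse41.IsSupported)
        {
            throw new PlatformNotSupportedException();
        }

        // TestZ returns true when (left & right) is all zeros.
        return Sse41.TestZ(first | second, NonAsciiMask);
    }

    // Portable form: the JIT decides how to lower the AND and the comparison.
    public static bool AllAsciiPortable(Vector128<byte> first, Vector128<byte> second)
        => ((first | second) & NonAsciiMask) == Vector128<byte>.Zero;
}
```

Both methods should return the same result for any pair of inputs; the difference discussed in this thread is only in which instructions the JIT emits for the portable form.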

tannergooding · 2024-07-13T22:09:20Z

It's not really a regression per se, and in other cases it could be an optimization.

Either way, it shouldn't impact this PR. I'll just finish the remaining work here and not fold in the case where the AND is part of an `op_Equality` check.

xtqqczze · 2024-07-16T20:52:33Z

~~@tannergooding Depends on #104944.~~

tannergooding · 2024-07-17T19:49:32Z

I wouldn't say this depends on #104944; I think it's fine to take as is. If you could mark it ready for review, then I can give it a final pass and merge.

xtqqczze · 2024-07-17T20:44:48Z

@tannergooding I've proposed making the suggested changes in a follow-up PR.

dotnet-issue-labeler bot added the needs-area-label label Jul 5, 2024

dotnet-policy-service bot added the community-contribution label Jul 5, 2024

xtqqczze force-pushed the TestZ branch from bec9604 to ef314be Compare July 5, 2024 19:32

stephentoub requested a review from tannergooding July 5, 2024 20:59

xtqqczze force-pushed the TestZ branch 4 times, most recently from d58b185 to 05211ad Compare July 6, 2024 00:15

Replace uses of TestZ intrinsic

ebe50db

xtqqczze force-pushed the TestZ branch from 05211ad to ebe50db Compare July 6, 2024 09:38

xtqqczze changed the title ~~Replace uses of TestZ intrinsic~~ Replace use of target dependent TestZ intrinsic Jul 6, 2024

build-analysis bot mentioned this pull request Jul 6, 2024

TimeProviderTests.TestProviderTimer failed in CI #103459

Closed

jkotas added the area-System.Runtime.Intrinsics label and removed the needs-area-label label Jul 6, 2024

xtqqczze marked this pull request as ready for review July 6, 2024 14:35

tannergooding reviewed Jul 10, 2024

tannergooding reviewed Jul 10, 2024

xtqqczze marked this pull request as draft July 10, 2024 16:07

This was referenced Jul 10, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

The job running on agent NetCore-Public ran longer than the maximum time #104044

Closed

MihuBot mentioned this pull request Jul 10, 2024

[JitDiff X64] [xtqqczze] Replace use of target dependent TestZ intrinsic MihuBot/runtime-utils#501

Open

MihuBot mentioned this pull request Jul 13, 2024

[JitDiff X64] [xtqqczze] Replace use of target dependent TestZ intrinsic MihuBot/runtime-utils#519

Open

MihuBot mentioned this pull request Jul 17, 2024

[JitDiff X64] [xtqqczze] Replace use of target dependent TestZ intrinsic MihuBot/runtime-utils#537

Open

xtqqczze mentioned this pull request Jul 17, 2024

Improve the handling of SIMD comparisons #104944

Merged

xtqqczze marked this pull request as ready for review July 17, 2024 20:41

xtqqczze mentioned this pull request Jul 17, 2024

Replace AdvSimd intrinsics with (vec & cns) == zero #105047

Open

xtqqczze closed this Oct 7, 2024

xtqqczze reopened this Oct 7, 2024

xtqqczze requested a review from tannergooding October 7, 2024 18:04

build-analysis bot mentioned this pull request Oct 7, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879

Open

Merge branch 'main' into TestZ

a6c7141

This was referenced Oct 16, 2024

WasmTestOnChrome timeouts in CI #105363

Closed

DebuggerTests other than WasmTestOnChrome timeouts in CI #108921

Closed

Merge branch 'main' into TestZ

4dbcff0

build-analysis bot mentioned this pull request Oct 30, 2024

iOS tests failing with WORKLOAD TIMED OUT - Killing user command. #108103

Open

Merge branch 'main' into TestZ

0b7a129

remove pragma warning disable IntrinsicsInSystemPrivateCoreLibConditionParsing

d21415f

xtqqczze force-pushed the TestZ branch from 8d25252 to d21415f Compare November 23, 2024 22:11
