JIT: Cast UInt64 to Single directly during const folding #106419

amanasifkhalid · 2024-08-14T20:26:01Z

amanasifkhalid · 2024-08-14T22:19:36Z

@dotnet/jit-contrib PTAL. The new test is failing on mono:

Xunit.Sdk.EqualException: Assert.Equal() Failure: Values differ
Expected: 1600094603
Actual:   1600094604

So this might need to be fixed there, as well? I can make this test CoreCLR-only for now.

SPMI is failing because it doesn't have any collections to run.

tannergooding · 2024-08-14T22:33:51Z

> I can make this test CoreCLR-only for now.

This is the right thing for .NET 9; Mono has a tracking issue to make the behavior changes and standardize to the same implementation as RyuJIT: #100368

amanasifkhalid · 2024-08-14T22:35:07Z

> This is the right thing for .NET 9; Mono has a tracking issue to make the behavior changes and standardize to the same implementation as RyuJIT: #100368

Got it. I'll let this CI run finish, and then push a change to disable it on Mono.

amanasifkhalid · 2024-08-14T22:45:12Z

Looks like the test is running on non-ARM64 legs, too -- probably because I forgot <RequiresProcessIsolation>, which is needed for <CLRTestTargetUnsupported>.

amanasifkhalid · 2024-08-15T16:34:26Z

@dotnet/jit-contrib PTAL, the new test is passing.

tannergooding · 2024-08-15T18:38:23Z

src/coreclr/jit/utils.cpp

+#ifdef TARGET_ARM64
+    // ARM64 supports casting directly to float
+    return (float)u64;
+#else  // !TARGET_ARM64
    double d = convertUInt64ToDouble(u64);
    return (float)d;
+#endif // !TARGET_ARM64


I'm pretty sure this bug exists on all platforms, not just Arm64.

The code in the test is

ulong vr10 = 16105307123914158031UL;
float vr11 = 4294967295U | vr10;

This boils down to 4294967295U | 16105307123914158031UL which is 16105307124325679103

The correct float result is then 16105306574569865216.0f, or rather the raw bits 1600094603 (the DEBUG output).

The result produced by the two step conversion, however, is 16105307674081492992.0f, or rather the raw bits 1600094604 (the RELEASE output).

I would expect that all our current compilers (MSVC, Clang, and GCC) are producing correct results for return (float)u64 and we no longer need to do a two step conversion. If any are still producing incorrect results, we should log bugs against them and we'd need to hand roll an implementation that does the conversion instead (I can give reference to one if we need it, but I don't think we do).
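The one-ULP difference between the two outputs is a double-rounding effect, and it can be reproduced with integer arithmetic alone. The following is a minimal sketch, not JIT code: `roundToPrecision`, `exactIntegerToFloatBits`, `singleStepBits`, and `twoStepBits` are hypothetical names introduced here for illustration, and the sketch assumes round-to-nearest-even with a rounded result that still fits in 64 bits (true for the test value).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round v to at most `precision` significant bits,
// ties to even. Assumes the rounded value still fits in 64 bits.
uint64_t roundToPrecision(uint64_t v, int precision)
{
    assert(precision > 0 && precision < 64);
    if (v == 0) return 0;
    int width = 64;
    while ((v >> (width - 1)) == 0) width--;  // significant bits in v
    if (width <= precision) return v;         // already exact
    int drop = width - precision;
    uint64_t half = 1ULL << (drop - 1);
    uint64_t rem  = v & ((1ULL << drop) - 1); // discarded low bits
    uint64_t kept = v >> drop;
    if (rem > half || (rem == half && (kept & 1))) kept++;
    return kept << drop;
}

// Hypothetical helper: raw binary32 bits of a nonzero integer that has
// at most 24 significant bits (so no further rounding is needed).
uint32_t exactIntegerToFloatBits(uint64_t v)
{
    int width = 64;
    while ((v >> (width - 1)) == 0) width--;
    uint64_t significand = (width >= 24) ? (v >> (width - 24)) : (v << (24 - width));
    uint32_t biasedExponent = (uint32_t)(width - 1 + 127);
    return (biasedExponent << 23) | (uint32_t)(significand & 0x7FFFFF);
}

// ulong -> float: a single rounding to 24 bits.
uint32_t singleStepBits(uint64_t u64)
{
    return exactIntegerToFloatBits(roundToPrecision(u64, 24));
}

// ulong -> double -> float: round to 53 bits, then again to 24 bits.
uint32_t twoStepBits(uint64_t u64)
{
    return exactIntegerToFloatBits(roundToPrecision(roundToPrecision(u64, 53), 24));
}
```

For 0xDF818B7FFFFFFFFF (the OR result above), the 40 bits discarded by a direct conversion are just under half a ULP, so the value rounds down to 0x5F5F818B. Rounding to 53 bits first pushes the value up to 0xDF818B8000000000, which is an exact tie at 24 bits and resolves upward to the even significand, giving 0x5F5F818C.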

Thanks for pointing this out. I tried both the one- and two-step conversion on x64, and in Debug and Release, I get 1600094604. So it looks like the MSVC/Clang/GCC codegen on the const-folding path is matching what RyuJIT currently emits in Debug for x64. For this test case, is the definition of "correct" architecture-dependent? I'm guessing there's some subtle difference in rounding behavior between ARM64 and x64?

Regardless of the answer to that, I think you're right that we can update the cast logic for all platforms. I can do that in this PR.

> I tried both the one- and two-step conversion on x64, and in Debug and Release, I get 1600094604

Are you sure? I don't see the same behavior in godbolt: https://godbolt.org/z/s5MrTWaYM
-- note that m1 and m2 return the same result (raw bits 0x5f5f818b) while m3 returns one higher (raw bits 0x5f5f818c)
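To relate those raw bit patterns to the decimal values quoted earlier in the thread, here is a small sketch that decodes binary32 bits into the exact integer they represent. `floatBitsToInteger` is a hypothetical helper written for this illustration; it only handles positive, normal values in the range relevant here (magnitude at least 2^23 and below 2^64).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode positive, normal binary32 bits whose value
// is an integer in [2^23, 2^64) into that exact integer.
uint64_t floatBitsToInteger(uint32_t bits)
{
    uint32_t biasedExponent = (bits >> 23) & 0xFF;
    uint64_t significand = (bits & 0x7FFFFF) | 0x800000; // restore implicit leading 1
    int shift = (int)biasedExponent - 127 - 23;          // scale of the lowest significand bit
    assert((bits >> 31) == 0 && shift >= 0 && shift <= 40);
    return significand << shift;
}
```

Under these assumptions, 0x5f5f818b decodes to 16105306574569865216 and 0x5f5f818c to 16105307674081492992. The two differ by 2^40, which is one ULP at this magnitude.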

> For this test case, is the definition of "correct" architecture-dependent?

No, we should be deterministic across all platforms for such a case; correspondingly, the added test should be enabled for all platforms as well, not just Arm64.

It turns out the ulong -> float cast on x64 is represented in IR as ulong -> double -> float, while on ARM64, we combine it into one cast during morph. I'm guessing we have to enable that path for all platforms, too.

So the importer is creating ulong -> double -> float for ulong -> float conversions in IL? That sounds wrong given that those are not equivalent (as the example shows). If morph is making that transformation in the opposite direction it likewise sounds wrong.

Roslyn is likely relying on the JIT implementation for the constant folding here, not realizing it's incorrect.

Yeah, I agree the helper has downsides too. But Roslyn could avoid using it when not targeting something with it available. For example, I imagine it could emit the equivalent of

ulong x = ...;
float f = x > long.MaxValue ? (float)(-x) + (float)0x8000000000000000 : (float)(long)x;

in those cases (or whatever the right way is to do the conversion manually). Signed long -> float conversion is representable with just conv.r4, I believe.
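One well-known way to do the conversion manually using only a signed 64-bit to float cast is sketched below. It is a different formulation from the expression above: values with the top bit set are halved, with the shifted-out bit ORed back in as a sticky bit, converted as signed, and then doubled. `uint64ToFloatViaSigned` and `floatToBits` are hypothetical names, and the sketch assumes the compiler's `int64_t` to `float` cast is a single, correctly rounded operation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: unsigned 64-bit -> float using only a signed cast.
float uint64ToFloatViaSigned(uint64_t x)
{
    if (x <= (uint64_t)INT64_MAX)
    {
        return (float)(int64_t)x; // fits in signed range; convert directly
    }
    // Halve, keeping the shifted-out bit as a sticky bit so that the single
    // rounding done by the signed cast sees the same above/below-half
    // information as the full 64-bit value.
    uint64_t halved = (x >> 1) | (x & 1);
    return (float)(int64_t)halved * 2.0f; // doubling is exact
}

// Hypothetical helper: reinterpret a float as its raw bits.
uint32_t floatToBits(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The sticky bit works because the halved value has 63 significant bits while float keeps 24, so the lowest bit sits far below the rounding position and only records whether anything nonzero was discarded.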

I don't think we need to try to fix this in .NET 9, but we may want to unify the debug/release behavior on something. We should probably open an issue for more discussion about the problem.

> I don't think we need to try to fix this in .NET 9, but we may want to unify the debug/release behavior on something. We should probably open an issue for more discussion about the problem.

I've opened #106646 to track this. I think ARM64 was the only platform where Debug/Release behavior could diverge, due to const-folding always doing a two-step cast and ucvtf/scvtf being able to encode ulong/long -> float casts. Should we take this PR as-is for .NET 9?

I expect this also repros on AVX512 capable hardware for x64.

You're right, I can repro it on my AVX512 machine, though this PR's changes seem to fix it. Here's the Debug codegen:

G_M27646_IG01:  ;; offset=0x0000
       push     rbp
       sub      rsp, 48
       lea      rbp, [rsp+0x30]
       xor      eax, eax
       mov      qword ptr [rbp-0x08], rax
       mov      dword ptr [rbp-0x0C], eax
                        ;; size=19 bbWeight=1 PerfScore 4.00
G_M27646_IG02:  ;; offset=0x0013
       cmp      dword ptr [(reloc 0x7ffeb59917c0)], 0
       je       SHORT G_M27646_IG04
                        ;; size=9 bbWeight=1 PerfScore 4.00
G_M27646_IG03:  ;; offset=0x001C
       call     CORINFO_HELP_DBG_IS_JUST_MY_CODE
                        ;; size=5 bbWeight=0.50 PerfScore 0.50
G_M27646_IG04:  ;; offset=0x0021
       nop
       mov      rax, 0xDF818B7FE778AFCF
       mov      qword ptr [rbp-0x08], rax
       mov      eax, -1
       mov      eax, eax
       or       rax, qword ptr [rbp-0x08]
       vcvtusi2ss xmm0, rax
       vmovss   dword ptr [rbp-0x0C], xmm0
       vmovss   xmm0, dword ptr [rbp-0x0C]
       call     [System.BitConverter:SingleToUInt32Bits(float):uint]
       mov      dword ptr [rbp-0x10], eax
       mov      ecx, dword ptr [rbp-0x10]
       call     [System.Console:WriteLine(uint)]
       nop
       nop
                        ;; size=62 bbWeight=1 PerfScore 22.50
G_M27646_IG05:  ;; offset=0x005F
       add      rsp, 48
       pop      rbp
       ret
                        ;; size=6 bbWeight=1 PerfScore 1.75

And here's Release:

G_M27646_IG01:  ;; offset=0x0000
       sub      rsp, 40
                        ;; size=4 bbWeight=1 PerfScore 0.25
G_M27646_IG02:  ;; offset=0x0004
       mov      ecx, 0x5F5F818B
       call     [System.Console:WriteLine(uint)]
       nop
                        ;; size=12 bbWeight=1 PerfScore 3.50
G_M27646_IG03:  ;; offset=0x0010
       add      rsp, 40
       ret
                        ;; size=5 bbWeight=1 PerfScore 1.25

Both now output 1600094603. I can try tweaking the test to only run on ARM64, or on x64 if Avx512.IsSupported is true.

tannergooding · 2024-08-19T19:49:07Z

src/tests/JIT/Regression/JitBlue/Runtime_106338/Runtime_106338.cs

+        bool runTest = (RuntimeInformation.ProcessArchitecture == Architecture.Arm64) || Avx512F.IsSupported;
+
+        if (runTest)
+        {
+            ulong vr10 = 16105307123914158031UL;
+            float vr11 = 4294967295U | vr10;
+            Assert.Equal(1600094603U, BitConverter.SingleToUInt32Bits(vr11));
+        }


Should the test instead assert that it's producing 1600094604u when (RuntimeInformation.ProcessArchitecture == Architecture.Arm64) || Avx512F.IsSupported is false?

That way we can detect any changes for other platforms or scenarios?

Good idea; I'll update it.

I just remembered that we don't build this test if the target arch isn't arm64/x64 (or if the runtime isn't CoreCLR). @tannergooding are you ok with only testing those platforms for now?

I think that’s fine, but ideally we’d ensure this runs everywhere long term

tannergooding · 2024-08-20T14:32:26Z

src/tests/JIT/Regression/JitBlue/Runtime_106338/Runtime_106338.csproj

+    <CLRTestTargetUnsupported Condition="'$(TargetArchitecture)' != 'arm64' AND '$(TargetArchitecture)' != 'x64'">true</CLRTestTargetUnsupported>
+    <CLRTestTargetUnsupported Condition="'$(RuntimeFlavor)' != 'coreclr'">true</CLRTestTargetUnsupported>


The only need is we skip this on Mono today, right?
Is that because they're always doing the ulong -> double -> float 2-step behavior?

Can we not instead use [SkipOnMono("https://github.com/dotnet/runtime/issues/#######", TestPlatforms.Any)] and ensure that the test is compiled for all platforms?

Yes, Mono seems to also do the two-step cast, so the test was failing across all Mono legs.

> Can we not instead use [SkipOnMono("https://github.com/dotnet/runtime/issues/#######", TestPlatforms.Any)] and ensure that the test is compiled for all platforms?

Sure, I'll update it.

amanasifkhalid · 2024-08-20T16:56:44Z

On x86, we're pretty explicit about keeping the cast two steps in morph: We represent the cast with a helper call to convert the ulong to double, followed by a cast to float. The test's condition needs to explicitly check if the target is an x64 machine with AVX-512 so we don't accidentally go down this path on x86.

amanasifkhalid · 2024-08-20T18:50:44Z

Test is now passing on all CoreCLR Pri0 legs.

amanasifkhalid · 2024-08-20T18:52:42Z

/backport to release/9.0

github-actions · 2024-08-20T18:52:53Z

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10477490137

amanasifkhalid added 2 commits August 14, 2024 16:19

Fix cast folding on ARM64

ceb81bf

Add test

d729037

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 14, 2024

amanasifkhalid added this to the 9.0.0 milestone Aug 14, 2024

dotnet-policy-service bot assigned amanasifkhalid Aug 14, 2024

amanasifkhalid added 5 commits August 14, 2024 16:29

Fix weird comment spacing

0a36998

Fix test

a740f04

Fix test

c98af0a

Flip params

08370ac

Run test on CoreCLR only

70c60b3

Add RequiresProcessIsolation

8dd3473

build-analysis bot mentioned this pull request Aug 15, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

2 tasks

amanasifkhalid closed this Aug 15, 2024

amanasifkhalid reopened this Aug 15, 2024

Merge branch 'main' into fp-cast-arm64

163d581

build-analysis bot mentioned this pull request Aug 15, 2024

nativeaot/SmokeTests/Exceptions failing with Assertion failed: (n_heaps <= heap_number) || !gc_t_join.joined() #103839

Closed

tannergooding reviewed Aug 15, 2024

Do single-step conversion on all platforms

3900e5d

amanasifkhalid changed the title ~~JIT: Cast UInt64 to Single directly on ARM64 during const folding~~ JIT: Cast UInt64 to Single directly during const folding Aug 15, 2024

amanasifkhalid mentioned this pull request Aug 19, 2024

JIT: Model Int64/UInt64 -> Single casts without intermediate cast to Double #106646

Open

Expand test coverage

de0d0c9

tannergooding reviewed Aug 19, 2024

Update test

08fd38c

This was referenced Aug 19, 2024

Test failure "RemoteExecutionException : Half-way through waiting for remote process." in System.Threading.ThreadPools.Tests.ThreadPoolTests.IOCompletionPortCountConfigVarTest #106494

Closed

'chrome-GetPropertiesTests' timing out #106625

Closed

tannergooding reviewed Aug 20, 2024

tannergooding approved these changes Aug 20, 2024

Try running test on all CoreCLR platforms

6ee12fc

Fix condition

0ea8525

amanasifkhalid merged commit 7a50b43 into dotnet:main Aug 20, 2024
105 of 114 checks passed

amanasifkhalid deleted the fp-cast-arm64 branch August 20, 2024 18:51

github-actions bot mentioned this pull request Aug 20, 2024

[release/9.0] JIT: Cast UInt64 to Single directly during const folding #106720

Merged

4 tasks

build-analysis bot mentioned this pull request Aug 20, 2024

tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource failing in CI #90605

Open

github-actions bot locked and limited conversation to collaborators Sep 20, 2024
