[mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT #84345

ivanpovazan · 2023-04-05T12:08:35Z

This PR adds support for the ExtractMostSignificantBits intrinsic on ARM64 with miniJIT.

Implementation

On ARM64, there is no single instruction to perform the operation.
To emulate the behavior, we follow an approach similar to CoreCLR's and perform the following operations:

The general case (all element types except byte/sbyte):

1. Extract the most significant bit of each element of the source vector by ANDing the elements with an MSB mask
2. Shift each element to the right by an amount determined by its index, so that its MSB lands at the bit position equal to that index
3. Sum all the element values together to get the final result of ExtractMSB
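
As a rough illustration, the general case can be modelled in scalar C for a vector of four 32-bit elements. This is only a sketch: the function name is hypothetical and does not come from the Mono sources, and it assumes the per-element shift moves each MSB to the bit position equal to the element index, which is what makes the final sum equal the expected bitmask.

```c
#include <stdint.h>

/* Scalar model of the general-case sequence for four 32-bit elements.
 * The name extract_msb_u32x4 is illustrative only. */
uint32_t extract_msb_u32x4(const uint32_t src[4])
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t masked = src[i] & 0x80000000u; /* step 1: keep only the MSB */
        uint32_t shifted = masked >> (31 - i);  /* step 2: move the MSB to bit i */
        sum += shifted;                         /* step 3: horizontal add */
    }
    return sum;
}
```

Because every shifted element contributes a distinct bit, the addition never carries and behaves like an OR of the individual bits.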

Handling byte/sbyte element types:

1. Extract the most significant bit of each element of the source vector by ANDing the elements with an MSB mask
2. Shift each element to the right by its index relative to the high/low (64-bit) boundary
   - That is, we treat the 128-bit source as two 64-bit vectors (high and low) and shift the elements of each half accordingly
3. Sum the low 64 bits of the shifted vector
4. Sum the high 64 bits of the shifted vector
5. Shift the result of step 4 left by 8 to adjust the value
6. OR the results of steps 3 and 5 to get the final result of ExtractMSB
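
The byte/sbyte handling can be sketched the same way in scalar C. The split into two halves is needed because a byte lane cannot hold bit positions 8 to 15, so each half produces an 8-bit partial result. The function name below is hypothetical, and the loop is a model of the sequence rather than the code the JIT emits.

```c
#include <stdint.h>

/* Scalar model of the byte/sbyte sequence for sixteen 8-bit elements.
 * The name extract_msb_u8x16 is illustrative only. */
uint32_t extract_msb_u8x16(const uint8_t src[16])
{
    uint8_t shifted[16];
    for (int i = 0; i < 16; i++) {
        uint8_t masked = src[i] & 0x80u;                 /* step 1: keep only the MSB */
        shifted[i] = (uint8_t)(masked >> (7 - (i % 8))); /* step 2: index relative to the 64-bit half */
    }

    uint8_t low = 0, high = 0;
    for (int i = 0; i < 8; i++) {
        low = (uint8_t)(low + shifted[i]);       /* step 3: sum of the low half */
        high = (uint8_t)(high + shifted[8 + i]); /* step 4: sum of the high half */
    }

    /* steps 5 and 6: move the high partial result into bits 8..15 and combine */
    return ((uint32_t)high << 8) | low;
}
```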

New opcodes

Two new opcodes are added:

- OP_ARM64_USHL - Shifts a vector left or right depending on the sign of the shift constant - USHL vector
- OP_ARM64_EXT_IMM - Extracts a vector from a pair of vectors based on a constant selector value, which is the lowest-numbered byte element to be extracted - EXT vector
  - Should not be confused with OP_ARM64_EXT, which uses 3 source registers and expects the selector value to be in a register.
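
For readers unfamiliar with the two instructions, here is a simplified scalar model of what they compute on sixteen byte lanes. It is a sketch under assumptions, not the instruction specification: the function and parameter names are hypothetical, only the 8-bit arrangement is modelled, and the selector is assumed to be in the range 0 to 15.

```c
#include <stdint.h>
#include <string.h>

/* Simplified per-lane model of USHL on byte lanes: a positive shift value
 * shifts left, a negative one shifts right (logical). */
void ushl_u8x16(uint8_t dst[16], const uint8_t src[16], const int8_t shift[16])
{
    for (int i = 0; i < 16; i++) {
        int s = shift[i];
        if (s >= 8 || s <= -8)
            dst[i] = 0;                        /* everything shifted out */
        else if (s >= 0)
            dst[i] = (uint8_t)(src[i] << s);
        else
            dst[i] = (uint8_t)(src[i] >> -s);
    }
}

/* Simplified model of EXT with an immediate selector: the result is bytes
 * index..15 of the first source followed by bytes 0..index-1 of the second. */
void ext_u8x16(uint8_t dst[16], const uint8_t first[16],
               const uint8_t second[16], unsigned index)
{
    uint8_t tmp[16]; /* temporary so that dst may alias a source */
    for (unsigned i = 0; i < 16; i++) {
        unsigned pos = index + i;
        tmp[i] = pos < 16 ? first[pos] : second[pos - 16];
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

With a selector of 8 and an all-zero vector as one of the sources, the EXT model keeps one 64-bit half of the other source and zeroes the rest, which is how the sequence in this PR isolates each half before summing.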

Further optimizations

The current implementation can be further optimized. The following approaches could be examined:

On ARM64, for each ExtractMostSignificantBits we generate inlined constants into the instruction stream.
As the mask and shift constants are the same for a given element type, we could instead allocate them in a shared location and reuse them across ExtractMostSignificantBits operations.
Additionally, MSB masking could reuse the approach outlined here: [mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT #84345 (comment)
On ARM64, when handling byte and sbyte element types, we use 9 instructions to treat the upper and lower halves of the vector properly. This can be improved once we support addv.8b, as we could then sum just the lower 64 bits of the source vector without first clearing (ext with zero) its upper half before summing the full length. This would save 1 instruction. Here is a summary of the current pseudocode and the generated code for the sequence:

1. load 0 into a zero_vec
2. extract src_low and zero_high into low_value_vec
3. sum the low_value_vec and move it to tmp1
4. extract src_vec_high and zero_vec_low into high_value_vec
5. sum the high_value_vec and move it to tmp2
6. left shift tmp2 by 8
7. or tmp1 and tmp2 to form the final result

Example:

    eor.16b	v1, v1, v1
    ext.16b	v2, v1, v0, #0x8
    addv.16b	b2, v2
    umov.b	w1, v2[0]
    ext.16b	v0, v0, v1, #0x8
    addv.16b	b0, v0
    umov.b	w0, v0[0]
    lsl	x0, x0, #8
    orr	w0, w0, w1

Implement with mono: Improve Vector128.ExtractMostSignificantBits for arm64 #76047

Contributes to #76025

/cc: @jandupej @fanyang-mono

fanyang-mono

Overall, I would suggest removing most of the comments, as the code should be clear enough.

ivanpovazan · 2023-04-14T15:58:24Z

Failures are unrelated and known:

Build browser-wasm linux Release WasmBuildTests is failing with:

Wasm.Build.Tests.Blazor.MiscTests.NativeBuild_WithDeployOnBuild_UsedByVS(config: "Release", nativeRelink: False) [FAIL]
       Expected 0 exit code but got 1: /root/helix/work/workitem/e/dotnet-latest/dotnet build -bl:/root/helix/work/workitem/uploads/xharness-output/logs/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int-build.binlog -p:Configuration=Release  -p:BlazorEnableCompression=false -nr:false  -p:DeployOnBuild=true 
      Standard Output:
      [] MSBuild version 17.7.0-preview-23206-02+171676d81 for .NET
      []   Determining projects to restore...
      []   Restored /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj (in 1.57 sec).
      [] /root/helix/work/workitem/e/dotnet-latest/sdk/8.0.100-preview.4.23213.40/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.RuntimeIdentifierInference.targets(287,5): message NETSDK1057: You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1002: ; expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,53): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] 
      [] Build FAILED.
      [] 
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1002: ; expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,53): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      []     0 Warning(s)
      []     3 Error(s)
      [] 
      [] Time Elapsed 00:00:03.33

tracked here: #84722

Build browser-wasm windows Release Mono_DebuggerTests_chrome fails with:

...
[EXECUTION TIMED OUT]
Exit Code:-3
Executor timed out after 3000 seconds and was killed

['chrome-DebuggerTests.EvaluateOnCallFrameTests' END OF WORK ITEM LOG: Command timed out, and was killed]

tracked here: #84434

ivanpovazan self-assigned this Apr 5, 2023

dotnet-issue-labeler bot added the area-Codegen-JIT-mono label Apr 5, 2023

jandupej reviewed Apr 5, 2023

ivanpovazan changed the title ~~WIP: [mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT~~ [mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT Apr 5, 2023

ivanpovazan marked this pull request as ready for review April 5, 2023 17:51

ivanpovazan requested review from vargaz, lambdageek and SamMonoRT as code owners April 5, 2023 17:51

ivanpovazan requested a review from jandupej April 5, 2023 17:52

build-analysis bot mentioned this pull request Apr 6, 2023

StackallocTests.Test4096 failing EnsureZeroed check #84398

Open

ivanpovazan requested a review from fanyang-mono April 11, 2023 12:58

vargaz approved these changes Apr 12, 2023

SamMonoRT requested a review from tannergooding April 13, 2023 12:36

[mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 miniJIT (01e7f7a)

ivanpovazan force-pushed the extract-msb-arm64 branch from eca2d83 to 01e7f7a Compare April 13, 2023 14:32

fanyang-mono reviewed Apr 13, 2023

build-analysis bot mentioned this pull request Apr 13, 2023

WasmTestOnBrowser-System.Text.Json.Tests.WorkItemExecution timing out #84434

Closed

PR review feedback

74f683d

fanyang-mono approved these changes Apr 14, 2023

ivanpovazan merged commit ad994e5 into dotnet:main Apr 14, 2023

ghost locked as resolved and limited conversation to collaborators May 14, 2023

ivanpovazan deleted the extract-msb-arm64 branch August 15, 2023 09:43
