Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT #84345

Merged
merged 2 commits into from
Apr 14, 2023

Conversation

ivanpovazan
Copy link
Member

@ivanpovazan ivanpovazan commented Apr 5, 2023

This PR adds support for ExtractMostSignificantBits intrinsic on ARM64 miniJIT.

Implementation

On ARM64, there is no single instruction to perform the operation.
To emulate the behavior we implement a similar approach to coreCLR by performing the following set of operations:

  • The general case (all element types except byte/sbyte):
  1. Extract the most significant bit of each element of the source vector by ANDing elements with MSB mask
  2. Shift each element to the right by its index
  3. Sum all the element values together tp get the final result of ExtractMSB
  • Handling byte/sbyte element types:
  1. Extract the most significant bit of each element of the source vector by ANDing elements with MSB mask
  2. Shift each element to the right by its relative index to the high/low (64bit) boundary.
    • This basically means we treat the source 128bit, as two 64bit (high and low) vectors and shift their elements respectively
  3. Sum the low 64bits of the shifted vector
  4. Sum the high 64bits of the shifted vector
  5. Shift left the result of 4) by 8, to adjust the value
  6. OR the results from 3) and 5) to get the final result of ExtractMSB

New opcodes

Two new opcodes added:

  1. OP_ARM64_USHL - Shifts a vector left or right depending on the sign of the shift constant - USHL vector
  2. OP_ARM64_EXT_IMM - Extracts a vector from pair of vectors based on constant selector value - the lowest numbered byte element to be extracted in the range. EXT vector
    • Should not be confused with OP_ARM64_EXT which uses 3 source registers expecting the selector value to be in a register.

Further optimizations

The current implementation can be further optimized. The following approaches could be examined:

  1. On ARM64 for each ExtractMostSignificantBits we generate inlined constants into the instruction stream.
    As the mask and shift constant are the same for a specific element type and can be reused, we could instead allocate them in a shared location to be reused between ExtractMostSignificantBits operations.
    Additionally, MSB masking could reuse the approach outlined here: [mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT #84345 (comment)
  2. On ARM64 when we handle byte and sbyte element types we use 9 instructions to treat upper and lower half of the vector properly. This can be improved once we support addv.8b as we would be able to just sum the lower 64bits of the source vector and not need to clear-out (ext with zero) the upper half of it before summing the full length. This would save us 1 instruction. Here is the summary of how currently the pseudo and generated code for the sequence looks like:
  • load 0 into a zero_vec
  • extract src_low and zero_high into low_value_vec
  • sum the low_value_vec and move it to tmp1
  • extract src_vec_high and zero_vec_low into high_value_vec
  • sum the high_value_vec and move it to tmp2
  • left shift tmp2 by 8
  • or tmp1 and tmp2 to for the final sum
    Example:
    eor.16b	v1, v1, v1
    ext.16b	v2, v1, v0, #0x8
    addv.16b	b2, v2
    umov.b	w1, v2[0]
    ext.16b	v0, v0, v1, #0x8
    addv.16b	b0, v0
    umov.b	w0, v0[0]
    lsl	x0, x0, #8
    orr	w0, w0, w1
  1. Implement with mono: Improve Vector128.ExtractMostSignificantBits for arm64 #76047

Contributes to #76025

/cc: @jandupej @fanyang-mono

@ivanpovazan ivanpovazan changed the title WIP: [mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT [mono][jit] Adding support for Vector128::ExtractMostSignificantBits intrinsic on ARM64 with miniJIT Apr 5, 2023
Copy link
Member

@fanyang-mono fanyang-mono left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I would suggest removing most of the comments, as the code should be clear enough.

src/mono/mono/mini/mini-arm64.c Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Show resolved Hide resolved
@ivanpovazan
Copy link
Member Author

Failures are unrelated and known:

  • Build browser-wasm linux Release WasmBuildTests is failing with:
Wasm.Build.Tests.Blazor.MiscTests.NativeBuild_WithDeployOnBuild_UsedByVS(config: "Release", nativeRelink: False) [FAIL]
       Expected 0 exit code but got 1: /root/helix/work/workitem/e/dotnet-latest/dotnet build -bl:/root/helix/work/workitem/uploads/xharness-output/logs/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int-build.binlog -p:Configuration=Release  -p:BlazorEnableCompression=false -nr:false  -p:DeployOnBuild=true 
      Standard Output:
      [] MSBuild version 17.7.0-preview-23206-02+171676d81 for .NET
      []   Determining projects to restore...
      []   Restored /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj (in 1.57 sec).
      [] /root/helix/work/workitem/e/dotnet-latest/sdk/8.0.100-preview.4.23213.40/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.RuntimeIdentifierInference.targets(287,5): message NETSDK1057: You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1002: ; expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,53): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] 
      [] Build FAILED.
      [] 
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,50): error CS1002: ; expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      [] /root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/Program.cs(3,53): error CS1001: Identifier expected [/root/helix/work/workitem/e/wbt/blz_deploy_on_build_Release_False_zxdky5y5.int/blz_deploy_on_build_Release_False_zxdky5y5.int.csproj]
      []     0 Warning(s)
      []     3 Error(s)
      [] 
      [] Time Elapsed 00:00:03.33

tracked here: #84722

  • Build browser-wasm windows Release Mono_DebuggerTests_chrome fails with:
...
[EXECUTION TIMED OUT]
Exit Code:-3Executor timed out after 3000 seconds and was killed

['chrome-DebuggerTests.EvaluateOnCallFrameTests' END OF WORK ITEM LOG: Command timed out, and was killed]

tracked here: #84434

@ivanpovazan ivanpovazan merged commit ad994e5 into dotnet:main Apr 14, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 14, 2023
@ivanpovazan ivanpovazan deleted the extract-msb-arm64 branch August 15, 2023 09:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants