[mono][jit] Adding more arm64 SIMD operations, SIMD codegen with instruction table. #83094
…ew others, tabular approach to instruction specs.
…chine code transformation is now table-generated for select operations. Fixing issues.
CI failures on |
This makes the code a little bit hard to read. It might be better to do it for only a subset, like the OP_XBINOP ones. |
@vargaz Which part are you referring to? (the comment has no line number attached) |
The SIMD opcodes are not regular enough to be generated from a table, imho. The OP_XBINOP opcodes are easier to handle since each maps to an LLVM intrinsic, and the LLVM intrinsics map to SIMD instructions. |
This might just be due to the way Mono is currently set up. Overall the data is incredibly regular and we need very little special handling in RyuJIT. This is true for both x86/x64 and Arm64. Arm64 is notably more regular than x64, simply because all the intrinsics were introduced in the same ISA, so the tables are denser. |
The table effectively unpacks into two tables (one where the
Do you have any other ideas? @vargaz @tannergooding @fanyang-mono |
For RyuJIT, we effectively have one node type to represent all hardware intrinsics: This was done specifically to help with the general table driven approach and in general specialized handling that SIMD nodes may require as compared to other nodes. The
This is in addition to "standard" things that all nodes track, like return type. There are also a couple special flags/fields that have multiple uses based on the intrinsic ID for the very few cases that do need specialized handling. We then have one nice big table for each architecture:
This table tracks the ISA and function name, which combined form the Intrinsic ID (such as We then track a category, which is used to help decide which of the table-driven paths an API goes down. Most of these are We also track some flags that help drive the compilation handling in general. For Arm64, we have roughly 461 intrinsics, of those:
Since each of these intrinsics has 10 types it needs to consider support for, there are a total of 4610 instruction entries tracked. The majority of this is "sparse" at 2997 entries. The reason for this level of sparseness is primarily because there are many APIs which are only valid for Notably there are also ~195 xplat APIs for From all of this, we are able to generally table drive each phase of the JIT. We are also able to broadly share a large amount of logic between the x86, x64, and Arm64 intrinsic support. This is applicable to:
Outside of defining the tables and the HWIntrinsicInfo struct they map to, we aren't using macros or other things to simplify code. We just map the ID to the Asserts exist to help ensure that anything that needs to be changed when new functionality is added gets handled. |
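A minimal sketch of the table-driven shape described above, assuming one row per intrinsic with ten per-type instruction slots (so 461 intrinsics give 461 × 10 = 4610 slots, many of them empty). Every identifier below (IntrinsicInfo, lookup_ins, the Demo ISA, the base-type ordering) is hypothetical and chosen for illustration; this is not the RyuJIT table or its HWIntrinsicInfo struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base-type ordering; the thread only says there are 10 types. */
typedef enum {
    BT_BYTE, BT_UBYTE, BT_SHORT, BT_USHORT, BT_INT,
    BT_UINT, BT_LONG, BT_ULONG, BT_FLOAT, BT_DOUBLE, BT_COUNT
} BaseType;

/* Hypothetical instruction set; INS_INVALID marks a "sparse" (unsupported) slot. */
typedef enum { INS_INVALID = 0, INS_ADD, INS_FADD, INS_ABS, INS_FABS } Instruction;

/* Hypothetical category that selects which table-driven path is taken. */
typedef enum { CAT_SPECIAL, CAT_SIMPLE_SIMD } Category;

typedef enum { ID_Illegal = 0, ID_Demo_Add, ID_Demo_Abs, ID_COUNT } IntrinsicId;

typedef struct {
    IntrinsicId id;            /* must equal the row index */
    const char *isa;           /* ISA name; with `name` it forms the intrinsic ID */
    const char *name;
    Category    category;
    Instruction ins[BT_COUNT]; /* one instruction slot per base type */
    unsigned    flags;
} IntrinsicInfo;

static const IntrinsicInfo intrinsic_table[ID_COUNT] = {
    [ID_Illegal]  = { ID_Illegal, "", "", CAT_SPECIAL, {0}, 0 },
    [ID_Demo_Add] = { ID_Demo_Add, "Demo", "Add", CAT_SIMPLE_SIMD,
        { INS_ADD, INS_ADD, INS_ADD, INS_ADD, INS_ADD,
          INS_ADD, INS_ADD, INS_ADD, INS_FADD, INS_FADD }, 0 },
    /* Abs is only meaningful for signed and floating-point types here. */
    [ID_Demo_Abs] = { ID_Demo_Abs, "Demo", "Abs", CAT_SIMPLE_SIMD,
        { INS_ABS, INS_INVALID, INS_ABS, INS_INVALID, INS_ABS,
          INS_INVALID, INS_ABS, INS_INVALID, INS_FABS, INS_FABS }, 0 },
};

/* Map (intrinsic ID, base type) to an instruction; asserts guard table consistency. */
static Instruction lookup_ins(IntrinsicId id, BaseType type)
{
    assert(id > ID_Illegal && id < ID_COUNT);
    assert(type >= BT_BYTE && type < BT_COUNT);
    assert(intrinsic_table[id].id == id);
    return intrinsic_table[id].ins[type];
}

/* Count the empty slots across all real intrinsics (the "sparse" entries). */
static size_t count_invalid_entries(void)
{
    size_t n = 0;
    for (int id = ID_Illegal + 1; id < ID_COUNT; id++)
        for (int t = 0; t < BT_COUNT; t++)
            if (intrinsic_table[id].ins[t] == INS_INVALID)
                n++;
    return n;
}
```

In this toy table, 4 of the 20 slots are empty. The asserts play the role the comment describes: a row added out of order, or an ID without a row, fails fast instead of silently selecting the wrong instruction.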
Thanks, @tannergooding, that will be useful. |
…ted to exclude certain operations that are easily implemented manually.
After discussion with @vargaz, the table was modified to only encode the operations which are hidden under umbrella |
Transformation of Mono IR to machine code is now automated with a table in simd-arm64.h. The table format should be general enough also for x86(-64). Opcodes not listed in the table go through the preexisting process. Only arm64 instructions are considered right now.
Adding comparison operations, negation, ones complement.
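The lookup-with-fallback behaviour described here can be sketched as follows. This is an illustration only: the opcode, sub-operation, and emitter names (OP_DEMO_UNOP, SUB_NEGATE, EMIT_NEG, and so on) and the X-macro layout are assumptions, not the actual contents or format of simd-arm64.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IR opcodes and the sub-operation carried by each. */
typedef enum { OP_DEMO_UNOP, OP_DEMO_COMPARE, OP_DEMO_OTHER, OP_DEMO_COUNT } DemoOpcode;
typedef enum { SUB_NEGATE, SUB_ONES_COMPLEMENT, SUB_EQUALS, SUB_NONE } DemoSubOp;

/* Hypothetical emitter IDs; EMIT_NONE means "not in the table". */
typedef enum { EMIT_NONE = 0, EMIT_NEG, EMIT_NOT, EMIT_CMEQ } DemoEmit;

/* One line per table-driven operation: (opcode, sub-operation, instruction). */
#define DEMO_SIMD_TABLE(X)                         \
    X(OP_DEMO_UNOP,    SUB_NEGATE,          EMIT_NEG)  \
    X(OP_DEMO_UNOP,    SUB_ONES_COMPLEMENT, EMIT_NOT)  \
    X(OP_DEMO_COMPARE, SUB_EQUALS,          EMIT_CMEQ)

typedef struct {
    DemoOpcode opcode;
    DemoSubOp  sub;
    DemoEmit   emit;
} DemoRow;

/* Expand the table into an array of rows. */
#define DEMO_AS_ROW(op, sub, emit) { op, sub, emit },
static const DemoRow demo_rows[] = { DEMO_SIMD_TABLE(DEMO_AS_ROW) };
#undef DEMO_AS_ROW

/* Return the table entry for (opcode, sub), or EMIT_NONE if there is none. */
static DemoEmit demo_lookup(DemoOpcode opcode, DemoSubOp sub)
{
    assert(opcode >= OP_DEMO_UNOP && opcode < OP_DEMO_COUNT);
    for (size_t i = 0; i < sizeof demo_rows / sizeof demo_rows[0]; i++)
        if (demo_rows[i].opcode == opcode && demo_rows[i].sub == sub)
            return demo_rows[i].emit;
    return EMIT_NONE;
}

/* 1 if the table handles the operation, 0 if the caller should fall
 * through to the preexisting hand-written lowering. */
static int demo_uses_table(DemoOpcode opcode, DemoSubOp sub)
{
    return demo_lookup(opcode, sub) != EMIT_NONE;
}
```

A linear scan keeps the sketch short; a real table would more likely be indexed directly by opcode. Keeping the fallback explicit means opcodes can be moved into the table one at a time without touching the existing path.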
Contributes to #80566 and #43051.
Note: before merging the final PR, disable SIMD until all intrinsics are implemented (as in 31a9c6c)