-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fold aarch64 feature +fp into +neon #91608
Conversation
cc @adamgemmell |
This comment has been minimized.
This comment has been minimized.
Oh interesting, I'd need to fix those too. nbd, assuming this is a fine direction. |
I don't really see the motivation for this. While it's true that you'll never find an actual CPU with only one FP or NEON, these are still considered separate features in the ISA and that is what we are following for feature names. You can't end up with an invalid feature combination either since LLVM defines the |
This is informed by any other combination being specifically ruled out several times, and because users in the past have tried to use incorrect feature combinations for various architectures. It is not about whether or not LLVM will necessarily miscompile the code. It is about what users expect. They do not necessarily intuit such a nuance that on the ISA level, floating point requires the vector registers, and the vector registers require floating point. If they see two independent features, they are naturally inclined to conclude that they have independent meaning. For AArch64, they do not, and it is doubtful they ever will. By melding the two into one, we reflect this reality, and simplify actual usage of the compiler. There is no "necessary complexity" this would be omitting: the main reason the two features here are split is a legacy detail from being based on Armv8-A, which is a red herring when the execution mode, emitted code, and management of architectural state for A64 is different. I wish to make things simpler for users so issues like https://project-oak.github.io/rust-verification-tools/2021/05/15/verifying-vectorized-code2.html and other related issues do not come up again. And at the moment, |
Sorry, I've been ill the last couple of days.
Not really relevant but bits of the rust compiler use the LLVM name |
I have been silent because I worry there is no clear way to express what I want that is not simply repeating myself, and it is hard to express my concerns easily. I worry that I am wrong. I worry that I will go too far if I elaborate too much. I try nonetheless: Briefly, as to cleanness: The problem is that the Arm architecture isn't clean either. There's no machine instructions defined by Arm to only live in FEAT_FP without FEAT_ASIMD in the A64 machine language. They are of the "same type". Thus, forcing And there is no architectural state that lives only in FEAT_FP without FEAT_ASIMD either, as far as I am aware. When someone wants to disable the vector registers in a code segment, they usually want to do it because they want to avoid clobbering vector state. This is a concern throughout the Linux kernel. But any floating point operations also clobber vector state, inherently. As @jacobbramley put it, this is undefined behavior., and LLVM offers no coherent compilation contract that I am aware of to rationalize this combination. I worry that allowing this to stand makes resolving the current issues of |
I would like to keep the actual features separate since they are separate in the ISA spec. However, the compiler could enforce that these two features are enabled/disabled together to disallow invalid combinations. This was done in #93782 for the |
Actually, that's split in the CPU architecture in that there are multiple ID registers describing what, exactly is implemented. There are clearly use-cases for disabling both FP and NEON (like the kernel one you gave), though it's semantically a bit unclean if the objective is to avoid writing to I think what this comes down to is:
The answer to 1 is not specified (architecturally or by LLVM) so I think we have to assume that we don't know. However, LLVM might define something in the future, perhaps for some non-obvious use-case. Personally, I like the approach of tying I don't think one should imply the other, though, otherwise disabling just one could be ignored if the other is enabled by default. |
Enforcing the XNOR / biconditional is acceptable. I do have a lingering concern that it is still in fact simpler to not enable the I am more concerned about making sure the language frontend handles this in a "correct" and also somewhat "user-friendly" way, on the thesis that, as the combination is unspecified, it shouldn't be surprising if a codegen backend simply refuses aarch64 with
Well, if +neon bundles +fp, and -neon is encountered without +fp, then I am pretty sure that forces an error, at the latest when the Rust toolchain has to lower a float (either in Rust or LLVM). That said, handling this directly in Rust and enforcing |
For more future-oriented considerations:
My understanding is that SVE2 (the one consumers will start holding in their hands Very Soon) also implies or requires Neon, in the eyes of LLVM? And I am pretty sure that LLVM exposes a "use integer registers only" toggle for the AArch64 target... ah, I would like to expose the ability to simply more finely control architectural state that is used where possible, though I have not worked out yet what that would require. |
It implicitly enables SVE, which in turn enables There's several other features that implicitly enable just I'm fairly inexperienced with using these flags in practice so please correct me if I'm wrong. However, when reasoning about what
The ARM Architecture Reference Manual does describe FP and Neon together. However, while there are instructions appropriate for both, there are also instructions clearly used for FP and others used for SIMD, and LLVM seems to be aware of this (their FP tests don't bother enabling
This looks to be handled just in Clang - it works by passing |
That sounds incorrect, then, actually, if SVE2 enables the FP path only and without Neon/ASIMD support, because SVE/SVE2 is predicated on Neon availability. To arbitrarily exclude the integer register instructions being generated seems incorrect.
LLVM may have such an interpretation, but does it promise to maintain such consistently across versions? Does it agree with GCC on the precise instructions? With Cranelift? Please understand that the RFC is deeply aspirational, and that at the moment, the entirety of all vector architectures in Rust, due to us not having a worked out story for actually handling passing data in registers when the target features may not perfectly align, labors under this penalty: rust/compiler/rustc_middle/src/ty/layout.rs Lines 3194 to 3222 in 03a8cc7
That means that between two functions, if they call each other and are not inlined or otherwise fixed up during mem2reg, even if their target features match precisely, they will still tend to spill the entire vector state to memory. This means vector functions in Rust are always simply slower than their C or C++ equivalents, because even after all the inlining is done, they have to do a ton of cleanup and teardown at the final step. That means not inlining is unusually bad to begin with. And when you have established Which also means that in order to end that problem, we may have to duplicate quite a bit of LLVM's understanding of target features anyways, just to be able to emit correct code for LLVM without breaking, or else the Rust ABI will depend solely on inlining for performance. Even without committing to such duplication, allowing this combination may only be a performance footgun, but it's still a footgun, most people invoking the compiler don't have the faintest idea about these nuances of the Arm architecture to avoid that, even if they compile for AArch64 machines every day. That's not a bug, that's a feature. Ultimately, setting target features, when the Rust toolchain is invoked, is done at the level of Rust. The fact that the target feature resolution code happens to still live entirely in |
I see the intent clearly now - thanks for taking the time to write that, it's very informative. |
Another consideration when tying |
Erm, I don't think any of those actually would be broken. Developers using Rust mostly are not concerned about the FP vs. AdvSIMD split in practice, and functionally do depend on I have also written and maintained many cases of In fact, the I have also had the opportunity to address the questions of and observe the behavior of quite a number of programmers addressing these feature interfaces. The reason I bring up non-intuitive inlining issues is at least partly because they do. The reason I recommend a more terse and simplified interface is because I have seen what they write. People find tinkering with precise target features in this manner exhausting and mostly it is a source of gotchas and frustration, as evidenced in the fact that they have appeared in some of the most popular crates in Rust's SIMD ecosystem. |
I mean that if we tie
This could be fixed by making all features that imply |
☔ The latest upstream changes (presumably #87402) made this pull request unmergeable. Please resolve the merge conflicts. |
...Hm, that seems like an acceptable resolution but does seem to have looped back around to something equivalent to my initial suggestion ("functionally, make +fp and +neon the same idea for aarch64"). |
26a2564
to
01a4471
Compare
In light of #95002 I have rebased this PR and resolved the incongruity of the error messages with |
Summary: Perhaps the best solution is to try this PR, and fall back to #95044 if it breaks things. I've been discussing this with @adamgemmell along the way, and have just re-read this thread, and related PRs, to try to understand how we got here. I maintain that the cleanest design, ignoring legacy, is to map features to hardware ID registers (and OS features), like "paca" and "pacg". However, I think I'd overlooked (or misunderstood the importance of) two significant factors:
In addition, "neon" and "fp" are somewhat special in that they are enabled by default for almost all AArch64 targets. There's an ergonomic problem with simply tying them in that code that implements or uses something called a "NEON intrinsic" also has to enable "fp", which appears superfluous. So, that leaves us with the two proposed solutions we have today:
Re-evaluating all of that, I'm happy to change my position: perhaps the best, most pragmatic solution is to try this PR, and fall back to #95044 in the unlikely event that it breaks some uses of "fp". Were it not for existing uses, I'd propose renaming "neon" to "fpneon", but it's too late for that. To echo what @adamgemmell said a while back: thank you, @workingjubilee, for spending the time to explain your position so clearly! |
⌛ Testing commit 8fa4ae8 with merge bb5f3f680e895438843678a021bab464d74975f9... |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
28315d7
to
db7e189
Compare
db7e189
to
6c19dc9
Compare
Notes to self:
Apologies for the thrash. @bors r=nagisa,Amanieu |
📌 Commit 6c19dc9 has been approved by |
Rollup of 6 pull requests Successful merges: - rust-lang#91608 (Fold aarch64 feature +fp into +neon) - rust-lang#92955 (add perf side effect docs to `Iterator::cloned()`) - rust-lang#94713 (Add u16::is_utf16_surrogate) - rust-lang#95212 (Replace `this.clone()` with `this.create_snapshot_for_diagnostic()`) - rust-lang#95219 (Modernize `alloc-no-oom-handling` test) - rust-lang#95222 (interpret/validity: improve clarity) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
@adamgemmell I'm not sure, does this need an update to the docs? |
Yes. I will follow up. |
The build is currently failing: <https://github.com/image-rs/jpeg-decoder/runs/5618199478> 1. `aarch64_target_feature` was stabilized in rust-lang/rust#90621 2. `neon`/`fp` must be activated together, which is not yet the case for some intrinsics in `std`. See rust-lang/rust#91608 and rust-lang/rust#95044. Once either of the above solutions lands we can remove `aarch64_target_feature` and unpin nightly again.
The build is currently failing: <https://github.com/image-rs/jpeg-decoder/runs/5618199478> 1. `aarch64_target_feature` was stabilized in rust-lang/rust#90621 2. `neon`/`fp` must be activated together, which is not yet the case for some intrinsics in `std`. See rust-lang/rust#91608 and rust-lang/rust#95044. Once either of the above solutions lands we can remove `aarch64_target_feature` and unpin nightly again.
The build is currently failing: <https://github.com/image-rs/jpeg-decoder/runs/5618199478> 1. `aarch64_target_feature` was stabilized in rust-lang/rust#90621 2. `neon`/`fp` must be activated together, which is not yet the case for some intrinsics in `std`. See rust-lang/rust#91608 and rust-lang/rust#95044. Once either of the above solutions lands we can remove `aarch64_target_feature` and unpin nightly again.
In rust-lang#91608 the fp-armv8 feature was removed as it's tied to the neon feature. However disabling neon didn't actually disable the use of floating point registers and instructions, for this `-fp-armv8` is required.
Fix some issues with folded AArch64 features In rust-lang#91608 the `fp` feature was removed for AArch64 and folded into the `neon` feature, however disabling the `neon` feature doesn't actually disable the `fp` feature. If my understanding on that thread is correct it should do. While doing this, I also noticed that disabling some features would disable features that it shouldn't. For instance enabling `sve` will enable `neon`, however, when disabling `sve` it would then also disable `neon`, I wouldn't expect disabling `sve` to also disable `neon`. cc `@workingjubilee`
The build is currently failing: <https://github.com/image-rs/jpeg-decoder/runs/5618199478> 1. `aarch64_target_feature` was stabilized in rust-lang/rust#90621 2. `neon`/`fp` must be activated together, which is not yet the case for some intrinsics in `std`. See rust-lang/rust#91608 and rust-lang/rust#95044. Once either of the above solutions lands we can remove `aarch64_target_feature` and unpin nightly again.
Arm's FEAT_FP and Feat_AdvSIMD describe the same thing on AArch64:
The Neon unit, which handles both floating point and SIMD instructions.
Moreover, a configuration for AArch64 must include both or neither.
Arm says "entirely proprietary" toolchains may omit floating point:
https://developer.arm.com/documentation/102374/0101/Data-processing---floating-point
In the Programmer's Guide for Armv8-A, Arm says AArch64 can have
both FP and Neon or neither in custom implementations:
https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON
In "Bare metal boot code for Armv8-A", enabling Neon and FP
is just disabling the same trap flag:
https://developer.arm.com/documentation/dai0527/a
In an unlikely future where "Neon and FP" become unrelated,
we can add "[+-]fp" as its own feature flag.
Until then, we can simplify programming with Rust on AArch64 by
folding both into "[+-]neon", which is valid as it supersets both.
"[+-]neon" is retained for niche uses such as firmware, kernels,
"I just hate floats", and so on.
I am... pretty sure no one is relying on this.
An argument could be made that, as we are not an "entirely proprietary" toolchain, we should not support AArch64 without floats at all. I think that's a bit excessive. However, I want to recognize the intent: programming for AArch64 should be simplified where possible. For x86-64, programmers regularly set up illegal feature configurations because it's hard to understand them, see #89586. And per the above notes, plus the discussion in #86941, there should be no real use cases for leaving these features split: the two should in fact always go together.