[mono] Investigate rewriting Scalar<T> methods so the linker can replace them with stubs #71430
Comments
The pattern that I think it would likely be better, long term, for Mono to adjust itself to better handle the pattern. |
hi @tannergooding, the idea was to create a tracking issue for investigating whether it is possible to rewrite the On the other hand, yes, you are right, there is a more concrete solution tracked here: #71431 while this one is a faster approach to see the potential gains. |
I think the confusing part is simply that it says
We could make it smaller for platforms with SIMD acceleration by making them recursive on one path (much like the But otherwise, I don't expect there is a simplification, especially not one that is expressible in C#. I don't think representing the fallback path in the JIT is a good idea as it defeats the purpose of having a fallback and requires "more bringup" work for new or community supported platforms/architectures. |
Where is the opportunity to rewrite such methods with generic math? Such a pattern is often used for high performance code. |
Generic Math isn't possible for this scenario right now because there is no way to go from something "less constrained" to something "more constrained" That is, given Supporting this scenario in the future has been raised and will likely be taken into consideration |
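For illustration, here is a minimal sketch of the "less constrained" to "more constrained" limitation described above. The names `ConstraintSketch`, `AddNumbers`, and `Add` are hypothetical and do not come from the runtime sources; the sketch assumes .NET 7 or later for `INumber<T>`.

```csharp
using System;
using System.Numerics;

// Hypothetical example, not taken from the runtime sources.
public static class ConstraintSketch
{
    // "More constrained": T must support generic math.
    public static T AddNumbers<T>(T left, T right) where T : INumber<T>
        => left + right;

    // "Less constrained": all the compiler knows is that T is a struct.
    public static T Add<T>(T left, T right) where T : struct
    {
        // return AddNumbers(left, right);
        // The line above does not compile: T is not known to implement INumber<T>.

        // The workaround available today is an exact type check per supported type,
        // with a (T)(object) round trip to reach the concrete type.
        if (typeof(T) == typeof(int))
        {
            return (T)(object)AddNumbers((int)(object)left, (int)(object)right);
        }
        if (typeof(T) == typeof(double))
        {
            return (T)(object)AddNumbers((double)(object)left, (double)(object)right);
        }

        throw new NotSupportedException($"Type {typeof(T)} is not supported.");
    }
}
```

Calling `ConstraintSketch.Add(2, 3)` takes the `int` branch, while an unsupported struct such as `Guid` only fails when the method is actually called.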
The problem with this pattern is that it causes us to generate a ton of fallback code for cases that can't happen. For any concrete instantiation with a supported type the code is fine: the typeswitch compiles down to a single branch and the overall function is usually small enough to get inlined into the caller, etc. But we also have to generate a universal generic fallback. This thing by definition doesn't know anything about the generic type so it has to generate a huge type switch and use the slower gsharedvt (universal generics / boxing / dictionary passing) approach. There are principled things we can do (especially if aided by hints to the AOT compiler about which specializations are profitable to generate), but getting there will take some engineering effort. In a closed universe like an iOS application, rewriting with the IL trimmer may give us an indication of the potential space savings if nothing else. |
Sure, but my point is that this is a very common pattern in perf oriented code. Within corelib it's used by vector code, it's used by generic math, it's also used in various other hot paths to ensure specialization occurs. Outside the ecosystem it's used by all sorts of libraries and applications. So this is something that will ultimately have to be handled or it will continue presenting as a negative experience for Mono. Not just here, but throughout the entire ecosystem.
It seems like, at the very least you could recognize the It would then seem reasonable to assume that with such a pattern, the specializations to consider are exactly the In the case of things like |
It is typically impossible to tell whether these specializations are unused thanks to reflection. |
We have attributes and other annotations designed to assist with that. I don't think it's unreasonable to require or expect those to be used here as well. These types of patterns aren't going to go away. In the worst case the AOT codegen for this pattern could be There are always exceptions, yes, but those tend to be rarer especially for this specific pattern. |
The existing trimming annotations do not track exact generic instantiations. They only allow you to say that
The problems that these patterns are trying to solve are not going away. I have doubts that the current popular patterns are the right way to solve these problems. These patterns are bad for startup, AOT or trimming. They are good only if your only performance metric is steady state RPS. |
I get the feeling this is the old problem of how to tell the AOT compiler that the method will never be called dynamically (e.g. reflection) again. |
That sounds like a missing hole we should address.
I'm not aware of any alternatives that provide equivalent, especially where we have back-compat requirements that prevent versioning. For example If we have alternatives, then I'd be happy to move to them.
I think that is dependent on whether reflection is used or not. At least personally, I've not seen any issues with generics and reflection free code and I've made quite heavy usage of it + NAOT and friends in various personal projects. |
Just wanted to add what trimming can do (and should work already): Figuring out that we don't need the fallback paths is much harder, but potentially doable, IF we assume the app will not produce any trim warnings. Given that currently most of our verticals produce warnings, it would make such an optimization rather risky. |
It's not necessary for correctness, but it would come in handy for size reduction in this case. So far we've been focusing mostly on correctness - "if it doesn't work, it really doesn't matter if it's small or not". It basically boils down to usage of |
An even harder problem is trimming of interface implementations that escape via casting. Once we see that an interface is used, we keep it on all types that implement that interface and expand it on all instantiations of that type if it is generic. It has been a classic problem with trimming of
I do not think we have any good alternatives today. One can avoid these problems by avoiding generics and interface abstractions, but that's not really a viable alternative in many cases. Creating better ways to solve these problems would require creating new fundamental language/runtime features. |
Result
With a toy implementation of custom linker substitutions for replacing
Assumption
The assumption for this attempt was that when targeting
Problems
Problems with this assumption, apart from the indirect calls to
Conclusion
The size reduction of From these |
Moving to 8.0.0 milestone. This will be revisited for .NET8 planning |
As we are progressing with Mono intrinsics, it is good to review this issue and try to conclude if the Scalar fallback code can be trimmed when SIMD is enabled. Initial idea was to add a substitution to the ILLinker for a fallback code. However, as @fanyang-mono mentioned, with intrinsics implemented in Vector classes, Scalar class will not be referenced, and thus the substitution might not be needed. Another topic is reflection.
Concerns regarding
Let's continue the discussion and see if there is a case where size savings can be achieved. cc @fanyang-mono @jandupej @ivanpovazan @LeVladIonescu @vargaz @lambdageek @SamMonoRT |
Some of this was refactored. Generally speaking, we now have the following for the software fallbacks:
In all cases, we have a restriction that
For RyuJIT:
Given the refactorings and the intrinsic recognition, I would expect we are getting the Someone else can provide the support details for Mono/WASM, which are being brought online but which aren't as feature complete. Once the work to add the intrinsic support to Mono achieves "parity" with RyuJIT, then I'd expect this to be a non-issue for the same scenarios as RyuJIT. That would not handle the case where no hardware acceleration exists for a given platform. I still believe the best case scenario for that is to ensure the AOT tooling understands the |
@tannergooding Thank you for such a thorough analysis. As you proposed, we are already working on optimizing generics to allow specialization and exclude instances that are not used. @fanyang-mono According to the analysis, on |
@kotlarmilos My understanding is that when SIMD support is fully there, nothing needs to be trimmed. When SIMD support is incomplete, which is Mono's current situation, we need them to produce the correct code. The same logic applies to both arm64 and x64. Another thing that @tannergooding mentioned which we should look into is to make Mono AOT produce |
How are delegates/reflection going to be handled without retaining the fallback code? |
As per the above, these types were rewritten to:
This means that on Arm64 with the full JIT support, For x64, this impacts both For That being said, the common case will be |
Makes sense. From the implementation perspective, I see it as:

    if (IsHardwareAccelerated) {
        SIMDImplementation();
    } else {
        softwareCallback();
    }

Is it trimmed by a substitution in ILLinker?
I will open an issue and add it to #80938. |
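As a rough sketch of the dispatch shape discussed above, the following uses the real `Vector128.IsHardwareAccelerated` property (assuming .NET 7 or later). The class `SumSketch` and its methods are hypothetical names chosen for illustration; this is not how corelib structures its fallbacks.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical example, not taken from the runtime sources.
public static class SumSketch
{
    public static int Sum(ReadOnlySpan<int> values)
    {
        // If the compiler can treat IsHardwareAccelerated as a constant for the
        // target, only one of the two branches needs to survive in the output.
        return Vector128.IsHardwareAccelerated
            ? SumSimd(values)
            : SumSoftware(values);
    }

    private static int SumSimd(ReadOnlySpan<int> values)
    {
        Vector128<int> acc = Vector128<int>.Zero;
        int i = 0;

        // Process full vectors of Vector128<int>.Count elements at a time.
        for (; i <= values.Length - Vector128<int>.Count; i += Vector128<int>.Count)
        {
            acc += Vector128.Create(values.Slice(i, Vector128<int>.Count));
        }

        // Fold the vector lanes, then add the leftover tail elements.
        int total = Vector128.Sum(acc);
        for (; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }

    // Software fallback: a plain scalar loop.
    private static int SumSoftware(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }
}
```

Both paths return the same result for the same input, so a caller such as `SumSketch.Sum(new[] { 1, 2, 3 })` does not need to know which one ran. Whether the software path can actually be removed still depends on the reflection and delegate concerns raised earlier in the thread.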
@radekdoulik should we move this to .NET 9? |
Scalar<T> is defined with a struct constraint, but in many of its methods there are additional runtime checks for support of a specific type. If the type is not supported, the method just throws. AOTing such a pattern with Mono results in unnecessarily large and slow methods.
Rewriting the methods of Scalar<T> in a way that lets the linker replace them with stubs could potentially reduce the size of the code AOTed by Mono.
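To make the pattern concrete, here is a minimal sketch of a struct-constrained generic method with runtime type checks. `ScalarSketch` and its `Add` method are hypothetical and are not the actual Scalar<T> source; they only show the shape described above.

```csharp
using System;

// Hypothetical example, not the actual Scalar<T> implementation.
public static class ScalarSketch
{
    // The only compile-time constraint is `struct`; the set of supported
    // types is enforced at run time by the typeof(T) checks below.
    public static T Add<T>(T left, T right) where T : struct
    {
        if (typeof(T) == typeof(int))
        {
            // The (T)(object) round trip lets generic code reach the concrete type.
            return (T)(object)((int)(object)left + (int)(object)right);
        }
        if (typeof(T) == typeof(float))
        {
            return (T)(object)((float)(object)left + (float)(object)right);
        }

        // Any other struct type is rejected only when the method is called.
        throw new NotSupportedException($"Type {typeof(T)} is not supported.");
    }
}
```

For a concrete instantiation such as `ScalarSketch.Add<int>`, a specializing compiler can fold the type checks down to one branch. A shared generic version has to keep every branch, which is the code size cost this issue is about.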
This is related to #56385
cc @vargaz