Improve AvaloniaObject.GetValue performance #15342
public bool Equals(Type? x, Type? y)
    => x == y;

public int GetHashCode(Type obj)
    => RuntimeHelpers.GetHashCode(obj);
I wonder if using RuntimeTypeHandle instead would make any reasonable difference without going way too deep into microoptimizations.
This is very interesting.
This is the dictionary benchmark, where we can see that `TypeComparer` is faster than the default comparer (.NET 8, x64, Windows).
(`typeof(Control)` is inside the dictionary, `typeof(Button)` isn't.)
Method | Type | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
DefaultComparer | Button | 5.976 ns | 0.0498 ns | 0.0466 ns | 1.00 |
TypeComparer | Button | 4.930 ns | 0.0339 ns | 0.0317 ns | 0.82 |
RuntimeTypeHandleComparer | Button | 5.737 ns | 0.0381 ns | 0.0338 ns | 0.96 |
ReferenceComparer | Button | 4.940 ns | 0.0435 ns | 0.0407 ns | 0.83 |
DefaultComparer | Control | 6.735 ns | 0.0490 ns | 0.0458 ns | 1.00 |
TypeComparer | Control | 6.132 ns | 0.0444 ns | 0.0416 ns | 0.91 |
RuntimeTypeHandleComparer | Control | 6.445 ns | 0.0508 ns | 0.0475 ns | 0.96 |
ReferenceComparer | Control | 5.310 ns | 0.0300 ns | 0.0281 ns | 0.79 |
Benchmark code
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using Avalonia.Controls;
using BenchmarkDotNet.Attributes;

public class BenchsDictionaryType
{
    // One dictionary per comparer under test; null selects the default comparer.
    private readonly Dictionary<Type, object> _dic1 = CreateDictionary(null);
    private readonly Dictionary<Type, object> _dic2 = CreateDictionary(TypeEqualityComparer.Instance);
    private readonly Dictionary<Type, object> _dic3 = CreateDictionary(RuntimeTypeHandleEqualityComparer.Instance);
    private readonly Dictionary<Type, object> _dic4 = CreateDictionary(ReferenceEqualityComparer.Instance);

    private static Dictionary<Type, object> CreateDictionary(IEqualityComparer<Type>? comparer)
        => new(comparer)
        {
            [typeof(int)] = "abc",
            [typeof(string)] = "def",
            [typeof(Control)] = "ghi"
        };

    // Button is a miss, Control is a hit.
    [Params(typeof(Button), typeof(Control))]
    public Type Type { get; set; }

    [Benchmark(Baseline = true)]
    public bool DefaultComparer()
        => _dic1.TryGetValue(Type, out _);

    [Benchmark]
    public bool TypeComparer()
        => _dic2.TryGetValue(Type, out _);

    [Benchmark]
    public bool RuntimeTypeHandleComparer()
        => _dic3.TryGetValue(Type, out _);

    [Benchmark]
    public bool ReferenceComparer()
        => _dic4.TryGetValue(Type, out _);
}

// Operator == for equality, reference-based hash code.
public sealed class TypeEqualityComparer : IEqualityComparer<Type>
{
    public static TypeEqualityComparer Instance { get; } = new();

    public bool Equals(Type? x, Type? y)
        => x == y;

    public int GetHashCode(Type obj)
        => RuntimeHelpers.GetHashCode(obj);
}

// Compares and hashes through RuntimeTypeHandle; assumes non-null arguments.
public sealed class RuntimeTypeHandleEqualityComparer : IEqualityComparer<Type>
{
    public static RuntimeTypeHandleEqualityComparer Instance { get; } = new();

    public bool Equals(Type? x, Type? y)
        => x.TypeHandle.Equals(y.TypeHandle);

    public int GetHashCode(Type obj)
        => obj.TypeHandle.GetHashCode();
}
Since `TypeComparer` was faster in the dictionary benchmark, I actually read the results of the `Equals` benchmark backwards!
You can see that `==` is in fact slower than `Equals`, despite the dictionary lookup being faster.
Equals:
Method | Type1 | Type2 | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|---|
Equals | Button | Button | 0.5504 ns | 0.0102 ns | 0.0095 ns | 1.00 |
Operator | Button | Button | 1.0966 ns | 0.0160 ns | 0.0149 ns | 1.99 |
TypeHandleEquals | Button | Button | 0.4741 ns | 0.0026 ns | 0.0020 ns | 0.86 |
ReferenceEquals | Button | Button | 0.1434 ns | 0.0023 ns | 0.0022 ns | 0.26 |
Equals | Button | StyledElement | 0.5405 ns | 0.0108 ns | 0.0101 ns | 1.00 |
Operator | Button | StyledElement | 1.1051 ns | 0.0170 ns | 0.0159 ns | 2.05 |
TypeHandleEquals | Button | StyledElement | 0.4873 ns | 0.0033 ns | 0.0026 ns | 0.91 |
ReferenceEquals | Button | StyledElement | 0.1469 ns | 0.0062 ns | 0.0051 ns | 0.27 |
Equals | StyledElement | Button | 0.5308 ns | 0.0122 ns | 0.0114 ns | 1.00 |
Operator | StyledElement | Button | 1.1185 ns | 0.0201 ns | 0.0188 ns | 2.11 |
TypeHandleEquals | StyledElement | Button | 0.4847 ns | 0.0101 ns | 0.0095 ns | 0.91 |
ReferenceEquals | StyledElement | Button | 0.1475 ns | 0.0037 ns | 0.0033 ns | 0.28 |
Equals | StyledElement | StyledElement | 0.5278 ns | 0.0093 ns | 0.0087 ns | 1.00 |
Operator | StyledElement | StyledElement | 1.1040 ns | 0.0175 ns | 0.0164 ns | 2.09 |
TypeHandleEquals | StyledElement | StyledElement | 0.5013 ns | 0.0118 ns | 0.0110 ns | 0.95 |
ReferenceEquals | StyledElement | StyledElement | 0.1546 ns | 0.0100 ns | 0.0094 ns | 0.29 |
The dictionary difference is probably due to the different `GetHashCode` implementation:
Method | Type | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
Normal | Button | 0.6236 ns | 0.0108 ns | 0.0101 ns | 1.00 |
RuntimeHelper | Button | 0.4295 ns | 0.0085 ns | 0.0079 ns | 0.69 |
Benchmark code
using System;
using System.Runtime.CompilerServices;
using Avalonia;
using Avalonia.Controls;
using BenchmarkDotNet.Attributes;

public class BenchsTypeEquals
{
    // All four combinations of equal and unequal types are measured.
    [Params(typeof(StyledElement), typeof(Button))]
    public Type Type1 { get; set; }

    [Params(typeof(StyledElement), typeof(Button))]
    public Type Type2 { get; set; }

    [Benchmark(Baseline = true)]
    public bool Equals()
        => Type1.Equals(Type2);

    [Benchmark]
    public bool Operator()
        => Type1 == Type2;

    [Benchmark]
    public bool TypeHandleEquals()
        => Type1.TypeHandle.Equals(Type2.TypeHandle);

    [Benchmark]
    public bool ReferenceEquals()
        => ReferenceEquals(Type1, Type2);
}

public class BenchsTypeGetHashCode
{
    [Params(typeof(Button))]
    public Type Type { get; set; }

    // Virtual Type.GetHashCode() versus the reference-based hash code.
    [Benchmark(Baseline = true)]
    public int Normal()
        => Type.GetHashCode();

    [Benchmark]
    public int RuntimeHelper()
        => RuntimeHelpers.GetHashCode(Type);
}
In every case, you can see that while `RuntimeTypeHandle` is a bit faster than the default one, `ReferenceEqualityComparer` always wins, as we can't get much simpler than a reference comparison.
In the current `TypeEqualityComparer` implementation, I realize there's no point in not using `ReferenceEquals()` when `GetHashCode()` already uses the reference hash code through `RuntimeHelpers`. Doing so would make it a `ReferenceEqualityComparer`. This doesn't handle non-`RuntimeType` types, but we don't really support that scenario.
I'll delete `TypeEqualityComparer` and use `ReferenceEqualityComparer`.
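For readers skimming the thread, the change proposed here can be sketched as below. This is only a minimal illustration, not the PR's code: `Control`, `Button` and `TypeLookupSketch` are hypothetical stand-ins declared locally so the snippet compiles without Avalonia. It relies on `ReferenceEqualityComparer.Instance` (.NET 5+), which is an `IEqualityComparer<object?>` and is accepted for `Type` keys through interface contravariance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for Avalonia's Control and Button.
class Control { }
class Button : Control { }

static class TypeLookupSketch
{
    // Keys are compared by reference and hashed with the reference hash code.
    // This assumes every key is a runtime Type instance, for which one Type
    // object exists per type.
    public static Dictionary<Type, object> Create() =>
        new(ReferenceEqualityComparer.Instance)
        {
            [typeof(int)] = "abc",
            [typeof(string)] = "def",
            [typeof(Control)] = "ghi",
        };
}
```

A lookup for `typeof(Control)` hits, while `typeof(Button)` misses: the comparer only matches the exact `Type` instance and knows nothing about inheritance.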
There's a slight speedup in `Remeasure` with the `ReferenceEqualityComparer`; I've updated the OP with the numbers.
Overall, looks great to me.
But we need @grokys's opinion on possible behavioral changes.
What does the pull request do?
This PR improves the performance of `AvaloniaObject.GetValue()`, which is used for every Avalonia property access. This method is such a hot path that even tiny improvements can have a measurable effect.
Numbers
Let's start with the `Remeasure` benchmark, which does a re-layout pass for ≈6500 nested `StackPanel`/`Button`s:
Before
After
(Numbers are for Ryzen 9 5900X, Windows 11, x64)
Now for the micro-benchmarks:
Before
After
How was the solution implemented (if it's not obvious)?
There are two main gains here:
Default values
First, speeding up default value retrieval was the main concern, since that's the most common case. (Look at that almost 9x speedup!)
This is done by avoiding metadata lookup as much as possible. If we have only a single default value, even for a property with multiple metadatas (`AddOwner()` calls), cache it in a field and return it asap.
If we still have several metadatas, look it up quickly where possible. `Type.IsInstanceOfType(obj)` is about 25% faster for classes than `Type.IsAssignableFrom(obj.GetType())` (.NET 8). A dictionary is used only as a last resort.
A `ReferenceEqualityComparer` is used for this dictionary, resulting in a ≈20% speedup per lookup.
`[MethodImpl(MethodImplOptions.AggressiveInlining)]` has been used where it made sense to do so (always measured first).
Local values
Next was improving lookup for local values.
This is where micro-optimizations matter.
`AvaloniaPropertyDictionary.TryGetValue()` is everywhere: every nanosecond counts here. Even if there's no local value for a property, we must always get through this path first. It has to be fast.
The binary search algorithm used here went through several iterations. I tried several versions of loop unrolling, unsafe code and SIMD, verifying performance numbers and checking the resulting x64 assembly code every time, ensuring there's no unneeded instruction. (The results may vary for other platforms, but I still expect a gain.)
It turns out that a very simple loop is still the fastest (see the `LookupProperties` benchmark above), with some extras such as a forced bounds check removal (that couldn't be elided naturally by the JIT). It's fast enough that the linear search part has been removed.
The binary search in `FindEntry` has been copied manually in `TryGetValue`, since even aggressive inlining wasn't enough to squeeze that last drop of performance out of the method.
The property's `Id` is stored directly inside the `Entry`, avoiding an indirection and probably allowing everything to fit inside a CPU cache line (the change was measurable in the `LookupProperties` benchmark).
Misc
Used `ArgumentNullException.ThrowIfNull` where possible, ensuring the `throw` part is never inlined, allowing the JIT to inline the caller if it chooses to do so.
Made `Optional<T>.GetValueOrDefault()` an unconditional single field access, since `_value` will be `default` if there's no value (`Nullable<T>` has a similar implementation).
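
To tie the "Default values" and "Local values" sections together, here is a rough sketch of the lookup order they describe. It is a simplified model under stated assumptions, not Avalonia's implementation: `SketchProperty`, `SketchValueStore`, `SketchLookup` and this `Entry` layout are all hypothetical, and the forced bounds check removal and manual inlining mentioned above are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical entry: the property id is stored inline, so the search
// never has to dereference a property object.
struct Entry
{
    public int Id;
    public object? Value;
}

sealed class SketchProperty
{
    private static int s_nextId;
    private readonly object? _default;
    private readonly List<(Type Owner, object? Value)> _overrides = new();
    private readonly Dictionary<Type, object?> _byType = new(ReferenceEqualityComparer.Instance);

    public SketchProperty(object? defaultValue) => _default = defaultValue;

    public int Id { get; } = s_nextId++;

    public void OverrideDefault(Type owner, object? value)
    {
        _overrides.Add((owner, value));
        _byType[owner] = value;
    }

    public object? GetDefaultValue(object owner)
    {
        // 1. A single default for every owner: return the cached field.
        if (_overrides.Count == 0)
            return _default;
        // 2. Exactly one override: one IsInstanceOfType check decides.
        if (_overrides.Count == 1)
            return _overrides[0].Owner.IsInstanceOfType(owner) ? _overrides[0].Value : _default;
        // 3. Last resort: dictionary lookups while walking up the type hierarchy.
        for (Type? t = owner.GetType(); t is not null; t = t.BaseType)
            if (_byType.TryGetValue(t, out var value))
                return value;
        return _default;
    }
}

sealed class SketchValueStore
{
    private Entry[] _entries = new Entry[4];
    private int _count;

    public bool TryGetValue(int id, out object? value)
    {
        // Plain binary search over entries kept sorted by Id.
        int lo = 0, hi = _count - 1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            int midId = _entries[mid].Id;
            if (midId == id)
            {
                value = _entries[mid].Value;
                return true;
            }
            if (midId < id) lo = mid + 1;
            else hi = mid - 1;
        }
        value = null;
        return false;
    }

    public void SetValue(int id, object? value)
    {
        // The insertion point is found linearly to keep the sketch short.
        int index = 0;
        while (index < _count && _entries[index].Id < id) index++;
        if (index < _count && _entries[index].Id == id)
        {
            _entries[index].Value = value;
            return;
        }
        if (_count == _entries.Length)
            Array.Resize(ref _entries, _count * 2);
        Array.Copy(_entries, index, _entries, index + 1, _count - index);
        _entries[index] = new Entry { Id = id, Value = value };
        _count++;
    }
}

static class SketchLookup
{
    public static object? GetValue(object owner, SketchValueStore store, SketchProperty property)
    {
        ArgumentNullException.ThrowIfNull(property);
        // Local values are consulted first; the default is the fallback.
        return store.TryGetValue(property.Id, out var local) ? local : property.GetDefaultValue(owner);
    }
}
```

Every read goes through `TryGetValue` first, even when no local value is set, which is why the PR spends most of its effort on that loop and keeps the default value fallback down to a field read in the common case.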