Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ImmutableDictionary has slowly been regressing and in version 5.0 is about 10X slower than Dictionary #47812

Closed
arunchndr opened this issue Feb 3, 2021 · 13 comments

Comments

@arunchndr
Copy link
Member

arunchndr commented Feb 3, 2021

I am on the CPS team in Visual Studio and are benchmarking bottlenecks that affect solution load and a big contributor to the slowdown is Immutable collections. Instead of switching out the collection I started benchmarking across versions of the ImmutableDictionary for example and there is a steady regression going all the way back to version 1.5 of the dictionary

Yes, I understand Immutable collections will be slower, but there are clear indications of a perf bug here, especially when you look at the GetValue numbers.

Some benchmark numbers below:

GetValue:

Method Size Mean
Dictionary 1 6ns
ImmutableDictionary 1 48ns
Dictionary 10 6ns
ImmutableDictionary 10 50ns
Dictionary 100 6ns
ImmutableDictionary 100 58ns
Dictionary 1000 6ns
ImmutableDictionary 1000 58ns
Dictionary 10000 6ns
ImmutableDictionary 10000 77ns

AddRange (SetRange is pretty similar)

Method InitialSize AddSize Mean
       
Dictionary 10 1 9ns
ImmutableDictionary 10 1 570ns
Dictionary 10 10 212ns
ImmutableDictionary 10 10 3559ns
Dictionary 10 100 2412ns
ImmutableDictionary 10 100 41525ns
Dictionary 100 1 9ns
ImmutableDictionary 100 1 1040ns
Dictionary 100 10 920ns
ImmutableDictionary 100 10 5113ns
Dictionary 100 100 1810ns
ImmutableDictionary 100 100 49665ns
Dictionary 1000 1 9ns
ImmutableDictionary 1000 1 1392ns
Dictionary 1000 100 820ns
ImmutableDictionary 1000 100 61750ns

Benchmark repo here - https://devdiv.visualstudio.com/DefaultCollection/Personal/_git/arkalyan

Adding more thoughts from folks who have observed a similar slowness:
The slowness of the immutable.Get actually comes from many layers of method delegations.

In the Dictionary<TKey, TValue>, get_Item is almost a single method (only calls FindEntry), TryGetValue has a similar but separated implementation.

On the other side, in the Immutable version, get_Item calls into TryGetValue, which then delegate to a static version of TryGetValue. The top two layers have very little logic, but they took more than 25% of the CPU time.

Then, the static TryGetValue alone took more than 25% of the CPU time, while the overall time inside two TryGetValue it calls is less than 15%. I think due to the core of this logic is so small, the majority overhead is function calls themselves. Maybe one thing to try is to push more functions inline, but either removing some extra delegating, or ask Compiler/JIT to do more method inlining?

http://index/?leftProject=System.Collections.Immutable&leftSymbol=hghocbfkyomt&file=System%5CCollections%5CImmutable%5CImmutableDictionary_2.cs

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@arunchndr arunchndr changed the title ImmutableDictionary has slowly been regressing and in version 5.0 (10X slower than Dictionary) ImmutableDictionary has slowly been regressing and in version 5.0 is about 10X slower than Dictionary Feb 3, 2021
@joeloff joeloff transferred this issue from dotnet/sdk Feb 3, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Feb 3, 2021
@ghost
Copy link

ghost commented Feb 3, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

I am on the CPS team in Visual Studio and are benchmarking bottlenecks that affect solution load and a big contributor to the slowdown is Immutable collections. Instead of switching out the collection I started benchmarking across versions of the ImmutableDictionary for example and there is a steady regression going all the way back to version 1.5 of the dictionary

Yes, I understand Immutable collections will be slower, but there are clear indications of a perf bug here, especially when you look at the GetValue numbers.

Some benchmark numbers below:

GetValue:

Method Size Mean
Dictionary 1 6ns
ImmutableDictionary 1 48ns
Dictionary 10 6ns
ImmutableDictionary 10 50ns
Dictionary 100 6ns
ImmutableDictionary 100 58ns
Dictionary 1000 6ns
ImmutableDictionary 1000 58ns
Dictionary 10000 6ns
ImmutableDictionary 10000 77ns

AddRange (SetRange is pretty similar)

Method InitialSize AddSize Mean
       
Dictionary 10 1 9ns
ImmutableDictionary 10 1 570ns
Dictionary 10 10 212ns
ImmutableDictionary 10 10 3559ns
Dictionary 10 100 2412ns
ImmutableDictionary 10 100 41525ns
Dictionary 100 1 9ns
ImmutableDictionary 100 1 1040ns
Dictionary 100 10 920ns
ImmutableDictionary 100 10 5113ns
Dictionary 100 100 1810ns
ImmutableDictionary 100 100 49665ns
Dictionary 1000 1 9ns
ImmutableDictionary 1000 1 1392ns
Dictionary 1000 100 820ns
ImmutableDictionary 1000 100 61750ns

Benchmark repo here - https://devdiv.visualstudio.com/DefaultCollection/Personal/_git/arkalyan

Author: arkalyanms
Assignees: -
Labels:

area-System.Collections, untriaged

Milestone: -

@stephentoub
Copy link
Member

especially when you look at the GetValue numbers

Dictionary uses a hash map and lookup is O(1). ImmutableDictionary uses a tree and lookup is O(log N).

there is a steady regression going all the way back to version 1.5 of the dictionary

You mean the same operation with the same inputs has been getting slower and slower with each version of the nuget package?

@stephentoub
Copy link
Member

cc: @adamsitnik

@arunchndr
Copy link
Member Author

Dictionary uses a hash map and lookup is O(1). ImmutableDictionary uses a tree and lookup is O(log N)

I understand, but lets compare the Dictionary and the Tree(ImmutableDictionary) states with 1 value loaded in them, so the tree doesn't have to look beyond the root. It still compared as 6ns vs 48ns respectively.

You mean the same operation with the same inputs has been getting slower and slower with each version of the nuget package?

Yes, I picked version 1.5 to run the same benchmarks and values were quite a bit lower. Below is a perf trace comparison of the 2 versions with similar data:

Version 5.0:

Name Exc % Exc Inc % Inc Fold When First Last
||     |         |+ system.collections.immutable!System.Collections.Immutable.ImmutableDictionary`2[System.Int32,System.__Canon].get_Item(!0) 9.8 749 71.6 5,469 0 18888888787878888878877877761_ 260.353 7,053.892
**||     |         ||+ system.collections.immutable!System.Collections.Immutable.ImmutableDictionary`2[System.Int32,System.__Canon].TryGetValue(!0,!1&) 16.3 1,246 61.1 4,668 0 17777677677667667676766666651_ 260.353 7,053.892**
||     |         |||+ system.collections.immutable!System.Collections.Immutable.ImmutableDictionary`2[System.Int32,System.__Canon].TryGetValue(!0,value class MutationInput,!1&) 22.8 1,739 38.7 2,953 0 24444444444444444444444444431_ 263.329 7,053.892
||     |         ||||+ system.collections.immutable!System.Collections.Immutable.SortedInt32KeyNode1[System.Collections.Immutable.ImmutableDictionary2+HashBucket[System.Int32,System.__Canon]].TryGetValue(int32,!0&) 5.9 450 7.4 566 0 00011000100000000000000100100_ 274.230 7,045.933
||     |         |||||+ coreclr!? 1.5 112 1.5 113 0 0000000_00oo00_0o0oo00000o00o_ 325.751 7,015.911
||     |         ||||||+ ntoskrnl!? 0.0 1 0.0 1 0 o_____ 3,192.499 3,193.499
||     |         |||||+ ntoskrnl!? 0.0 2 0.0 3 0 ________________o_____oo 4,883.337 6,700.499
||     |         ||||+ system.collections.immutable!System.Collections.Immutable.ImmutableDictionary`2+HashBucket[System.Int32,System.__Canon].TryGetValue(!0,class Comparers,!1&) 5.9 453 7.0 531 0 00000000000000010000000000000_ 263.329 7,050.929

Version 1.5

Name Exc % Exc Inc % Inc Fold When First Last
||     |     |  + BenchmarkDotNet!Engine.RunIteration 0.0 0 65.2 7,523 0 099999899989999999999980__ 1,337.406 9,566.603
||     |     |   + 709f9c6b-ef0d-4179-9d68-e29fe7348f34!Runnable_1.WorkloadActionUnroll 2.2 251 65.1 7,513 0 099999899989999998999980__ 1,337.406 9,566.603
||     |     |   |+ system.collections.immutable!System.SR..cctor() 0.8 93 58.9 6,801 0 o88888888888888888888870__ 1,337.406 9,566.603
||     |     |   ||+ system.collections.immutable!SecureObjectPool.NewId 8.4 975 58.1 6,708 0 088888788888888887878870__ 1,337.406 9,566.603
**||     |     |   |+ immutabledictionary.benchmark!GetValue.ImmutableDictionary 2.8 326 2.8 326 0 00000000000000000000000o__ 1,351.878 9,533.597**
||     |     |   |+ coreclr!? 0.8 97 0.8 97 0 o00_00o0000000000oo000o___ 1,355.877 9,397.425

@arunchndr
Copy link
Member Author

Admission time: Only caught the fact that 1.5 was faster accidentally because benchmarkdotnet defaulted to it and it did not match the CPS numbers. Then switched the version to 5 that matches internal usage and the above 10x differences are based off of that. :)

@sharwell
Copy link
Member

sharwell commented Feb 3, 2021

It's not going to be possible to have a meaningful improvement to the read access performance of ImmutableDictionary<TKey, TValue> while it's backed by an AVL tree (i.e. there will be a significant performance difference in read-heavy scenarios between Dictionary<TKey, TValue> and ImmutableDictionary<TKey, TValue>). We recently implemented ImmutableSegmentedDictionary<TKey, TValue> with the intent of allowing CPS to migrate away from the AVL-tree based collections. If mutations to the immutable collection are frequent and/or incremental (non-batched), eventually ImmutableTreeDictionary<TKey, TValue> should provide a viable replacement but the current performance in read-heavy scenarios is only marginally better than ImmutableDictionary<TKey, TValue>.

@arunchndr
Copy link
Member Author

Thanks @sharwell It looks like there is some overlap with my FastImmutableDictionary implementation. My goal was to cover (a. Continue using ImmutableDictionary, but try to speed up with bug fixes like this one b. ImmutableDictionary that should not accept writes after it's been built (ImmutableSegmentedDictionary seems to achieve the same goal) c. Replace with a read-only collection instead since immutability is not really leveraged (ImmutableArrayDictionary I believe was your recommendation to target this.). Do you happen to have benchmarks for your collections that I can use to compare against my implementation and choose based on usage?

In the above case, the reason I believe there is still some room for improvement inspite of the AVL tree backing is because the slowness of the immutable.Get seems to come from many layers of method delegations as compared to the Dictionary's FindEntry. In the Immutable version, get_Item calls into TryGetValue, which then delegate to a static version of TryGetValue and the top two layers before the static call have very little logic, but they took more than 25% of the CPU time. Then, the static TryGetValue alone took more than 25% of the CPU time, while the overall time inside the two TryGetValue it calls is less than 15%.

The majority of the overhead is in function calls themselves, perhaps removing extra delagations or asking Compiler/JIT to do more method inlining is a viable alternative and would not involve any changes to the AVL tree itself.

@stephentoub
Copy link
Member

stephentoub commented Feb 4, 2021

@arkalyanms, I took a look at your benchmark, and noted you're running against netcoreapp2.1... that's old news 😄 Can you try something newer? Here's what I see for GetValue / 1.

Method Runtime Size Mean
Dictionary .NET 5.0 1 4.629 ns
ImmutableDictionary .NET 5.0 1 14.435 ns
Dictionary .NET Core 3.1 1 4.743 ns
ImmutableDictionary .NET Core 3.1 1 17.289 ns
Dictionary .NET Core 2.1 1 5.496 ns
ImmutableDictionary .NET Core 2.1 1 41.314 ns

@arunchndr
Copy link
Member Author

Ah nice! So there is a lot of goodness I am missing out on. I’ll update and circle back. Thanks @stephentoub

@arunchndr
Copy link
Member Author

Alright, here are the net5.0 numbers. The delta is not as wide anymore (about 3-4x for Get and AddRange varies drastically based on data). For the cases that has data subject to change, I am inclined to switch over to a more custom ImmutableDictionary implementation I have or the TreeDictionary and to Dictionary if immutability is not required.

I am not sure if we still want to hold onto this bug to investigate some smaller oddities like the GetValue at size 1 (the increase in times with size shows there is some level of tree traversal cost, but there is also an additional overhead cost in ImmutableDictionary, much lower than in netcore2.1 but it's still there)

GetValue:

Method Size Mean
Dictionary 1 5.065 ns
ImmutableDictionary 1 15.744 ns
Dictionary 10 4.333 ns
ImmutableDictionary 10 18.325 ns
Dictionary 100 4.413 ns
ImmutableDictionary 100 20.234 ns
Dictionary 1000 4.637 ns
ImmutableDictionary 1000 20.448 ns

CreateRange (Dictionary with constructor/DictionaryCreateRange and iterate over range and add/DictionaryWithCreateRange)

Method InitialSize AddSize Mean Allocated
DictionaryCreateRange 0 1 65.02 ns 80 B
DictionaryWithCreateRange 0 1 157.04 ns 216 B
**ImmutableDictionary 0 1 167.61 ns 136 B**
DictionaryCreateRange 0 10 38.08 ns 80 B
DictionaryWithCreateRange 0 10 369.04 ns 992 B
**ImmutableDictionary 0 10 1,408.63 ns 712 B**
DictionaryCreateRange 0 100 35.19 ns 80 B
DictionaryWithCreateRange 0 100 2,987.94 ns 10192 B
**ImmutableDictionary 0 100 33,447.37 ns 6472 B**
DictionaryCreateRange 0 1000 43.81 ns 80 B
DictionaryWithCreateRange 0 1000 27,114.84 ns 102216 B
**ImmutableDictionary 0 1000 463,872.26 ns 64072 B**
DictionaryCreateRange 1 1 133.85 ns 248 B
DictionaryWithCreateRange 1 1 154.85 ns 248 B
**ImmutableDictionary 1 1 262.30 ns 200 B**
DictionaryCreateRange 1 10 87.02 ns 248 B
DictionaryWithCreateRange 1 10 320.02 ns 1024 B
**ImmutableDictionary 1 10 1,626.24 ns 776 B**
DictionaryCreateRange 1 100 76.13 ns 248 B
DictionaryWithCreateRange 1 100 2,473.32 ns 10224 B
**ImmutableDictionary 1 100 22,671.73 ns 6536 B**
DictionaryCreateRange 1 1000 75.21 ns 248 B
DictionaryWithCreateRange 1 1000 24,255.32 ns 102248 B
**ImmutableDictionary 1 1000 379,588.42 ns 64136 B**
DictionaryCreateRange 10 1 236.46 ns 472 B
DictionaryWithCreateRange 10 1 281.06 ns 472 B
**ImmutableDictionary 10 1 296.67 ns 328 B**
DictionaryCreateRange 10 10 232.14 ns 472 B
DictionaryWithCreateRange 10 10 502.75 ns 1168 B
**ImmutableDictionary 10 10 2,035.24 ns 904 B**
DictionaryCreateRange 10 100 228.71 ns 472 B
DictionaryWithCreateRange 10 100 2,796.15 ns 12328 B
**ImmutableDictionary 10 100 23,789.28 ns 6664 B**
DictionaryCreateRange 10 1000 228.99 ns 472 B
DictionaryWithCreateRange 10 1000 18,852.06 ns 57904 B
**ImmutableDictionary 10 1000 372,247.07 ns 64264 B**
DictionaryCreateRange 100 1 1,692.04 ns 3160 B
DictionaryWithCreateRange 100 1 2,014.89 ns 3160 B
**ImmutableDictionary 100 1 550.51 ns 584 B**
DictionaryCreateRange 100 10 1,688.81 ns 3160 B
DictionaryWithCreateRange 100 10 2,816.81 ns 9904 B
**ImmutableDictionary 100 10 2,819.03 ns 1160 B**
DictionaryCreateRange 100 100 1,723.77 ns 3160 B
DictionaryWithCreateRange 100 100 3,806.65 ns 9904 B
**ImmutableDictionary 100 100 28,910.46 ns 6920 B**
DictionaryCreateRange 100 1000 1,671.09 ns 3160 B
DictionaryWithCreateRange 100 1000 19,124.78 ns 55480 B
**ImmutableDictionary 100 1000 379,376.33 ns 64520 B**
DictionaryCreateRange 1000 1 15,935.85 ns 31048 B
DictionaryWithCreateRange 1000 1 19,318.62 ns 31048 B
**ImmutableDictionary 1000 1 687.31 ns 776 B**
DictionaryCreateRange 1000 10 15,958.08 ns 31048 B
DictionaryWithCreateRange 1000 10 18,415.86 ns 31048 B
**ImmutableDictionary 1000 10 3,686.45 ns 1352 B**
DictionaryCreateRange 1000 100 15,956.17 ns 31048 B
DictionaryWithCreateRange 1000 100 19,223.19 ns 31048 B
**ImmutableDictionary 1000 100 37,251.51 ns 7112 B**
DictionaryCreateRange 1000 1000 15,991.76 ns 31048 B
DictionaryWithCreateRange 1000 1000 35,218.55 ns 96424 B
**ImmutableDictionary 1000 1000 433,827.04 ns 64712 B**

@eiriktsarpalis
Copy link
Member

Honestly these numbers don't seem particularly surprising to me. There have been proposals like the one in #14477 for replacing the AVL tree implementation with something faster.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Feb 10, 2021
@eiriktsarpalis
Copy link
Member

I'm going to close this issue, since it is not very actionable. I would recommend continuing the conversation in #14477 on potential perf improvements to the internal implementation of immutable hashtables.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants