Reduce contention for Datacontract Serialization #70668

Closed

Conversation

@Daniel-Svensson (Contributor) commented on Jun 13, 2022

While profiling/stress testing my rewrite of OpenRiaServices from WCF to ASP.NET Core, I identified a bottleneck in the DataContract serializer.

I detected some contention for an API returning around 20 000 objects of a fairly simple type, with a data contract surrogate that looks much the same.

This fix is made in two steps:

  • Improve GetBuiltInDataContract performance
  • Improve GetID performance

Load testing was done from the local machine, so the load-testing tools (Netling/Westwind Web Surge) also consume resources.

Initially GetBuiltInDataContract was the source of contention.
Contention time for a 20-second run: [profiler screenshot]

  • The lock + Dictionary was replaced by a ConcurrentDictionary, removing the contention (see the sketch below).
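
A minimal sketch of the kind of change, assuming the original cache was a Dictionary guarded by a lock; the type, field, and factory names below are illustrative, not the actual identifiers in the serializer:

```csharp
using System;
using System.Collections.Concurrent;

internal static class BuiltInContractCacheSketch
{
    // Before (illustrative): a Dictionary<Type, ...> guarded by a lock on every lookup.
    // After: a ConcurrentDictionary, whose TryGetValue takes no lock, so concurrent
    // readers no longer contend with each other.
    private static readonly ConcurrentDictionary<Type, object?> s_builtInContracts = new();

    public static object? GetBuiltInDataContract(Type type)
    {
        // Lock-free fast path for types that are already cached.
        if (s_builtInContracts.TryGetValue(type, out object? contract))
            return contract;

        // Slow path: create the contract (or null for non-built-in types) and publish it.
        contract = CreateBuiltInDataContract(type);
        return s_builtInContracts.GetOrAdd(type, contract);
    }

    // Hypothetical factory standing in for the real contract-creation logic.
    private static object? CreateBuiltInDataContract(Type type) => null;
}
```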

After fixing that, most contention seemed to move to GetId:
Contention time for a 20-second run after the first commit: [profiler screenshot]

  • The lock + Dictionary was replaced by a ConcurrentDictionary, and the lock for the cache is now only taken on update (a sketch follows below).
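
Roughly, the pattern looks like the sketch below (again with illustrative names, not the real serializer internals); the point is that readers hit a lock-free TryGetValue, and the lock is only reached when a new type has to be registered:

```csharp
using System;
using System.Collections.Concurrent;

internal static class DataContractIdCacheSketch
{
    private static readonly ConcurrentDictionary<Type, int> s_typeToId = new();
    private static readonly object s_cacheLock = new();
    private static int s_nextId;

    public static int GetId(Type type)
    {
        // Fast path: lock-free read for types that already have an id assigned.
        if (s_typeToId.TryGetValue(type, out int id))
            return id;

        // Slow path: the lock is only taken when a new entry has to be added,
        // so steady-state serialization no longer contends on it.
        lock (s_cacheLock)
        {
            return s_typeToId.GetOrAdd(type, static _ => s_nextId++);
        }
    }
}
```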

Other possible solutions / comments

  • Maybe the first fix should instead set up the whole dictionary on first access and then keep the dictionary read-only?
    I decided to keep the current logic in order to keep the change small and not affect startup/first-access performance too much.
  • I looked into caching a static Func for the factory method, but I ran into problems due to RequiresUnreferencedCode, so I used a static lambda instead (which seems to be cached by Roslyn, but with an extra if statement); see the sketch after this list.
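
The sketch below illustrates the static-lambda trade-off under the same assumptions (illustrative names, not the real serializer code):

```csharp
using System;
using System.Collections.Concurrent;

internal static class FactoryLambdaSketch
{
    private static readonly ConcurrentDictionary<Type, object> s_cache = new();

    // Alternative that was considered: cache the factory delegate in a static field so
    // GetOrAdd never allocates. This gets awkward when the factory method is annotated
    // with [RequiresUnreferencedCode], because the warning then surfaces at the field
    // initializer.
    // private static readonly Func<Type, object> s_factory = CreateContract;

    public static object GetOrCreate(Type type)
    {
        // A 'static' lambda captures nothing, so Roslyn caches the delegate instance in
        // a hidden field behind a null check -- the "extra if statement" mentioned above --
        // and no delegate is allocated per call.
        return s_cache.GetOrAdd(type, static t => CreateContract(t));
    }

    // Placeholder standing in for the real data-contract factory.
    private static object CreateContract(Type type) => new object();
}
```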

Numbers

On an AMD Ryzen 7 5800X (8 cores / 16 threads)

8 threads loading

|             | net7 preview 4 | + first commit | + second commit | + 3rd commit |
| ----------- | -------------- | -------------- | --------------- | ------------ |
| req/sec     | 115            | 235            | 255             | 258          |
| data served | 5.1 GB         | 10.5 GB        | 11.3 GB         | 11.5 GB      |
| 95th %      | 82.7 ms        | 47.1 ms        | 44.7 ms         | 42.43 ms     |
| 99th %      | 94.9 ms        | 72 ms          | 63.6 ms         | 51.51 ms     |

16 threads loading

From left to right:

  • net7 preview 4 "original" (with R2R)
  • net7 preview 4 + first commit (~80% CPU load, including load generation)
  • net7 preview 4 + second commit (100% CPU load, including load generation)
  • net7 preview 4 + 3rd commit (lazy) (100% CPU load)
  • net7 preview 4 + 4th commit (lazy + changed key)

[load-test screenshots]

1 or 2 threads loading

No significant difference between the versions to draw any conclusions from.
Update (August): performance seems to be better even with little or no concurrency:

  • ~5% for 1 thread (78 -> 82 rps)
  • ~16% for 2 threads (135->157 rps)

Other

  • When only serializing 10 objects per query the improvement drops, but there is still a measurable improvement: ~165 000 rps vs. <143 000 rps for 16 threads (console logging set to warning).

@ghost added the community-contribution label (indicates that the PR has been added by a community member) on Jun 13, 2022
@dotnet-issue-labeler bot added the area-Serialization label and removed the community-contribution label on Jun 13, 2022
@Daniel-Svensson (Contributor, Author) commented on Jun 14, 2022

Posting my reply here as well since it is hidden when resolved.

I did a small benchmark (single threaded) that just invokes GetId with different keys and almost no adds, in order to measure the overhead for the non-contended case, and settled on using ConcurrentDictionary + Lazy (a sketch of that pattern follows after the benchmark results).

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.4.22252.9
  [Host]     : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
  DefaultJob : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT

| Method                                  | Mean     | Error     | StdDev    |
| --------------------------------------- | -------- | --------- | --------- |
| Original                                | 2.693 μs | 0.0211 μs | 0.0197 μs |
| Id2 (using CollectionsMarshal)          | 2.853 μs | 0.0160 μs | 0.0149 μs |
| Lazy                                    | 3.435 μs | 0.0252 μs | 0.0236 μs |
| LazyAndValueKey                         | 2.565 μs | 0.0119 μs | 0.0111 μs |
| LazyAndValueKeyWithEqualsComparer       | 2.558 μs | 0.0089 μs | 0.0083 μs |
| Faulty                                  | 3.407 μs | 0.0238 μs | 0.0211 μs |
| RwlockSlim                              | 5.513 μs | 0.0208 μs | 0.0194 μs |
| HashSet                                 | 2.784 μs | 0.0067 μs | 0.0056 μs |
| ConcurrentDictionary<, int> (commit 7)  | 1.713 μs | 0.0157 μs | 0.0147 μs |
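
For reference, a minimal sketch of the ConcurrentDictionary + Lazy pattern mentioned above (illustrative names, not the real code): wrapping the stored value in Lazy<T> ensures the id-assignment code runs at most once per key, because even if GetOrAdd races and several Lazy instances are created, only the published instance ever has its Value read.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

internal static class LazyGetIdSketch
{
    private static readonly ConcurrentDictionary<Type, Lazy<int>> s_ids = new();
    private static int s_nextId = -1;

    public static int GetId(Type type) =>
        // The Lazy's value factory assigns the next sequential id exactly once per key.
        s_ids.GetOrAdd(type, static t => new Lazy<int>(
            () => Interlocked.Increment(ref s_nextId))).Value;
}
```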

Update: I switched to using RuntimeTypeHandle as the key in order to make the code simpler (since no ThreadStatic storage is required to cache type handles), which also had the benefit of better performance than the original code.
If you think that is not a good solution, feel free to skip the last commit; a rough sketch of this shape is shown below.
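
A rough sketch of that final shape, again with illustrative names rather than the real serializer internals:

```csharp
using System;
using System.Collections.Concurrent;

internal static class TypeHandleIdCacheSketch
{
    // RuntimeTypeHandle is a small struct, so it can be used as the key directly,
    // without the ThreadStatic scratch storage previously needed to cache handles.
    private static readonly ConcurrentDictionary<RuntimeTypeHandle, int> s_ids = new();
    private static readonly object s_addLock = new();

    public static int GetId(RuntimeTypeHandle typeHandle)
    {
        // Lock-free fast path for types that already have an id.
        if (s_ids.TryGetValue(typeHandle, out int id))
            return id;

        // Adds are rare; taking the lock here keeps the assigned ids dense and sequential.
        lock (s_addLock)
        {
            return s_ids.GetOrAdd(typeHandle, s_ids.Count);
        }
    }
}
```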

In the original code, GetBuiltInDataContract took 9.8% of the time and GetId took 0.5%.
After commit 3 the numbers are 0.3% and 0.08%.

@StephenMolloy (Member) commented:

We have a very large PR in the works trying to reconcile the strange port of DCS that has been in .NET Core with the more complete version in .NET 4.8 (#71752). In doing this reconciliation, I believe this issue is also addressed, in a similar but not exactly the same manner. After that PR goes through, you should see a performance improvement here. If not, please reopen the issue.

@Daniel-Svensson (Contributor, Author) commented:

@StephenMolloy do you have a time frame for your large PR? Is there any chance of this getting reviewed in time to make a merge possible before the .NET 7 cut-off date on August 15?

I did a quick test of your PR and it only improves performance marginally, so I think it still makes sense to improve the two methods that are changed here.

@StephenMolloy (Member) commented:

As mentioned earlier, a similar approach on this particular issue has been brought back from .Net 4.8 in our mega reconciliation of 7.0 with 4.8. #71752 has been checked in now. Hopefully you see this show up in your performance testing of RC1.

@ghost locked as resolved and limited conversation to collaborators on Sep 12, 2022