-
-
Notifications
You must be signed in to change notification settings - Fork 702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MessagePack.Experimental package which includes SIMD(Single Instruction Multiple Data) accelerated primitive array formatters. #988
Conversation
Thanks for contributing, @pCYSl5EDgo. But I fail to see how this PR will improve performance at all. It just adds |
HARDWARE_INTRINSICS_X86
preprocessor symbol to netcoreapp3.1 targeted build
I will make some PR based on this PR. |
Ok, I appreciate you wanting focused PRs, but I want at least some value in each PR. All this one does is add build time. 🙂 |
Thank you for your reminding me of the tests running on netcoreapp3.1! Ok, I'll make this PR involving one small improvement. |
Need more tuning.
Interim Report I tried to improve the performance of MessagePackWriter.Write(string) by examining that each char value of input string is in ASCII range. I'll try another type. |
Thanks for measuring impact rather than just assuming the change makes things faster! |
Interim Report I add API The table below compares the elapsed time between new Hardware Intrinsic code and MessagePack-CSharp v2.1.152.
|
Interim Report I removed api I improved the performance of the serialization of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing. I imagine this took quite a bit of effort. I nevertheless have some concerns that I'm interested in how you'll respond to.
...ssagePack.UnityClient/Assets/Scripts/MessagePack/Formatters/StandardClassLibraryFormatter.cs
Outdated
Show resolved
Hide resolved
...ssagePack.UnityClient/Assets/Scripts/MessagePack/Formatters/StandardClassLibraryFormatter.cs
Outdated
Show resolved
Hide resolved
0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7, 128, | ||
128, 0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7, | ||
}; | ||
fixed (byte* pShuffle = shuffle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very impressed by what you wrote here. But I'm also very concerned about maintainability. I have no idea what any of this does and have never seen code like it before. The array above is a total mystery to me as to where it came from, and why "shuffling" is required baffles me. If we keep this, IMO we would need a lot of code comments and links to docs or blogs that explain it.
Why in the world do we need to do all this when we could do a straight up memcpy of sbyte[]
to the memory returned from GetSpan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why in the world do we need to do all this when we could do a straight up memcpy of sbyte[] to the memory returned from GetSpan?
It's incorrect.
According to current MessagePackWriter.Write(sbyte), Some sbyte values ranges from -33 to -128 have to be encoded and must not be just copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what any of this does and have never seen code like it before.
It is known among the C/C++ programmers who use SIMD hardware intrinsics.
This is a good time for C#ers to learn and use SIMD programming.
The array above is a total mystery to me as to where it came from, and why "shuffling" is required baffles me.
I will explain where this shuffle
table came from in the T4 file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look forward to the .tt file update.
It's incorrect.
According to current MessagePackWriter.Write(sbyte), Some sbyte values ranges from -33 to -128 have to be encoded and must not be just copied.
Huh. Good point. I guess when we write out byte[]
we skip the attempted compression of each byte because we precede it with a special msgpack 'binary' header, which we don't do for sbyte[]
, so I guess we're on the hook to properly encode each one. But that's... really unfortunate and I would encourage folks who want good perf to simply cast their sbyte[]
as a byte[]
(using pointers) for faster and more compact encoding. But I guess I get what you're going for. Thanks for explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding.
I did the same thing for personal usage...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have written comments in the T4 file.
The following list is for reference
References
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sbyte[]
formatter is optimized differently than the other types...
The below table is measured by the Int8ArrayBenchmarkMessagePackNoSimdVsMessagePackSimd
benchmark.
Method | Mean | Error | StdDev |
---|---|---|---|
SerializeSimd | 19.59 ms | 0.345 ms | 0.472 ms |
SerializeNoSimd | 116.50 ms | 1.306 ms | 1.221 ms |
SerializeSimdZero | 10.05 ms | 0.152 ms | 0.127 ms |
SerializeNoSimdZero | 56.92 ms | 0.635 ms | 0.594 ms |
SerializeSimdM32 | 10.11 ms | 0.092 ms | 0.086 ms |
SerializeNoSimdM32 | 59.48 ms | 1.171 ms | 2.112 ms |
SerializeSimdM33 | 28.11 ms | 0.375 ms | 0.333 ms |
SerializeNoSimdM33 | 91.95 ms | 0.790 ms | 0.701 ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's... really unfortunate and I would encourage folks who want good perf to simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding.
I change the formatter to cast sbyte[]
as byte[]
and encode it.
This is the benchmark result.
Method | Mean | Error | StdDev |
---|---|---|---|
SerializeSimd_ConvertByteArray | 11.05 ms | 0.206 ms | 0.426 ms |
SerializeNoSimd | 127.54 ms | 2.472 ms | 2.064 ms |
SerializeSimdZero_ConvertByteArray | 11.32 ms | 0.222 ms | 0.247 ms |
SerializeNoSimdZero | 64.56 ms | 1.243 ms | 2.830 ms |
SerializeSimdM32_ConvertByteArray | 10.89 ms | 0.183 ms | 0.290 ms |
SerializeNoSimdM32 | 63.10 ms | 1.178 ms | 1.157 ms |
SerializeSimdM33_ConvertByteArray | 10.77 ms | 0.150 ms | 0.133 ms |
SerializeNoSimdM33 | 97.38 ms | 0.998 ms | 0.885 ms |
My PR is not that bad compared to that which converts to byte[].
@@ -35,6 +40,581 @@ public byte[] Deserialize(ref MessagePackReader reader, MessagePackSerializerOpt | |||
} | |||
} | |||
|
|||
public sealed class SByteArrayFormatter : IMessagePackFormatter<sbyte[]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why use sbyte
here? Who uses sbyte[]
? If there's a perf improvement to be made in writing out an sbyte[]
array, wouldn't that also apply to byte[]
, which is far more popular?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specialized Formatter has already existed for byte[]
. I think that formatter is fastest. I don't need to improve that.
Implemented : sbyte[]
, short[]
Work In Progress : int[]
, ushort[]
, uint[]
The reason of sbyte[]
is that its implementation difficulty seemed easy to me compared to others.
I get used to implementing improved formatters.
Yes, this is a practice.
I will write int[]
formatter which seems more difficult to implement than sbyte[]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, before you go to any more work, I'd like to feel settled on what you've done so far so you don't waste effort if we're not going to take the PR ultimately anyway. I'm not saying we won't... I'm just saying that since you submitted some, let us review this and understand it enough to justify your continued effort here.
src/MessagePack.UnityClient/Assets/Scripts/MessagePack/MessagePackWriter.cs
Outdated
Show resolved
Hide resolved
src/MessagePack.UnityClient/Assets/Scripts/MessagePack/MessagePackWriter.cs
Outdated
Show resolved
Hide resolved
Add double[] formatter. BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1049 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.401
[Host] : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
LongRun : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
Job=LongRun IterationCount=100 LaunchCount=3
WarmupCount=15
|
I'm re-reviewing, but so far almost all the diffs I see in the existing projects are style changes. Please revert everything that's unrelated to your perf work. You can submit another PR with the style changes if you feel so inclined and we can weigh those separately. |
Never mind, I'll take care of it while I'm reviewing. |
Also: * deleted `IntegerArrayFormatterHelper.cs` which the PR had added but seems to not use. * replaced MessagePack_2_1_165.dll with the one from the nuget package by that version. The one placed here previously was slightly different and I don't know why, but using the official build seems prudent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for contributing!
|
||
using MessagePack.Formatters; | ||
|
||
namespace MessagePack.Experimental.Resolvers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the actual namespace and type names that we would in the primary library, we can eventually move these types from Experimental to the main library without a binary breaking change in the future. Any concerns with dropping Experimental
from the namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have any concerns.
I had forgotten the perspective "binary breaking change in the future".
Thank you for your dropping Experimental
!
But IMO we should target 2.2 with this change, which is only days away from being released anyway. |
Thank you for your review, revert and approval!
I'm worried the benchmark project will not work. The differences between "MessagePack_1_7_3_6.dll" and Nuget Official one
C#'s extern alias official explanation saids that |
CI seems to fail in the step of I do not know how to fix the CI failure. :( |
I'll take a look at the CI break. Regarding assembly names, it should not be necessary to change the assembly name if the assembly version is already unique, but I will confirm this. |
The CI break resolved itself. I suspect you had retargeted to the Regarding the assembly name, yes the assembly names must be unique, and yours wasn't because you were building in a v2.1 branch and had checked in messagepack with an assembly version of 2.1.0.0, which matched what was built in that branch. So once your change merges with the v2.2 branch and the assembly version changes to 2.2.0.0, it works. So I'm going to push a change to your PR that merges with v2.2 then reverts the custom build of messagepack.dll so it's just the standard one since at that point we won't need the assembly name change. |
@neuecc I'm satisfied with this. Are you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good.
I appreciate your fix.
Oh, I'll study it again. Thank you for your explanatation. |
@pCYSl5EDgo do you mind if I squash your PR instead of merge it, given it has 47 commits? |
@AArnott |
HARDWARE_INTRINSICS_X86
preprocessor symbol to netcoreapp3.1 targeted build
Thanks for your contribution, @pCYSl5EDgo ! |
The latest LTS version of .NET Core is 3.1.
.NET Core 3.1 provides Hardware Intrinsics.
The SIMD Intrinsics would make some formatters more faster.
Accelerating UTF-8 Decoding Using SIMD Instructions
Above article shows important implications for how to efficiently encode fixed-length elements to variable length.