Add MessagePack.Experimental package which includes SIMD(Single Instruction Multiple Data) accelerated primitive array formatters. #988

Merged
merged 47 commits into from
Sep 14, 2020


pCYSl5EDgo
Contributor

The latest LTS version of .NET Core is 3.1.

.NET Core 3.1 provides hardware intrinsics.
SIMD intrinsics could make some formatters faster.

The article "Accelerating UTF-8 Decoding Using SIMD Instructions" has important implications for how to efficiently encode fixed-length elements into a variable-length format.

@AArnott
Collaborator

AArnott commented Jul 27, 2020

Thanks for contributing, @pCYSl5EDgo. But I fail to see how this PR will improve performance at all. It just adds HARDWARE_INTRINSICS_X86 as a preprocessor symbol but we don't have any source code that checks for this at the moment.
If you want to add such code with this PR, that's fine. But I don't think it makes sense as-is.

@AArnott AArnott changed the title Add Support for .NET Core 3.1 Add HARDWARE_INTRINSICS_X86 preprocessor symbol to netcoreapp3.1 targeted build Jul 27, 2020
@pCYSl5EDgo
Contributor Author

I will make further PRs based on this one.
This is the first, basic PR.
I added the symbol just as preparation.

@AArnott
Collaborator

AArnott commented Jul 28, 2020

Ok, I appreciate you wanting focused PRs, but I want at least some value in each PR. All this one does is add build time. 🙂
So please add the value to this one.
Also we'll need to make sure tests run on netcoreapp2.1 as well as netcoreapp3.1 to test all code paths.
Also if we really are adding x86-only paths to the code, we'll need to add automated testing on a non-x86 platform to make sure we don't regress those.

@pCYSl5EDgo
Contributor Author

Thank you for reminding me about running the tests on netcoreapp3.1!
I had forgotten.

OK, I'll make this PR include one small improvement.
Thank you!

@pCYSl5EDgo
Contributor Author

Interim Report

I tried to improve the performance of MessagePackWriter.Write(string) by first checking whether every char of the input string is in the ASCII range.
It became slower, because the input chars are iterated over twice.

I'll try another type.
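For illustration, the two-pass approach described in this report might look like the sketch below. It assumes a plain Span<byte> destination instead of MessagePackWriter's buffer handling, and AsciiTwoPass is a hypothetical name; it is not the code that was benchmarked.

```csharp
using System;

// Hypothetical helper illustrating "validate ASCII first, then copy".
public static class AsciiTwoPass
{
    // Pass 1: true when every char is in the ASCII range (U+0000..U+007F).
    public static bool IsAllAscii(ReadOnlySpan<char> chars)
    {
        foreach (char c in chars)
        {
            if (c > 0x7F)
            {
                return false;
            }
        }

        return true;
    }

    // Pass 2: narrows each char to one byte. Only valid after IsAllAscii returned true.
    public static int NarrowAscii(ReadOnlySpan<char> chars, Span<byte> destination)
    {
        for (int i = 0; i < chars.Length; i++)
        {
            destination[i] = (byte)chars[i];
        }

        return chars.Length;
    }
}
```

Every char is read once by IsAllAscii and again by NarrowAscii, which is consistent with the reported slowdown; a single call such as Encoding.UTF8.GetBytes does not need the separate validation pass.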

@AArnott
Collaborator

AArnott commented Jul 31, 2020

Thanks for measuring impact rather than just assuming the change makes things faster!

@pCYSl5EDgo
Contributor Author

Interim Report

I added the APIs void MessagePackWriter.Write(sbyte[]) and void MessagePackWriter.Write(ReadOnlySpan<sbyte>).

The table below compares the elapsed time of the new hardware-intrinsics code against MessagePack-CSharp v2.1.152.
In the benchmark, both encoded a 16 MB random sbyte array.

| Method | Mean | Error | StdDev |
|---|---|---|---|
| SerializeSimd | 41.64 ms | 0.810 ms | 1.188 ms |
| SerializeNoSimd | 119.85 ms | 2.391 ms | 2.559 ms |

@pCYSl5EDgo
Contributor Author

Interim Report

I removed the API void MessagePackWriter.Write(sbyte[]).
I changed the implementation of StandardClassFormatters.

I improved the serialization performance of short[].

| Method | Mean | Error | StdDev |
|---|---|---|---|
| SerializeSimd | 74.14 ms | 0.467 ms | 0.414 ms |
| SerializeNoSimd | 243.51 ms | 1.664 ms | 1.475 ms |
| SerializeSimdZero | 42.07 ms | 0.135 ms | 0.119 ms |
| SerializeNoSimdZero | 80.48 ms | 1.264 ms | 1.183 ms |
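The Zero rows presumably measure an all-zero array (inferred from the names only). Under the MessagePack spec an integer's encoded size depends on its value, so a short[] formatter has to produce variable-length output, much like the UTF-8 problem mentioned at the top of the thread. The hypothetical helper below computes the smallest spec-conforming size; that MessagePackWriter makes exactly these choices (for example uint8 rather than int16 for 128..255) is an assumption.

```csharp
using System;

// Hypothetical helper: byte size of the smallest MessagePack encoding of a short.
public static class Int16Encoding
{
    public static int EncodedLength(short value)
    {
        if (value >= -32 && value <= 127)
        {
            return 1; // positive fixint (0x00..0x7F) or negative fixint (0xE0..0xFF)
        }

        if (value >= -128 && value <= 255)
        {
            return 2; // int8 (0xD0) for -128..-33, uint8 (0xCC) for 128..255
        }

        return 3; // int16 (0xD1) or uint16 (0xCD) plus two big-endian payload bytes
    }
}
```

An all-zero input hits the one-byte case for every element, which would fit with the Zero rows being the fastest.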

Collaborator

@AArnott AArnott left a comment

Thanks for contributing. I imagine this took quite a bit of effort. I nevertheless have some concerns, and I'm interested in how you'll respond to them.

```csharp
0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7, 128,
128, 0, 128, 1, 128, 2, 128, 3, 128, 4, 128, 5, 128, 6, 128, 7,
};
fixed (byte* pShuffle = shuffle)
```
Collaborator

I'm very impressed by what you wrote here. But I'm also very concerned about maintainability. I have no idea what any of this does and have never seen code like it before. The array above is a total mystery to me as to where it came from, and why "shuffling" is required baffles me. If we keep this, IMO we would need a lot of code comments and links to docs or blogs that explain it.

Why in the world do we need to do all this when we could do a straight up memcpy of sbyte[] to the memory returned from GetSpan?

Contributor Author

> Why in the world do we need to do all this when we could do a straight up memcpy of sbyte[] to the memory returned from GetSpan?

It's incorrect.
According to the current MessagePackWriter.Write(sbyte), sbyte values in the range -128 to -33 have to be encoded and must not simply be copied.
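The rule behind this reply comes from the MessagePack integer formats: -32..127 fit in a single fixint byte whose bits equal the two's-complement value, while -128..-33 need the int8 marker 0xD0 followed by the raw byte. A minimal scalar sketch; SByteEncoding is a hypothetical name, meant to mirror (not reproduce) MessagePackWriter.Write(sbyte).

```csharp
using System;

// Hypothetical scalar encoder following the MessagePack spec's integer formats.
public static class SByteEncoding
{
    private const byte Int8Prefix = 0xD0; // MessagePack "int 8" format byte

    // Writes one sbyte and returns the number of bytes written (1 or 2).
    public static int Write(sbyte value, Span<byte> destination)
    {
        if (value >= -32)
        {
            // -32..127: positive fixint (0x00..0x7F) or negative fixint (0xE0..0xFF).
            // In both cases the two's-complement byte is itself the encoding.
            destination[0] = unchecked((byte)value);
            return 1;
        }

        // -128..-33: int8 prefix followed by the raw byte.
        destination[0] = Int8Prefix;
        destination[1] = unchecked((byte)value);
        return 2;
    }
}
```

This also shows when a straight memcpy would be valid: only if no element is below -32, because then every output byte equals the input byte.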

Contributor Author

> I have no idea what any of this does and have never seen code like it before.

This technique is well known among C/C++ programmers who use SIMD hardware intrinsics.
This is a good time for C# developers to learn and use SIMD programming.

> The array above is a total mystery to me as to where it came from, and why "shuffling" is required baffles me.

I will explain where this shuffle table came from in the T4 file.
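The 128 entries in the table match a documented convention of the SSSE3 PSHUFB instruction (exposed in .NET as Ssse3.Shuffle): each control byte selects a source lane by its low 4 bits, and a control byte with its high bit set (128 is 0x80) produces zero. The scalar model below illustrates only that rule; ShuffleModel is a hypothetical name, and how the PR combines its table rows is not reproduced here.

```csharp
using System;

// Scalar model of PSHUFB semantics (Ssse3.Shuffle on 16-byte vectors).
public static class ShuffleModel
{
    public static byte[] Shuffle(byte[] source, byte[] control)
    {
        if (source.Length != 16 || control.Length != 16)
        {
            throw new ArgumentException("PSHUFB operates on 16-byte vectors.");
        }

        var result = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            byte c = control[i];
            // High bit set (e.g. 128): the output lane becomes zero.
            // Otherwise the low 4 bits pick a lane from the source.
            result[i] = (c & 0x80) != 0 ? (byte)0 : source[c & 0x0F];
        }

        return result;
    }
}
```

Applied with the first table row (0, 128, 1, 128, ...), source bytes 0..7 land in every other output lane with a zero after each one, which spreads eight input bytes over sixteen output bytes.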

Collaborator

I look forward to the .tt file update.

> It's incorrect.
> According to the current MessagePackWriter.Write(sbyte), sbyte values in the range -128 to -33 have to be encoded and must not simply be copied.

Huh. Good point. I guess when we write out byte[] we skip the attempted compression of each byte because we precede it with a special msgpack 'binary' header, which we don't do for sbyte[], so I guess we're on the hook to properly encode each one. But that's... really unfortunate and I would encourage folks who want good perf to simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding. But I guess I get what you're going for. Thanks for explaining.
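The pointer cast suggested here can also be written without unsafe code, using MemoryMarshal.Cast from System.Runtime.InteropServices (available since .NET Core 2.1). The sketch only shows the zero-copy reinterpretation; SByteReinterpret is a hypothetical name, and handing the bytes to the writer's binary path is assumed rather than shown. Note that the wire format changes: the data becomes a msgpack bin payload, so the reading side has to reinterpret it back to sbyte.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SByteReinterpret
{
    // Views the sbyte data as bytes without copying; the two's-complement bits are unchanged.
    public static ReadOnlySpan<byte> AsBytes(ReadOnlySpan<sbyte> values)
        => MemoryMarshal.Cast<sbyte, byte>(values);

    // Reverse view for the reading side.
    public static ReadOnlySpan<sbyte> AsSBytes(ReadOnlySpan<byte> bytes)
        => MemoryMarshal.Cast<byte, sbyte>(bytes);
}
```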

Contributor Author

@pCYSl5EDgo pCYSl5EDgo Aug 4, 2020

> simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding.

I did the same thing for my own use...

Contributor Author

@pCYSl5EDgo pCYSl5EDgo Aug 7, 2020

I have written comments in the T4 file.
The following list is for reference:

References

Contributor Author

The sbyte[] formatter is optimized differently than the other types...

The table below was measured with the Int8ArrayBenchmarkMessagePackNoSimdVsMessagePackSimd benchmark.

| Method | Mean | Error | StdDev |
|---|---|---|---|
| SerializeSimd | 19.59 ms | 0.345 ms | 0.472 ms |
| SerializeNoSimd | 116.50 ms | 1.306 ms | 1.221 ms |
| SerializeSimdZero | 10.05 ms | 0.152 ms | 0.127 ms |
| SerializeNoSimdZero | 56.92 ms | 0.635 ms | 0.594 ms |
| SerializeSimdM32 | 10.11 ms | 0.092 ms | 0.086 ms |
| SerializeNoSimdM32 | 59.48 ms | 1.171 ms | 2.112 ms |
| SerializeSimdM33 | 28.11 ms | 0.375 ms | 0.333 ms |
| SerializeNoSimdM33 | 91.95 ms | 0.790 ms | 0.701 ms |
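Assuming M32 and M33 mean arrays filled with -32 and -33 (inferred from the names only), the gap between those rows lines up with the sbyte encoding rule discussed earlier: -32 fits in one byte, while -33 needs the two-byte int8 form. A common SIMD pattern for such data, not necessarily the one used in this PR, is to test 16 values at once and take a copy-through path when none is below -32. The sketch uses SSE2 intrinsics (Sse2.CompareLessThan, Sse2.MoveMask) with a scalar fallback for non-x86 platforms; CountCopyablePrefix is a hypothetical helper.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class SByteFastPath
{
    // Length of the leading run in which every value is >= -32,
    // i.e. the elements whose MessagePack encoding equals their raw byte.
    public static int CountCopyablePrefix(ReadOnlySpan<sbyte> values)
    {
        int i = 0;
        if (Sse2.IsSupported)
        {
            var threshold = Vector128.Create((sbyte)-32);
            ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(values);
            for (; i + 16 <= values.Length; i += 16)
            {
                var block = MemoryMarshal.Read<Vector128<sbyte>>(raw.Slice(i));
                // Lanes where value < -32 become all-ones.
                var needsPrefix = Sse2.CompareLessThan(block, threshold);
                if (Sse2.MoveMask(needsPrefix) != 0)
                {
                    break; // some lane in this block needs the int8 prefix
                }
            }
        }

        // Scalar scan: finds the exact position inside a failing block,
        // handles the tail, and covers platforms without SSE2.
        while (i < values.Length && values[i] >= -32)
        {
            i++;
        }

        return i;
    }
}
```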

Contributor Author

@pCYSl5EDgo pCYSl5EDgo Aug 7, 2020

> But that's... really unfortunate and I would encourage folks who want good perf to simply cast their sbyte[] as a byte[] (using pointers) for faster and more compact encoding.

I changed the formatter to cast sbyte[] to byte[] and encode that.
This is the benchmark result:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| SerializeSimd_ConvertByteArray | 11.05 ms | 0.206 ms | 0.426 ms |
| SerializeNoSimd | 127.54 ms | 2.472 ms | 2.064 ms |
| SerializeSimdZero_ConvertByteArray | 11.32 ms | 0.222 ms | 0.247 ms |
| SerializeNoSimdZero | 64.56 ms | 1.243 ms | 2.830 ms |
| SerializeSimdM32_ConvertByteArray | 10.89 ms | 0.183 ms | 0.290 ms |
| SerializeNoSimdM32 | 63.10 ms | 1.178 ms | 1.157 ms |
| SerializeSimdM33_ConvertByteArray | 10.77 ms | 0.150 ms | 0.133 ms |
| SerializeNoSimdM33 | 97.38 ms | 0.998 ms | 0.885 ms |

My PR does not compare too badly with the version that converts to byte[].

```diff
@@ -35,6 +40,581 @@ public byte[] Deserialize(ref MessagePackReader reader, MessagePackSerializerOpt
}
}

public sealed class SByteArrayFormatter : IMessagePackFormatter<sbyte[]>
```
Collaborator

I'm curious why use sbyte here? Who uses sbyte[]? If there's a perf improvement to be made in writing out an sbyte[] array, wouldn't that also apply to byte[], which is far more popular?

Contributor Author

@pCYSl5EDgo pCYSl5EDgo Aug 3, 2020

A specialized formatter already exists for byte[]. I think that formatter is already the fastest, so I don't need to improve it.

Implemented: sbyte[], short[]
Work in progress: int[], ushort[], uint[]

I chose sbyte[] because it seemed easier to implement than the others.
I'm getting used to implementing optimized formatters.
Yes, this is practice.

Next, I will write an int[] formatter, which seems more difficult to implement than sbyte[].

Collaborator

well, before you go to any more work, I'd like to feel settled on what you've done so far so you don't waste effort if we're not going to take the PR ultimately anyway. I'm not saying we won't... I'm just saying that since you submitted some, let us review this and understand it enough to justify your continued effort here.

@pCYSl5EDgo
Contributor Author

Added a double[] formatter.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1049 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.401
  [Host]  : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  LongRun : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT

Job=LongRun  IterationCount=100  LaunchCount=3  WarmupCount=15

| Method | Size | Mean | Error | StdDev | Median |
|---|---|---|---|---|---|
| SerializeSingleInstructionMultipleData | 64 | 227.3 ns | 3.22 ns | 16.52 ns | 224.0 ns |
| SerializeNoSingleInstructionMultipleData | 64 | 604.4 ns | 5.13 ns | 25.96 ns | 596.5 ns |
| SerializeSingleInstructionMultipleData | 1024 | 1,490.5 ns | 18.70 ns | 93.75 ns | 1,463.4 ns |
| SerializeNoSingleInstructionMultipleData | 1024 | 9,230.7 ns | 354.94 ns | 1,827.70 ns | 8,519.3 ns |
| SerializeSingleInstructionMultipleData | 16777216 | 142,284,585.5 ns | 2,387,494.64 ns | 12,208,105.14 ns | 137,715,500.0 ns |
| SerializeNoSingleInstructionMultipleData | 16777216 | 267,093,020.7 ns | 4,500,563.82 ns | 23,053,582.08 ns | 259,778,500.0 ns |
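For context on what these numbers measure: in MessagePack each double is written as the float64 marker 0xCB followed by the 8 bytes in big-endian order, so every element costs 9 bytes and, on little-endian x86, a byte swap. The scalar baseline below uses BinaryPrimitives from System.Buffers.Binary; DoubleArrayEncoding is a hypothetical name, the array header is omitted, and this is not the PR's SIMD code.

```csharp
using System;
using System.Buffers.Binary;

public static class DoubleArrayEncoding
{
    private const byte Float64Marker = 0xCB; // MessagePack "float 64" format byte

    // Writes each double as 0xCB + 8 big-endian bytes and returns the bytes written.
    // The msgpack array header is intentionally not written here.
    public static int WriteElements(ReadOnlySpan<double> values, Span<byte> destination)
    {
        int offset = 0;
        foreach (double value in values)
        {
            destination[offset] = Float64Marker;
            long bits = BitConverter.DoubleToInt64Bits(value);
            BinaryPrimitives.WriteInt64BigEndian(destination.Slice(offset + 1, 8), bits);
            offset += 9;
        }

        return offset;
    }
}
```

Because the per-element layout is fixed, the marker insertion and byte reversal are the kind of regular work a vector shuffle can plausibly batch, unlike the value-dependent sizes of the integer formatters.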

@AArnott
Collaborator

AArnott commented Sep 12, 2020

I'm re-reviewing, but so far almost all the diffs I see in the existing projects are style changes. Please revert everything that's unrelated to your perf work. You can submit another PR with the style changes if you feel so inclined and we can weigh those separately.

@AArnott
Collaborator

AArnott commented Sep 12, 2020

Never mind, I'll take care of it while I'm reviewing.

Also:
* deleted `IntegerArrayFormatterHelper.cs` which the PR had added but seems to not use.
* replaced MessagePack_2_1_165.dll with the one from the nuget package by that version. The one placed here previously was slightly different and I don't know why, but using the official build seems prudent.
Collaborator

@AArnott AArnott left a comment

Looks good. Thanks for contributing!


```csharp
using MessagePack.Formatters;

namespace MessagePack.Experimental.Resolvers
```
Collaborator

If we use the actual namespace and type names that we would in the primary library, we can eventually move these types from Experimental to the main library without a binary breaking change in the future. Any concerns with dropping Experimental from the namespace?

Contributor Author

I do not have any concerns.
I had overlooked the risk of a binary breaking change in the future.
Thank you for dropping Experimental!

@AArnott
Collaborator

AArnott commented Sep 12, 2020

But IMO we should target 2.2 with this change, which is only days away from being released anyway.

@pCYSl5EDgo
Contributor Author

pCYSl5EDgo commented Sep 12, 2020

Thank you for your review, revert and approval!

> MessagePack_2_1_165.dll with the one from the nuget package by that version. The one placed here previously was slightly different and I don't know why, but using the official build seems prudent.

I'm worried the benchmark project will no longer work.
I made that change following the similar benchmark project "benchmark/SerializerBenchmark", which contains "MessagePack_1_7_3_6.dll".
That "MessagePack_1_7_3_6.dll" is also different from the NuGet one.

The differences between "MessagePack_1_7_3_6.dll" and the official NuGet one:

  • File name
  • Assembly name
  • Module name

The official C# documentation on extern alias says that an extern alias requires a different assembly name.
So I modified the NuGet DLL to change its assembly and module names.

@pCYSl5EDgo pCYSl5EDgo changed the base branch from master to v2.2 September 12, 2020 03:52
@pCYSl5EDgo
Contributor Author

CI seems to fail at the dotnet restore step.
There seems to be a version-dependency problem in MessagePack.Generator.Tests.
CI says that MessagePack.Generator.Tests depends on v3.4.0 and MessagePack.GeneratorCore depends on v3.6.0.
I read both csproj files and could not find a 3.6.0 dependency...

I do not know how to fix the CI failure. :(

@AArnott
Collaborator

AArnott commented Sep 13, 2020

I'll take a look at the CI break.

Regarding assembly names, it should not be necessary to change the assembly name if the assembly version is already unique, but I will confirm this.

@AArnott
Collaborator

AArnott commented Sep 13, 2020

The CI break resolved itself. I suspect you had retargeted to the v2.2 branch before I merged master into v2.2 yesterday, so the build didn't include the fix I included in #1036 for the package restore failure. That also would explain why you didn't see the failure locally: your branch hadn't actually merged with v2.2 either. Anyway, although this PR currently shows a build failure, I rebuilt the PR in Azure Pipelines and it succeeded so if we push to your PR again, it'll work.

Regarding the assembly name, yes the assembly names must be unique, and yours wasn't because you were building in a v2.1 branch and had checked in messagepack with an assembly version of 2.1.0.0, which matched what was built in that branch. So once your change merges with the v2.2 branch and the assembly version changes to 2.2.0.0, it works. So I'm going to push a change to your PR that merges with v2.2 then reverts the custom build of messagepack.dll so it's just the standard one since at that point we won't need the assembly name change.

@AArnott AArnott requested a review from neuecc September 13, 2020 15:41
@AArnott
Collaborator

AArnott commented Sep 13, 2020

@neuecc I'm satisfied with this. Are you?

Member

@neuecc neuecc left a comment

good.

@pCYSl5EDgo
Contributor Author

CI break

I appreciate your fix.

Assembly Name

Oh, I'll study it again. Thank you for your explanation.

@AArnott
Collaborator

AArnott commented Sep 14, 2020

@pCYSl5EDgo do you mind if I squash your PR instead of merge it, given it has 47 commits?

@pCYSl5EDgo
Contributor Author

@AArnott
I do not mind it. Thank you.

@pCYSl5EDgo pCYSl5EDgo changed the title Add HARDWARE_INTRINSICS_X86 preprocessor symbol to netcoreapp3.1 targeted build Add MessagePack.Experimental package which includes SIMD(Single Instruction Multiple Data) accelerated primitive array formatters. Sep 14, 2020
@AArnott AArnott merged commit bf2ea7a into MessagePack-CSharp:v2.2 Sep 14, 2020
@AArnott
Collaborator

AArnott commented Sep 14, 2020

Thanks for your contribution, @pCYSl5EDgo !

@AArnott AArnott added this to the v2.2 milestone Sep 14, 2020
@pCYSl5EDgo pCYSl5EDgo deleted the netcoreapp3_1 branch September 14, 2020 13:06
3 participants