-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Intrinsify UTF16->UTF8 conversion for string literals (Encoding.UTF8.TryGetBytes) #85328
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsCloses #84659 Benchmark:byte[] _outputBuffer = new byte[1024];
[Benchmark]
public int TryGetBytes_2B()
{
Encoding.UTF8.TryGetBytes("Hi", _outputBuffer.AsSpan(), out int written);
return written;
}
[Benchmark]
public int TryGetBytes_27B()
{
Encoding.UTF8.TryGetBytes("MemoryMarshal.GetReference", _outputBuffer.AsSpan(), out int written);
return written;
}
[Benchmark]
public int TryGetBytes_122B()
{
Encoding.UTF8.TryGetBytes(
".NET Runtime uses third-party libraries or other resources" +
"that may be distributed under licenses different than the .NET",
_outputBuffer.AsSpan(), out int written);
return written;
}
With codegen diff for ;; length check
vmovups zmm0, zmmword ptr [reloc @RWD00]
vmovups zmmword ptr [r9], zmm0
vmovups zmm0, zmmword ptr [reloc @RWD64]
vmovups zmmword ptr [r9+38H], zmm0
mov eax, 120 ; bytes written
RWD00 dq 6E75522054454E2Eh, 65737520656D6974h, 2D64726968742073h, 696C207974726170h, 2073656972617262h, 726568746F20726Fh, 6372756F73657220h, 6D20746168747365h
RWD64 dq 6D20746168747365h, 6964206562207961h, 6574756269727473h, 207265646E752064h, 7365736E6563696Ch, 6572656666696420h, 206E61687420746Eh, 54454E2E20656874h in this case - via two avx-512 loads from the data section. For now it needs DynamicPGO being enabled to workaround #85209, I'll land a fix for separately (or a workaround). tldr: None of cc @stephentoub TODO:
|
@jakobbotsch @SingleAccretion yet another I recommend to enable "ignore whitespaces" mode. There were a couple of "refactorings" + a pretty simple expansion for |
@EgorBo Can you merge and rerun CI? |
Co-authored-by: Jakob Botsch Nielsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT changes look good to me.
@stephentoub It's tricky to implement but it should be doable. Calling this static method directly from interpolation should be a better solution because we won't have to rely on inlining just like Jan pointed out earlier, so I guess for now I'll remove its usage from |
…time-1 into intrinsify-utf16-utf8-literal
Ok, sounds good. |
/azp run runtime-coreclr pgo |
No commit pushedDate could be found for PR 85328 in repo dotnet/runtime |
@EgorBo, anything I can do to help land this? |
I was thinking of filing a separate PR to improve |
Yes, I thought that's what we agreed on above, using it from AppendLiteral and not from TryGetBytes yet. |
Merging then, I've verified that the API works as expected when it's used inside |
I'll submit a pr. Thanks. |
uint16_t ch = bufferU16[charIndex]; | ||
if (ch > 127) | ||
{ | ||
// Only ASCII is supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
Closes #84659
The main goal is to make upcoming UTF8 string interpolation faster (to be on par with UTF16 one). If JIT sees that the source is a string literal (can be extended to cover RVA
ROS<char>
, "shifted" pointers to literal literals,static readonly string
) and the data in it is ASCII only - it peforms the conversion in the JIT time and just copies it to the destination.Benchmark:
cc @stephentoub