-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix StreamReader EOF handling and improve perf #69888
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsThis fixes a bug in For a concrete example, consider the below sample. using System.IO;
using System.Text;
var reader = new StreamReader(
new MemoryStream(new byte[] { 0xF0 }),
new UTF8Encoding(
encoderShouldEmitUTF8Identifier: false,
throwOnInvalidBytes: true));
Console.WriteLine(reader.ReadToEnd()); This prints an empty string to the console, even though it should throw I also took this opportunity to tighten the This results in an approx. 50% throughput increase in ReadLine and ReadLineAsync, as shown in the below benchmarks. The benchmarks also show a decrease in overall
// In the below benchmarks, _ms is a MemoryStream whose contents have been initialized from:
// https://www.gutenberg.org/files/11/11-0.txt
[Benchmark]
public int GetLineCount()
{
_ms.Position = 0;
StreamReader reader = new StreamReader(_ms);
int lineCount = 0;
while (reader.ReadLine() != null) { lineCount++; }
return lineCount;
}
[Benchmark]
public int GetLineCountAsync()
{
_ms.Position = 0;
StreamReader reader = new StreamReader(_ms);
int lineCount = 0;
while (reader.ReadLineAsync().GetAwaiter().GetResult() != null) { lineCount++; }
return lineCount;
} /cc @dotnet/area-system-io
|
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
@GrabYourPitchforks do you expect to have time to continue this PR? I see that #62552 by @Trapov. Is much work remaining? @stephentoub how important is it that we get these PR's into .NET 7? |
It'd be valuable to get this PR for .NET 7, but we wouldn't block the release on it 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I will close-reopen it to retrigger CI.
CI issues are related:
|
@GrabYourPitchforks, I'm planning to take over this PR. Let me know if you'd prefer I not. Thanks. |
3c90a07
to
4a1b30c
Compare
4a1b30c
to
5638093
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you both @GrabYourPitchforks and @stephentoub !
@stephentoub is there any chance you could run these benchmarks and share the results before we hit the merge button?
Buffer.BlockCopy(_byteBuffer, n, _byteBuffer, 0, _byteLen - n); | ||
byte[] byteBuffer = _byteBuffer; | ||
_ = byteBuffer.Length; // allow JIT to prove object is not null | ||
new ReadOnlySpan<byte>(byteBuffer, n, _byteLen - n).CopyTo(byteBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to the changes you have made: I can see that once the encoding is detected, this method is being called:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Lines 491 to 493 in ff4b245
// Big Endian Unicode | |
_encoding = Encoding.BigEndianUnicode; | |
CompressBuffer(2); |
So for the default buffer size and Unicode:
private const int DefaultBufferSize = 1024; // Byte buffer size |
we copy 1022 bytes in place.
Why don't we just update _byteLen
and _bytePos
? I know it would require changing some other parts of the code that rely on the assumption that _bytePos == 0
after the read:
_charLen = _decoder.GetChars(_byteBuffer, 0, _byteLen, _charBuffer, 0, flush: false); |
I am asking this question because this PR improves the performance of DetectEncoding
by moving rarely called code out of hot path and it seems like another perf improvement we could make here while we are at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub just checking you saw this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did. It was preexisting this PR and my goal was just to land this PR. Anyone is welcome to follow up on the comment.
|
Windows arm64 improvements: dotnet/perf-autofiling-issues#9689 |
This fixes a bug in
StreamReader
where it doesn't properly pass flush: true to the backingDecoder
instance when EOF is reached. This could result in cases where partial data in the decoder buffer is never processed.For a concrete example, consider the below sample.
This prints an empty string to the console, even though it should throw
DecoderFallbackException
since the caller explicitly requested that they want to reject invalid data.I also took this opportunity to tighten the
ReadLine
andReadLineAsync
methods, including usingStringBuilderCache
. This shouldn't blow the cache because we expect lines to be 80 - 100 chars at the high end, which is well within whatStringBuilderCache
can handle.This results in an approx. 50% throughput increase in ReadLine and ReadLineAsync, as shown in the below benchmarks. The benchmarks also show a decrease in overall
StringBuilder
allocations./cc @dotnet/area-system-io