Add Stream ReadAtLeast and ReadExactly #69272

eerhardt · 2022-05-12T19:23:36Z

Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream.
ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream.

Make use of the new methods where appropriate in net7.0.

Fix #16598

Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream. ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream. Make use of the new methods where appropriate in net7.0. Fix dotnet#16598

dotnet-issue-labeler · 2022-05-12T19:23:40Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

ghost · 2022-05-12T19:23:43Z

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream.
ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream.

Make use of the new methods where appropriate in net7.0.

Fix #16598

Author:	eerhardt
Assignees:	-
Labels:	`area-System.IO`, `new-api-needs-documentation`
Milestone:	-

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs

src/libraries/System.IO.Compression/src/System/IO/Compression/ZipHelper.cs

src/libraries/System.Net.Http/tests/FunctionalTests/LoopbackSocksServer.cs

stephentoub · 2022-05-12T20:22:00Z

src/libraries/System.Net.Security/src/System/Net/StreamFramer.cs

-            int bytesRead;
-            int offset = 0;
-            while (offset < buffer.Length)
+            int bytesRead = await TAdapter.ReadAtLeastAsync(


There is a risk here. Previously if the async read pends, it would force the ReadMessageAsync state machine to be allocated. Now if the async read pends, it will force the ReadMessageAsync state machine to be allocated and the ReadAtLeastAsync state machine to be allocated.

We should consider using the pooled async state machine builders on the ReadAtLeastAsync async methods.

Is this as simple as adding

[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]

to the Stream.ReadAtLeastAsyncCore method? Or is it more involved than that?

Implementation-wise, that's it, easy peasy. The important thing is validating performance. The concern with such pooling is that it can have a negative impact on the larger system, especially if it ends up creating more gen2 to gen0 references (by referencing gen0 objects from pooled objects that are likely to eventually be gen2). But I worry that without doing this, the methods will end up not being used because folks will see the allocation and choose to avoid the helpers as a result.

Let's chat tomorrow about how I can validate the performance here. I'd like to get your thoughts on what to try / look for.

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs

carlossanlop

Thanks for this change! Left some comments and questions.

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs

src/libraries/System.IO/tests/Stream/Stream.ReadExactly.cs

...ries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/StreamExtensions.cs

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs

eerhardt · 2022-05-13T22:19:22Z

Thanks for the review so far. I now believe this is fully ready for review. PTAL.

src/libraries/System.IO/tests/Stream/Stream.ReadAtLeast.cs

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs

stephentoub · 2022-05-17T13:31:21Z

My personal preference would be to throw for empty buffers

More specifically, I think we should throw for minimumBytes <= 0.

There are two main situations where someone might be passing 0 to a read method:

Want the "wait for data to be available" behavior
Doing math to compute how much to read and not special-casing 0

Read is available for the former. For the latter, a direct call to ReadAtLeast with a specified lower bound of 0 doesn't seem likely, given that this is expected to be used in scenarios like protocol parsing where you're expecting some number of bytes necessary to make forward progress, and for ReadExactly, it'll typically be used in situations where you're trying to read everything in the payload into some buffer you have... doing math is much more common with Read, where you might need to loop, and we wouldn't expect much looping around ReadExactly.

adamsitnik

Based on responses provided by @stephentoub it LGTM. Thanks!

ReadAtLeast throws when minimumBytes is not positive. ReadExactly throws when an empty buffer is passed, or count is not positive.

carlossanlop

Left some documentation suggestions.

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs

src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/ValueTask.cs

GSPP · 2022-05-18T10:30:02Z

How would ReadExactly be used to read a length-prefixed binary value from some protocol? A zero-length binary data is no special case and should just work.

My understanding is that this call would currently throw. I'm not sure why there is a need to create a special case where, naturally, no special case is necessary.

@stephentoub mentioned that this can happen if "buffer math" results in a zero-length read. I don't see this as a bad thing. It can be an outcome of natural code patterns.

stephentoub · 2022-05-18T10:34:39Z

I'm not sure why there is a need to create a special case

Because otherwise it could hang until there's data available. If you want that behavior, Read already exists and provides everything needed. In your example scenario, it's almost certainly not the behavior you want.

An alternative is special-casing the new methods to handle 0 by always returning immediately rather than throwing.

GSPP · 2022-05-18T10:45:16Z

Oh, now I see. If we have a zero-length read, this would cause many streams to wait until more data is incoming! That's a bug, and I'm afraid I will now have to check some of my code for this particular mistake 😁. Thanks for explaining.

But that's actually an argument to add that special case check. That seems to fit the use-case perfectly.

Without that special case, many callers will need to special case this themselves. I have done quite a bit of protocol work, and zero-length data is not uncommon.

stephentoub · 2022-05-18T11:01:50Z

An alternative is special-casing the new methods to handle 0 by always returning immediately

@eerhardt, what would you think of this?

eerhardt · 2022-05-18T13:45:27Z

I don't really like having different behavior between Read(buffer, 0, 0) and ReadAtLeast(buffer, 0). My first proposal was to make these behave roughly equivalent - block until data is available. Passing in an empty buffer into both ReadAtLeast and ReadExactly became a problem though, since 0 was returned from the Read operation, and that looked like an end-of-stream.

At a high-level, I can think of the following possibilities:

Check up front and throw an exception. (current PR behavior)
Return immediately, don't do a "read", even though a "read" method was called.
Always call Read, with 2 options on when an empty buffer is passed:
a. Throw an exception
b. Special case an empty buffer, and allow for 0 to be returned from Read. Return successfully for ReadExactly and 0 for ReadAtLeast.

My current thinking is that 3(b) seems to be the most consistent with Stream.Read, and would be reasonable behavior.

stephentoub · 2022-05-18T13:49:35Z

My current thinking is that 3(b) seems to be the most consistent with Stream.Read, and would be reasonable behavior.

I don't think that's desirable. It means a scenario like:

int numBytesInPayload = ReadPayloadSize(stream);
byte[] payload = new byte[numBytesInPayload];
stream.ReadExactly(payload);

could block indefinitely even when the payload has been fully and successfully read.

I do think 0 is special for ReadExactly and ReadAtLeast and needs to be treated differently from what Read does. We could choose to throw (1) or return immediately (2), but calling Read with 0 (3) is problematic.

If someone were using their own read loop for this, they would be special-casing 0, e.g.

int numBytesInPayload = ReadPayloadSize(stream);
byte[] payload = new byte[numBytesInPayload];
int totalRead = 0;
while (totalRead < numBytesInPayload)
{
    int n = stream.Read(payload.AsSpan(totalRead));
    if (n == 0) throw ...;
    totalRead += n;
}

such that stream.Read would never be called if numBytesInPayload were 0. We want them to be able to switch to do that instead with ReadExactly, but with any variant of (3), they'd have to special-case 0 themselves:

if (payload != 0)
{
    stream.ReadExactly(payload);
}

Now, maybe we'd want them to do that anyway in this particular example, in order to prefer using Array.Empty<byte>() in that case. But that's an optimization, and this API is meant to be the easy button.

Is there a use case where someone would be using ReadExactly or ReadAtLeast, passing in 0, and wanting the "wait until data is available" behavior that stream.Read might provide?

eerhardt · 2022-05-18T15:27:15Z

Are there two different scenarios here?

ReadExactly and ReadAtLeast with an empty buffer (which implies minimumBytes must be 0).
ReadAtLeast with a non-empty buffer, but passing in minimumBytes = 0.

In case the non-empty buffer case, would you expect no Read to happen?

stephentoub · 2022-05-18T15:30:24Z

Are there two different scenarios here?

That's part of my question... what is the scenario for calling ReadAtLeast with a non-zero-length buffer and minimumBytes == 0?
Why wouldn't you just use Read? I'm not clear on in what situation you'd say "please read data but if you feel like not giving any back that's cool".

eerhardt · 2022-05-18T15:34:46Z

Maybe the behavior split makes sense between the new methods then?

ReadExactly with an empty buffer (or count == 0 for the array overload) - no-ops
ReadAtLeast with minimumBytes == 0 - throws

I just wouldn't expect ReadAtLeast to no-op. It feels closer to Read, and isn't the easy button. If we don't have a scenario for ReadAtLeast(buffer, 0), maybe it makes the most sense to throw.

stephentoub · 2022-05-18T15:41:37Z

Actually, I just answered my own question ("I'm not clear on in what situation you'd say "please read data but if you feel like not giving any back that's cool"). Consider the same scenario I outlined earlier, but where the protocol parser is maintaining its own buffer. It wants to read the payload but also any additional data that might be available, in order to avoid unnecessary additional calls for when it wants more data later. That's what ReadAtLeast is for, and you wouldn't want it to throw for 0 (0 is valid) nor block waiting for more (because the 0 was satisfied). Allowing 0 to just nop means ReadAtLeast works for non-0-length in this manner, giving back as much as it can, while still enabling 0 to work without throwing or blocking.

Stream.Read special-cases 0 (on some streams). ReadExactly and ReadAtLeast serve different purposes than does Read, and since they're implemented in terms of it (effectively as an implementation detail), they then also need to special-case 0. I think it makes sense for them to special-case it to mean "I've read what you asked and now I'm done". It would be nice if ReadAtLeast could give more back if more was available, but to do so it would need to call Read with 0, which could block.

stephentoub · 2022-05-18T15:42:19Z

I just wouldn't expect ReadAtLeast to no-op

Why? It's in the name "at least"... when it has at least 0, it's done.

eerhardt · 2022-05-18T16:06:17Z

OK, I can make this change.

It would be nice if ReadAtLeast could give more back if more was available, but to do so it would need to call Read with 0, which could block.

This might be a case for making ReadAtLeast virtual - so a derived stream could give back data if it was already available, without blocking. But I don't see it as a "need", and we can add it in the future if necessary.

…Bytes == 0. Both of these cases are a no-op. No exception is thrown. A read won't be issued to the underlying stream. They won't block until data is available.

dotnet#69272 changed DCS to no longer call Stream.Read inside a loop, but instead call the new ReadAtLeast method. ReadAtLeast only takes a Span<byte>, and not a byte[]. This caused a regression because the internal encoding stream wrapper classes don't override Read(Span<byte>), so the base implementation is used. The base implementation is slower because it needs to rent a byte[] from the pool, and do 2 copies. Overriding Read(Span<byte>) on the internal encoding stream wrapper classes allows ReadAtLeast to be just as fast. Fix dotnet#69730

) * Fix Stream.ReadAtLeast perf regression in DataContractSerializer #69272 changed DCS to no longer call Stream.Read inside a loop, but instead call the new ReadAtLeast method. ReadAtLeast only takes a Span<byte>, and not a byte[]. This caused a regression because the internal encoding stream wrapper classes don't override Read(Span<byte>), so the base implementation is used. The base implementation is slower because it needs to rent a byte[] from the pool, and do 2 copies. Overriding Read(Span<byte>) on the internal encoding stream wrapper classes allows ReadAtLeast to be just as fast. Fix #69730 * Add Span overloads for Stream Read and Write implementations that don't currently override the Span overloads.

dotnet-issue-labeler bot added area-System.IO new-api-needs-documentation labels May 12, 2022

ghost assigned eerhardt May 12, 2022

campersau reviewed May 12, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs Outdated Show resolved Hide resolved

Fix build and copy-paste error

fed030a

stephentoub reviewed May 12, 2022

View reviewed changes

src/libraries/System.IO.Compression/src/System/IO/Compression/ZipHelper.cs Show resolved Hide resolved

stephentoub reviewed May 12, 2022

View reviewed changes

src/libraries/System.Net.Http/tests/FunctionalTests/LoopbackSocksServer.cs Outdated Show resolved Hide resolved

stephentoub reviewed May 12, 2022

View reviewed changes

am11 reviewed May 12, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs Show resolved Hide resolved

stephentoub reviewed May 12, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs Outdated Show resolved Hide resolved

stephentoub reviewed May 12, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs Show resolved Hide resolved

eerhardt added 4 commits May 12, 2022 17:37

Fix WebSockets tests

f3e94ac

Add ReadAtLeast and ReadExactly unit tests

eae2070

Fix System.Net.Security tests

ec9a21a

Respond to PR feedback.

c2c74c3

stephentoub reviewed May 13, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs Outdated Show resolved Hide resolved

carlossanlop reviewed May 13, 2022

View reviewed changes

eerhardt added 5 commits May 13, 2022 07:05

Fix test build in CI

7a30272

Respond to PR feedback

35b3483

Fix System.Xml.Tests

7801632

Add XML docs to the new APIs

444316e

Add message for new exception.

09bcf22

eerhardt marked this pull request as ready for review May 13, 2022 22:18

eerhardt commented May 14, 2022

View reviewed changes

src/libraries/System.IO/tests/Stream/Stream.ReadAtLeast.cs Outdated Show resolved Hide resolved

davidfowl reviewed May 14, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs Show resolved Hide resolved

carlossanlop mentioned this pull request May 17, 2022

Tar APIs pending feedback to address #68230

Open

54 tasks

adamsitnik approved these changes May 17, 2022

View reviewed changes

eerhardt added 2 commits May 17, 2022 15:43

PR feedback

6adb1d4

ReadAtLeast throws when minimumBytes is not positive. ReadExactly throws when an empty buffer is passed, or count is not positive.

Fix code to check for ReadAtLeast(0)

754d7e5

carlossanlop reviewed May 17, 2022

View reviewed changes

eerhardt added 2 commits May 17, 2022 19:15

Xml doc feedback

0c114bb

Preserve behavior in StreamReader.FillBuffer when passed 0.

568c57d

carlossanlop approved these changes May 18, 2022

View reviewed changes

Handle ReadExactly with an empty buffer, and ReadAtLeast with minimum…

7db5707

…Bytes == 0. Both of these cases are a no-op. No exception is thrown. A read won't be issued to the underlying stream. They won't block until data is available.

eerhardt merged commit de27aa1 into dotnet:main May 20, 2022

eerhardt deleted the StreamReadAtLeast branch May 20, 2022 14:20

kunalspathak mentioned this pull request May 24, 2022

Regressions in MicroBenchmarks.Serializers #69730

Closed

eerhardt mentioned this pull request May 26, 2022

Fix Stream.ReadAtLeast perf regression in DataContractSerializer #69879

Merged

kunalspathak mentioned this pull request May 31, 2022

Regressions in MicroBenchmarks.Serializers.Xml_FromStream #70030

Closed

dakersnar mentioned this pull request Jun 10, 2022

.NET 7.0 Preview 5 Microbenchmarks Performance Study Report #70590

Closed

14 tasks

ghost locked as resolved and limited conversation to collaborators Jun 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Stream ReadAtLeast and ReadExactly #69272

Add Stream ReadAtLeast and ReadExactly #69272

eerhardt commented May 12, 2022

dotnet-issue-labeler bot commented May 12, 2022

ghost commented May 12, 2022

stephentoub May 12, 2022

eerhardt May 13, 2022

stephentoub May 13, 2022

eerhardt May 13, 2022

carlossanlop left a comment

eerhardt commented May 13, 2022

stephentoub commented May 17, 2022 •

edited

Loading

adamsitnik left a comment

carlossanlop left a comment

GSPP commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading

GSPP commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022

eerhardt commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading

eerhardt commented May 18, 2022

stephentoub commented May 18, 2022 •

edited

Loading

eerhardt commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022

eerhardt commented May 18, 2022

Add Stream ReadAtLeast and ReadExactly #69272

Add Stream ReadAtLeast and ReadExactly #69272

Conversation

eerhardt commented May 12, 2022

dotnet-issue-labeler bot commented May 12, 2022

ghost commented May 12, 2022

stephentoub May 12, 2022

Choose a reason for hiding this comment

eerhardt May 13, 2022

Choose a reason for hiding this comment

stephentoub May 13, 2022

Choose a reason for hiding this comment

eerhardt May 13, 2022

Choose a reason for hiding this comment

carlossanlop left a comment

Choose a reason for hiding this comment

eerhardt commented May 13, 2022

stephentoub commented May 17, 2022 • edited Loading

adamsitnik left a comment

Choose a reason for hiding this comment

carlossanlop left a comment

Choose a reason for hiding this comment

GSPP commented May 18, 2022 • edited Loading

stephentoub commented May 18, 2022 • edited Loading

GSPP commented May 18, 2022 • edited Loading

stephentoub commented May 18, 2022

eerhardt commented May 18, 2022 • edited Loading

stephentoub commented May 18, 2022 • edited Loading

eerhardt commented May 18, 2022

stephentoub commented May 18, 2022 • edited Loading

eerhardt commented May 18, 2022 • edited Loading

stephentoub commented May 18, 2022 • edited Loading

stephentoub commented May 18, 2022

eerhardt commented May 18, 2022

stephentoub commented May 17, 2022 •

edited

Loading

GSPP commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading

GSPP commented May 18, 2022 •

edited

Loading

eerhardt commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading

eerhardt commented May 18, 2022 •

edited

Loading

stephentoub commented May 18, 2022 •

edited

Loading