-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Stream ReadAtLeast and ReadExactly #69272
Conversation
Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream. ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream. Make use of the new methods where appropriate in net7.0. Fix dotnet#16598
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsAdds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream. Make use of the new methods where appropriate in net7.0. Fix #16598
|
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipHelper.cs
Show resolved
Hide resolved
src/libraries/System.Net.Http/tests/FunctionalTests/LoopbackSocksServer.cs
Outdated
Show resolved
Hide resolved
int bytesRead; | ||
int offset = 0; | ||
while (offset < buffer.Length) | ||
int bytesRead = await TAdapter.ReadAtLeastAsync( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a risk here. Previously if the async read pends, it would force the ReadMessageAsync state machine to be allocated. Now if the async read pends, it will force the ReadMessageAsync state machine to be allocated and the ReadAtLeastAsync state machine to be allocated.
We should consider using the pooled async state machine builders on the ReadAtLeastAsync async methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this as simple as adding
[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]
to the Stream.ReadAtLeastAsyncCore
method? Or is it more involved than that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation-wise, that's it, easy peasy. The important thing is validating performance. The concern with such pooling is that it can have a negative impact on the larger system, especially if it ends up creating more gen2 to gen0 references (by referencing gen0 objects from pooled objects that are likely to eventually be gen2). But I worry that without doing this, the methods will end up not being used because folks will see the allocation and choose to avoid the helpers as a result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's chat tomorrow about how I can validate the performance here. I'd like to get your thoughts on what to try / look for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change! Left some comments and questions.
...ries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/StreamExtensions.cs
Show resolved
Hide resolved
Thanks for the review so far. I now believe this is fully ready for review. PTAL. |
More specifically, I think we should throw for minimumBytes <= 0. There are two main situations where someone might be passing 0 to a read method:
Read is available for the former. For the latter, a direct call to ReadAtLeast with a specified lower bound of 0 doesn't seem likely, given that this is expected to be used in scenarios like protocol parsing where you're expecting some number of bytes necessary to make forward progress, and for ReadExactly, it'll typically be used in situations where you're trying to read everything in the payload into some buffer you have... doing math is much more common with Read, where you might need to loop, and we wouldn't expect much looping around ReadExactly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on responses provided by @stephentoub it LGTM. Thanks!
ReadAtLeast throws when minimumBytes is not positive. ReadExactly throws when an empty buffer is passed, or count is not positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some documentation suggestions.
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/ValueTask.cs
Outdated
Show resolved
Hide resolved
How would My understanding is that this call would currently throw. I'm not sure why there is a need to create a special case where, naturally, no special case is necessary. @stephentoub mentioned that this can happen if "buffer math" results in a zero-length read. I don't see this as a bad thing. It can be an outcome of natural code patterns. |
Because otherwise it could hang until there's data available. If you want that behavior, Read already exists and provides everything needed. In your example scenario, it's almost certainly not the behavior you want. An alternative is special-casing the new methods to handle 0 by always returning immediately rather than throwing. |
Oh, now I see. If we have a zero-length read, this would cause many streams to wait until more data is incoming! That's a bug, and I'm afraid I will now have to check some of my code for this particular mistake 😁. Thanks for explaining. But that's actually an argument to add that special case check. That seems to fit the use-case perfectly. Without that special case, many callers will need to special case this themselves. I have done quite a bit of protocol work, and zero-length data is not uncommon. |
@eerhardt, what would you think of this? |
I don't really like having different behavior between At a high-level, I can think of the following possibilities:
My current thinking is that 3(b) seems to be the most consistent with |
I don't think that's desirable. It means a scenario like: int numBytesInPayload = ReadPayloadSize(stream);
byte[] payload = new byte[numBytesInPayload];
stream.ReadExactly(payload); could block indefinitely even when the payload has been fully and successfully read. I do think 0 is special for ReadExactly and ReadAtLeast and needs to be treated differently from what Read does. We could choose to throw (1) or return immediately (2), but calling Read with 0 (3) is problematic. If someone were using their own read loop for this, they would be special-casing 0, e.g. int numBytesInPayload = ReadPayloadSize(stream);
byte[] payload = new byte[numBytesInPayload];
int totalRead = 0;
while (totalRead < numBytesInPayload)
{
int n = stream.Read(payload.AsSpan(totalRead));
if (n == 0) throw ...;
totalRead += n;
} such that stream.Read would never be called if numBytesInPayload were 0. We want them to be able to switch to do that instead with ReadExactly, but with any variant of (3), they'd have to special-case 0 themselves: if (payload != 0)
{
stream.ReadExactly(payload);
} Now, maybe we'd want them to do that anyway in this particular example, in order to prefer using Is there a use case where someone would be using ReadExactly or ReadAtLeast, passing in 0, and wanting the "wait until data is available" behavior that stream.Read might provide? |
Are there two different scenarios here?
In case the non-empty buffer case, would you expect no |
That's part of my question... what is the scenario for calling ReadAtLeast with a non-zero-length buffer and minimumBytes == 0? |
Maybe the behavior split makes sense between the new methods then?
I just wouldn't expect |
Actually, I just answered my own question ("I'm not clear on in what situation you'd say "please read data but if you feel like not giving any back that's cool"). Consider the same scenario I outlined earlier, but where the protocol parser is maintaining its own buffer. It wants to read the payload but also any additional data that might be available, in order to avoid unnecessary additional calls for when it wants more data later. That's what ReadAtLeast is for, and you wouldn't want it to throw for 0 (0 is valid) nor block waiting for more (because the 0 was satisfied). Allowing 0 to just nop means ReadAtLeast works for non-0-length in this manner, giving back as much as it can, while still enabling 0 to work without throwing or blocking. Stream.Read special-cases 0 (on some streams). ReadExactly and ReadAtLeast serve different purposes than does Read, and since they're implemented in terms of it (effectively as an implementation detail), they then also need to special-case 0. I think it makes sense for them to special-case it to mean "I've read what you asked and now I'm done". It would be nice if ReadAtLeast could give more back if more was available, but to do so it would need to call Read with 0, which could block. |
Why? It's in the name "at least"... when it has at least 0, it's done. |
OK, I can make this change.
This might be a case for making |
…Bytes == 0. Both of these cases are a no-op. No exception is thrown. A read won't be issued to the underlying stream. They won't block until data is available.
dotnet#69272 changed DCS to no longer call Stream.Read inside a loop, but instead call the new ReadAtLeast method. ReadAtLeast only takes a Span<byte>, and not a byte[]. This caused a regression because the internal encoding stream wrapper classes don't override Read(Span<byte>), so the base implementation is used. The base implementation is slower because it needs to rent a byte[] from the pool, and do 2 copies. Overriding Read(Span<byte>) on the internal encoding stream wrapper classes allows ReadAtLeast to be just as fast. Fix dotnet#69730
) * Fix Stream.ReadAtLeast perf regression in DataContractSerializer #69272 changed DCS to no longer call Stream.Read inside a loop, but instead call the new ReadAtLeast method. ReadAtLeast only takes a Span<byte>, and not a byte[]. This caused a regression because the internal encoding stream wrapper classes don't override Read(Span<byte>), so the base implementation is used. The base implementation is slower because it needs to rent a byte[] from the pool, and do 2 copies. Overriding Read(Span<byte>) on the internal encoding stream wrapper classes allows ReadAtLeast to be just as fast. Fix #69730 * Add Span overloads for Stream Read and Write implementations that don't currently override the Span overloads.
Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream.
ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream.
Make use of the new methods where appropriate in net7.0.
Fix #16598