Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a version of Stream.Read that reads as much as possible #16598

Closed
Tracked by #64596
svick opened this issue Mar 4, 2016 · 32 comments · Fixed by #69272
Closed
Tracked by #64596

Provide a version of Stream.Read that reads as much as possible #16598

svick opened this issue Mar 4, 2016 · 32 comments · Fixed by #69272
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.IO blocking Marks issues that we want to fast track in order to unblock other important work
Milestone

Comments

@svick
Copy link
Contributor

svick commented Mar 4, 2016

Background and motivation

One of the most common mistakes when using Stream.Read() is that the programmer doesn't realize that Read() may return less data than what is available in the Stream and less data than the buffer being passed in. And even for programmers who are aware of this, having to write the same loop every single time they want to read from a Stream is annoying.

With the .NET 6 breaking change: Partial and zero-byte reads in DeflateStream, GZipStream, and CryptoStream, it has become apparent that a Stream.Read API that ensures you get at least n bytes read is valuable.

API Proposal

namespace System.IO
{
    public class Stream
    {
+        public int ReadAtLeast(Span<byte> buffer, int minimumBytes, bool throwOnEndOfStream = true);
+        public ValueTask<int> ReadAtLeastAsync(Memory<byte> buffer, int minimumBytes, bool throwOnEndOfStream = true, CancellationToken cancellationToken = default);
    }
}
  • ReadAtLeast will return the number of bytes read into buffer.
  • When throwOnEndOfStream == true, the return value will always be minimumBytes <= bytesRead <= buffer.Length. If the end of the stream is detected, an EndOfStreamException will be thrown.
  • When throwOnEndOfStream == false, the return value will always be bytesRead <= buffer.Length. Callers can check for end of the stream by checking bytesRead < minimumBytes.

API Usage

Example 1

Stream stream = ...;
Span<byte> buffer = ...;

stream.ReadAtLeast(buffer, buffer.Length);
// Do something with the bytes in `buffer`.

Example 2

Stream stream = ...;
Span<byte> buffer = ...;

// make sure we have at least 4 bytes
const int bytesToRead = 4;
int read = stream.ReadAtLeast(buffer, bytesToRead, throwOnEndOfStream: false);
if (read < bytesToRead)
{
    // Handle an early end of the stream. E.g. throw your own exception, set a flag, etc.
}
else
{
    // Do something with the bytes in `buffer`
    // `read` may be more than 4
}

Alternative Designs

  1. We could add a convenience wrapper (ReadAll or Fill) that doesn't take int minimumBytes, and uses buffer.Length as the minimumBytes.

    1. This can be accomplished by simply passing in buffer.Length
    2. The APIs wouldn't be overloads since the names (ReadAtLeast and ReadAll) wouldn't match. I can't think of a decent common name that would work for both operations.
    3. We can always add it later, if we get enough feedback that it is necessary.
  2. There is a question of whether these methods should be virtual or not. From scanning the Stream implementations in dotnet/runtime, I don't see anything special a Stream could do for ReadAtLeast. They already get passed the buffer length, if they want a hint of how much data is being requested.

    1. We can always add it later, if we get enough feedback that it is necessary.
    2. The one place I did find that could possibly override ReadAtLeast is in PipeReader.AsStream()'s implementation. Since PipeReader has a ReadAtLeastAsync API, the PipeReaderStream could override ReadAtLeastAsync and forward the call directly to PipeReader.ReadAtLeastAsync. However, there is no synchronous API on PipeReader, so it would just be implemented for the async API.

Original Proposal

I believe one of the most common mistakes when using Stream.Read() is that the programmer doesn't realize that Read() may return less data than what is available in the Stream. And even for programmers who are aware of this, having to write the same loop every single time they want to read from a Stream is annoying.

So, my proposal is to add the following methods to Stream (mirroring the existing Read methods):

public int ReadAll(byte[] buffer, int offset, int count);
public Task<int> ReadAllAsync(byte[] buffer, int offset, int count);
public Task<int> ReadAllAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken);

Each ReadAll method would call the corresponding Read method in a loop, until the buffer was filled or until Read returned 0.

Questions:

  • Is there a better name than ReadAll?
  • These methods would work as well if they were extension methods. Should they be?
  • Very similar functionality is already exposed through BinaryReader.ReadBytes(). Why doesn't it have async version?
@eadordzhiev
Copy link
Contributor

+1 for async version of BinaryReader

@davidsh
Copy link
Contributor

davidsh commented Mar 7, 2016

While .NET Stream.Read does have the behavior you mention, the WinRT IInputStream.ReadAsync method actually takes a enum parameter that can vary the behavior from "return as quick as possible" to "read as much as possible filling the buffer":

https://msdn.microsoft.com/en-us/library/hh438388(v=vs.85).aspx

InputStreamOptions:
https://msdn.microsoft.com/en-us/library/windows/apps/windows.storage.streams.inputstreamoptions.aspx

None 0 No options are specified.
Partial 1 The asynchronous read operation completes when one or more bytes is available.
ReadAhead 2 The asynchronous read operation may optionally read ahead and prefetch additional bytes.

So, this is an example of an API pattern that might be useful in some way for .NET.

@svick
Copy link
Contributor Author

svick commented Mar 7, 2016

@davidsh Interesting. Though I think that having a guarantee that everything that could be read was read is more useful than what ReadAhead is documented to do.

@JonHanna
Copy link
Contributor

JonHanna commented Mar 7, 2016

@svick though we can provide that guarantee to ourselves right now (and already have the greater guarantee of a given number of transcoded characters when going through a StreamReader), but not give the options @davidsh mentions. I'd rather gain that new ability than an extra wrapper method, if I could pick only one.

@terrajobst
Copy link
Member

terrajobst commented Oct 18, 2016

We see two options:

First one:

public int ReadRequestedCount(byte[] buffer, int offset, int count);
public Task<int> ReadRequestedCountAsync(byte[] buffer, int offset, int count);
public Task<int> ReadRequestedCountAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken);

Second one:

enum StreamOptions
{
    Partial = 0,
    RequestedCount = 1
}

public virtual int Read(byte[] buffer, int offset, int count, StreamOptions options);
public Task<int> ReadAsync(byte[] buffer, int offset, int count, StreamOptions options);
public virtual Task<int> ReadAsync(byte[] buffer, int offset, int count, StreamOptions options, CancellationToken cancellationToken);

The nice thing about the second one is that we can add more policy if we need to; Stream is abstract and adding new concept is generally somewhat harder. By making it an enum its a bit easier.

Secondly, do we need to do anything for Write? Our current behavior is that Write is essentially WriteAll. Do we need a partial version as well?

@ianhays
Copy link
Contributor

ianhays commented Oct 18, 2016

Stylistically I prefer the second of the two options, but I'm a bit of a sucker for enums. ReadRequestedCount is an awfully verbose name for what is a conceptually simple modification of the existing Read.

Secondly, do we need to do anything for Write? Our current behavior is that Write is essentially WriteAll. Do we need a partial version as well?

I'd argue that our existing Write is already adequate because our default behavior is to write all bytes; if you want a partial write you could just decrease the count you pass to the function. The only way I can see a WritePartial function providing value is if you wanted to break up a Write of X bytes into X/Y smaller writes. That alone doesn't seem like a common enough scenario to warrant additional API (unlike the ReadAll function which I'm all +1 for).

@JeremyKuhne
Copy link
Member

Triage: seems valuable, let's finish addressing the API review feedback.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@mikernet
Copy link
Contributor

mikernet commented Mar 26, 2020

As per my issue where this issue was just mentioned - please also include Span/Memory based overloads as well :)

@jnm2
Copy link
Contributor

jnm2 commented Mar 26, 2020

Also, there is prior art with the name ReadBlock/ReadBlockAsync on TextReader.

@carlossanlop carlossanlop modified the milestones: 5.0.0, Future Jun 18, 2020
@GSPP
Copy link

GSPP commented Aug 2, 2020

BinaryReader has these methods but they are instance methods. Maybe static methods could be added to BinaryReader.

@geoffkizer
Copy link
Contributor

I think it's more complicated than just "partial" or "all".

What I usually want is a read operation that will guarantee me at least N bytes (or indicate that EOF happened), but also give me additional available data (up to my buffer size of course).

N here is often small -- e.g. for HTTP2 the frame header is 9 bytes; for TLS it's 5. Sometimes I may even have a part of the frame header buffered already, meaning N could be even smaller. But I still want to get as much data in a single Read as I can.

So, something like this:

public int ReadAtLeast(Span<byte> buffer, int bytesToRead);

And similar for async.

If you really want to just fill the buffer completely, pass buffer.Length for bytesToRead.

Some questions to consider:

(1) Virtual or not? I don't see any reason this needs to be virtual.
(2) What happens when EOF is hit before we read enough?

  • Throw?
  • Return # bytes actually read? This could be unexpected; every call site will need to check if the return value is < bytesToRead
  • Return 0, regardless of whether anything was read or not? This seems easier to deal with, but it also hides whether anything was read or not. In practice I'm not sure this matters, because presumably it's an error if you can't get the request bytes (and if not, you probably shouldn't use this API).

@skarllot
Copy link

skarllot commented Mar 2, 2021

I think that should be something like the code below:

public static void ReadAtLeast(this Stream stream, Span<byte> buffer)
{
    ReadAtLeast(stream, buffer, buffer.Length);
}

public static void ReadAtLeast(this Stream stream, Span<byte> buffer, int bytesToRead)
{
    int totalRead = 0;
    while (totalRead < bytesToRead)
    {
        int read = stream.Read(buffer.Slice(totalRead));
        if (read == 0)
            throw new EndOfStreamException();

        totalRead += read;
    }
}

public static async ValueTask ReadAtLeastAsync(this Stream stream, Memory<byte> buffer, CancellationToken cancellationToken = default)
{
    return ReadAtLeastAsync(stream, buffer, buffer.Length, cancellationToken);
}

public static async ValueTask ReadAtLeastAsync(this Stream stream, Memory<byte> buffer, int bytesToRead, CancellationToken cancellationToken = default)
{
    int totalRead = 0;
    while (totalRead < bytesToRead)
    {
        int read = await stream.ReadAsync(buffer.Slice(totalRead), cancellationToken).ConfigureAwait(false);
        if (read == 0)
            throw new EndOfStreamException();

        totalRead += read;
    }
}

@huoyaoyuan
Copy link
Member

Can I require a revisit on this? The ReadAtLeast pattern is very widely used.

@davidfowl
Copy link
Member

ReadAtLeast would align with the method we added to PipeReader in .NET 6 #25063

@eerhardt eerhardt added the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 2, 2022
@eerhardt
Copy link
Member

eerhardt commented May 2, 2022

I've updated the top post to the current proposal and marked this ready-for-review. PTAL. We will try to get this API in for 7.0-preview5.

@terrajobst
Copy link
Member

terrajobst commented May 5, 2022

Video

After lots of discussion we arrived a this:

namespace System.IO;

public partial class Stream
{
    public void ReadExactly(byte[] buffer, int offset, int count);
    public void ReadExactly(Span<byte> buffer);

    public int ReadAtLeast(Span<byte> buffer, int minimumBytes, bool throwOnEndOfStream = true);
    public ValueTask<int> ReadAtLeastAsync(Memory<byte> buffer, int minimumBytes, bool throwOnEndOfStream = true, CancellationToken cancellationToken = default);

    public ValueTask ReadExactlyAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken = default);
    public ValueTask ReadExactlyAsync(Memory<byte> buffer, CancellationToken cancellationToken = default);
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 5, 2022
@stephentoub
Copy link
Member

@eerhardt, I wonder if we should consider an analyzer/fixer here as well to go along with the new methods. Flag calls like:

stream.Read(buffer, 0, buffer.Length);

where the return value isn't checked, and recommend it be changed to a call to ReadExactly. Or cases even where the return value is checked but required to be the same as what was requested:

if (stream.Read(buffer, 0, buffer.Length) != buffer.Length) throw ...;

and similarly suggest it use ReadExactly. And flag calls like:

stream.Read(buffer, 0, count);

and recommend it be changed to:

stream.ReadAtLeast(buffer, count);

@davidfowl
Copy link
Member

Is there a reason to add new byte[] overloads and not just prefer Memory<byte>?

@stephentoub
Copy link
Member

There's about an hour of discusson on that in the video.

@davidfowl
Copy link
Member

davidfowl commented May 5, 2022

Ill try to get through it but IMO I'm not seeing the obvious reason for this in a .NET 7+ world. Stream is a monster already... Why are these non virtual?

@eerhardt
Copy link
Member

eerhardt commented May 6, 2022

Why are these non virtual?

There is discussion on that in the video as well. But see also the top post of this issue.

@eerhardt
Copy link
Member

I wonder if we should consider an analyzer/fixer here as well to go along with the new methods.

@stephentoub - I opened #69159 to suggest this analyzer. We can discuss it there. However, I wonder if addressing #34098 would be more beneficial than that analyzer.

@DrkWzrd
Copy link

DrkWzrd commented May 11, 2022

Will there be a way to know how many bytes have been read if ReadExactly throws?

If not, maybe the way to go is to create a new

public class EndOfStreamException2 : EndOfStreamException
{
    public int BytesRead { get; init; }
}

And make the user catch this specific type.

The other non unpleasant option is the position of the stream being reset (and this is so unconvenient and impossible in some streams).
All the other options will cause an uncertain byte loss. Isn't it?

@stephentoub
Copy link
Member

Will there be a way to know how many bytes have been read if ReadExactly throws?

If you care about the number of bytes read in that case, you wouldn't use ReadExactly. Instead you'd use ReadAtLeast, e.g.

int numRead = stream.ReadAtLeast(buffer, buffer.Length, false);
if (numRead < buffer.Length)
{
    HandleEof(numRead);
}

@DrkWzrd
Copy link

DrkWzrd commented May 11, 2022

Gotcha! Thanks @stephentoub .

@adamsitnik
Copy link
Member

Is there a reason to add new byte[] overloads and not just prefer Memory

There's about an hour of discusson on that in the video.

Now I regret not being a part of this discussion ;)

@eerhardt
Copy link
Member

Now I regret not being a part of this discussion ;)

It's not too late to provide your feedback. We have until 7.0.0 releases to make changes to this proposal.

@weltkante
Copy link
Contributor

I wonder if we should consider an analyzer/fixer here as well to go along with the new methods.

@stephentoub - I opened #69159 to suggest this analyzer. We can discuss it there. However, I wonder if addressing #34098 would be more beneficial than that analyzer.

I don't think these methods can be considered pure? They change the state of the stream (at the very least the current position, potentially more in custom stream classes) so you couldn't call them repeatedly with the same outcome.

@stephentoub
Copy link
Member

I don't think these methods can be considered pure?

That issue isn't about purity any more. It's about annotating methods for which the return value shouldn't be ignored.
#34098 (comment)

@eerhardt
Copy link
Member

I don't think these methods can be considered pure?

The title of #34098 should probably be updated to not reference "Pure" and the top post should be updated for the latest proposal. That issue has transitioned from "pure" to "do not ignore return value". See #34098 (comment).

eerhardt added a commit to eerhardt/runtime that referenced this issue May 12, 2022
Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream.
ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream.

Make use of the new methods where appropriate in net7.0.

Fix dotnet#16598
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 12, 2022
@davidfowl
Copy link
Member

Now I wonder if we should add ReadExactlyAsync to PipeReader 😄

eerhardt added a commit that referenced this issue May 20, 2022
* Add Stream ReadAtLeast and ReadExactly

Adds methods to Stream to read at least a minimum amount of bytes, or a full buffer, of data from the stream.
ReadAtLeast allows for the caller to specify whether an exception should be thrown or not on the end of the stream.

Make use of the new methods where appropriate in net7.0.

Fix #16598

* Add ReadAtLeast and ReadExactly unit tests

* Add XML docs to the new APIs

* Preserve behavior in StreamReader.FillBuffer when passed 0.

* Handle ReadExactly with an empty buffer, and ReadAtLeast with minimumBytes == 0.

Both of these cases are a no-op. No exception is thrown. A read won't be issued to the underlying stream. They won't block until data is available.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.IO blocking Marks issues that we want to fast track in order to unblock other important work
Projects
None yet
Development

Successfully merging a pull request may close this issue.