streams: Prevent RST_STREAM from being sent multiple times #1267

Open
wants to merge 3 commits into
base: master

MadMockers

Previously, RST_STREAM would be sent even when the stream's closed by
state was already SEND_RST_STREAM. This fix instead checks which peer
closed the stream initially, and then updates the closed by value
from RECV_RST_STREAM to SEND_RST_STREAM after a RST_STREAM frame
has been sent on a previously reset stream.

Streams that have a closed by value of SEND_RST_STREAM now ignore
all frames.
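
To make the described behaviour concrete, here is a minimal, self-contained sketch. It is not h2's actual code: `ToyStream`, `ClosedBy`, `reset_locally`, and `receive_frame` are hypothetical names chosen for illustration, and the handling of open streams is omitted. It only models the rule stated above: a stream closed by the peer's RST_STREAM answers the next offending frame with one RST_STREAM, flips its closed by value to SEND_RST_STREAM, and from then on ignores everything.

```python
from enum import Enum, auto


class ClosedBy(Enum):
    SEND_RST_STREAM = auto()
    RECV_RST_STREAM = auto()


class ToyStream:
    """Hypothetical stand-in for a stream state machine (not h2's class)."""

    def __init__(self):
        self.closed_by = None  # None means the stream is still open
        self.sent = []         # frames this endpoint has emitted

    def reset_locally(self):
        # Send RST_STREAM at most once per stream.
        if self.closed_by is ClosedBy.SEND_RST_STREAM:
            return
        self.sent.append("RST_STREAM")
        self.closed_by = ClosedBy.SEND_RST_STREAM

    def receive_frame(self, frame_type):
        if self.closed_by is None:
            if frame_type == "RST_STREAM":
                self.closed_by = ClosedBy.RECV_RST_STREAM
            return  # normal processing of open streams is omitted
        if self.closed_by is ClosedBy.SEND_RST_STREAM:
            return  # we already reset this stream: ignore everything
        if frame_type == "PRIORITY":
            return  # PRIORITY is allowed on closed streams
        # Closed by the peer's RST_STREAM: report the stream error once,
        # then flip the marker so later frames are ignored.
        self.reset_locally()


stream = ToyStream()
for frame in ["RST_STREAM", "DATA", "DATA", "RST_STREAM"]:
    stream.receive_frame(frame)
print(stream.sent)  # ['RST_STREAM']
```

However many frames arrive after the first reset, the toy endpoint emits a single RST_STREAM.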

@MadMockers MadMockers force-pushed the fix/multiple-rst-stream-frames branch 2 times, most recently from 19e109c to ec38cbb Compare March 25, 2022 05:09
@Kriechi
Member

Kriechi commented Mar 26, 2022

@MadMockers thanks for sending this fix!
I haven't seen any bug reports on this before, did you find any errors or incompatibilities with specific clients, or did you follow the RFC word by word? I'm curious to see how this change affects the existing tests - and if we need any new ones to cover the new behaviour?

@MadMockers
Author

MadMockers commented Mar 27, 2022

> @MadMockers thanks for sending this fix! I haven't seen any bug reports on this before, did you find any errors or incompatibilities with specific clients

Hey mate - Yes, I had issues with dotnet gRPC, which uses the Kestrel HTTP/2 implementation. There's an additional bug in the Kestrel implementation that causes a connection error when a second RST_STREAM is received (the correct action for dotnet is to ignore the frame). The bug in Kestrel was resolved here (dotnet/aspnetcore@1db20af); however, it remains in dotnet 3.1 (an LTS release).

I've seen connection errors when using the python grpclib implementation, which uses h2, to talk with the dotnet gRPC server. This issue (coupled with the additional dotnet issue) causes all active gRPC requests to be aborted due to dotnet tearing down the underlying connection. Unfortunately I've found it hard to reproduce this issue with logging as it's racy, however I've manually just updated the file in my venv and am no longer observing it.

> I'm curious to see how this change affects the existing tests - and if we need any new ones to cover the new behaviour?

I haven't run any of the tests yet, sorry! An additional test would be to have a peer always respond to a RST_STREAM with a RST_STREAM. The correct action is that the response is ignored. The RFC allows for this, as 2 RST_STREAM frames in different directions may be in flight at the same time.

I haven't looked at the tests yet, but intend to add one.
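
The scenario proposed above can be sketched as a small wire simulation. This is an illustrative model only, not a test from the h2 suite: `simulate` and its `ignore_after_sending_rst` flag are hypothetical, and the "peer" is a deliberately misbehaving endpoint that echoes every RST_STREAM it receives.

```python
from collections import deque


def simulate(ignore_after_sending_rst, max_frames=20):
    """Return the (sender, frame) pairs seen on the wire, capped at max_frames."""
    wire = deque([("local", "RST_STREAM")])  # local resets the stream first
    log = []
    while wire and len(log) < max_frames:
        sender, frame = wire.popleft()
        log.append((sender, frame))
        if sender == "local":
            # Hypothetical misbehaving peer: echoes every RST_STREAM.
            wire.append(("peer", "RST_STREAM"))
        elif not ignore_after_sending_rst:
            # Behaviour under test: answer with yet another RST_STREAM.
            wire.append(("local", "RST_STREAM"))
    return log


print(len(simulate(ignore_after_sending_rst=True)))   # 2
print(len(simulate(ignore_after_sending_rst=False)))  # 20
```

When the local endpoint ignores frames on a stream it has already reset, the exchange stops after two frames. When it replies each time, the two endpoints keep resetting each other until the cap is reached.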

@MadMockers MadMockers force-pushed the fix/multiple-rst-stream-frames branch 2 times, most recently from 397c071 to 98a0759 Compare March 27, 2022 01:34
@MadMockers
Author

A test for this scenario does already exist, I've added an additional commit which asserts correct behaviour.

@MadMockers MadMockers force-pushed the fix/multiple-rst-stream-frames branch from 98a0759 to a1991c7 Compare March 28, 2022 08:32
@MadMockers
Author

Updated with tests and additional coverage (please see CI results here MadMockers#5)

I should note that in the process of doing this I did come across a contradiction in the RFC that may be worth discussing.

From Section 5.1 (closed state):

> An endpoint MUST NOT send frames other than PRIORITY on a closed
> stream. An endpoint that receives any frame other than PRIORITY
> after receiving a RST_STREAM MUST treat that as a stream error
> (Section 5.4.2) of type STREAM_CLOSED.

The referenced Section 5.4.2:

> An endpoint that detects a stream error sends a RST_STREAM frame
> (Section 6.4) that contains the stream identifier of the stream where
> the error occurred. The RST_STREAM frame includes an error code that
> indicates the type of error.

Essentially we're being told we MUST NOT send any frames, but we MUST treat this as a stream error (which involves sending a frame).

I think the common sense interpretation would be that perhaps the RFC should have stated that an endpoint can send both PRIORITY AND RST_STREAM in the closed state. Or maybe the RFC does not consider RST_STREAM frames to be "on" a stream. Either way, I bring this up as I relied on it for the logic for not sending more than one RST_STREAM frame (first one transitions to closed state, making additional frames illegal).

There are other parts, however, which reinforce that a RST_STREAM frame should only be sent once.

Again from Section 5.1 (closed state):

> An endpoint MUST ignore frames that it
> receives on closed streams after it has sent a RST_STREAM frame.

Once a RST_STREAM has been sent, it would only be generated again from receiving an additional frame. If additional frames are being ignored, then there should only be the initial RST_STREAM frame.

However there is a contradiction, again in Section 5.4.2:

> Normally, an endpoint SHOULD NOT send more than one RST_STREAM frame
> for any stream. However, an endpoint MAY send additional RST_STREAM
> frames if it receives frames on a closed stream after more than a
> round-trip time. This behavior is permitted to deal with misbehaving
> implementations.

Taken in isolation, the SHOULD NOT portion would imply that the old operation prior to this fix is not invalid (SHOULD NOT != MUST NOT). At a minimum this patch removes behaviour defined as SHOULD NOT. Additionally, as h2 doesn't do round-trip time tracking (that I'm aware of - may have missed this!!!), I think this clause should be taken as a MUST NOT when not doing the time tracking.
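
The reading argued for above can be written down as a tiny policy function. This is only a sketch of that interpretation: `may_resend_rst_stream` and its parameters are hypothetical, and nothing here claims that h2 tracks round-trip time. Without an RTT estimate the MAY clause can never be satisfied, so the function always answers no.

```python
from typing import Optional


def may_resend_rst_stream(
    seconds_since_first_rst: float,
    rtt_estimate: Optional[float],
) -> bool:
    """Decide whether a further RST_STREAM is permitted on a closed stream."""
    if rtt_estimate is None:
        # No round-trip tracking: treat SHOULD NOT as MUST NOT.
        return False
    # The RFC 7540 clause only applies after more than one round trip.
    return seconds_since_first_rst > rtt_estimate


print(may_resend_rst_stream(5.0, None))   # False
print(may_resend_rst_stream(0.05, 0.2))   # False
print(may_resend_rst_stream(0.5, 0.2))    # True
```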

@MadMockers MadMockers marked this pull request as ready for review March 28, 2022 09:06
@Lukasa
Member

Lukasa commented Mar 29, 2022

> I think the common sense interpretation would be that perhaps the RFC should have stated that an endpoint can send both PRIORITY AND RST_STREAM in the closed state. Or maybe the RFC does not consider RST_STREAM frames to be "on" a stream. Either way, I bring this up as I relied on it for the logic for not sending more than one RST_STREAM frame (first one transitions to closed state, making additional frames illegal).

This contradiction has been resolved in the new version of the document.

The spec deliberately allows sending multiple RST_STREAM frames to account for the possibility that the peer implementation is buggy: if it is still sending frames on a stream after one RTT from receiving an RST_STREAM frame then the peer implementation is clearly confused and has mishandled the frame. However, the new guidance (that this is a connection error, not a stream error) is probably the best mode of handling this.

@MadMockers
Author

MadMockers commented Mar 30, 2022

> This contradiction has been resolved in the new version of the document.

Nice! Just looking through the updated version, it seems the main contradiction is resolved. I'm not sure now when a second RST_STREAM would ever be sent when strictly following the 5.1 state section. Previously, it was allowed to be sent due to:

> An endpoint that receives any frame other than PRIORITY
> after receiving a RST_STREAM MUST treat that as a stream error
> (Section 5.4.2) of type STREAM_CLOSED.

As this portion has now been removed, there doesn't seem to be any state in which a second RST_STREAM is called for.

My reading of the updated closed state is that an endpoint can either ignore frames, or treat it as a connection error. Notably, the updated RFC still contains the following:

> An endpoint MUST NOT send frames other than PRIORITY on a closed stream.


> The spec deliberately allows sending multiple RST_STREAM frames to account for the possibility that the peer implementation is buggy: if it is still sending frames on a stream after one RTT from receiving an RST_STREAM frame then the peer implementation is clearly confused and has mishandled the frame.

Section 5.4.2. Stream Error Handling still seems to be in contradiction with the updated closed state. Is it possible the following clause was unintentionally left in the updated version?

> Normally, an endpoint SHOULD NOT send more than one RST_STREAM frame for any stream. However, an endpoint MAY send additional RST_STREAM frames if it receives frames on a closed stream after more than a round-trip time. This behavior is permitted to deal with misbehaving implementations.


> However, the new guidance (that this is a connection error, not a stream error) is probably the best mode of handling this.

This would resolve the issue I'm seeing with the Kestrel HTTP/2 implementation. That being said, there may be unintended compatibility consequences with other buggy peers. Previously, h2 would treat this as a stream error (sending an additional RST_STREAM), which should be ignored by the peer. Treating it as a connection error instead of ignoring the frame could noticeably change interoperability.

It may be worth looking at implementing the following suggestion from the updated RFC:

> Endpoints can use frames that indicate that the peer has received the closing signal to drive this. Endpoints SHOULD NOT use timers for this purpose. For example, an endpoint that sends a SETTINGS frame after closing a stream can safely treat receipt of a DATA frame on that stream as an error after receiving an acknowledgment of the settings. Other things that might be used are PING frames, receiving data on streams that were created after closing the stream, or responses to requests created after closing the stream.

An additional thought which may be deemed out-of-scope is potentially a "strict" mode or a "quirks" mode (understanding that there is legitimate argument against additional code paths to account for these modes).
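
The SETTINGS-acknowledgment idea from the updated RFC, quoted above, could be tracked roughly as follows. This is a sketch under assumptions, not a proposal for h2's internals: `CloseBarrier` and all of its methods are hypothetical, and the returned labels `"ignore"` and `"connection_error"` simply mirror the two options discussed in this thread. It relies on SETTINGS acknowledgments arriving in the order the SETTINGS frames were sent.

```python
from collections import deque


class CloseBarrier:
    """Hypothetical tracker: which resets has the peer provably seen?"""

    def __init__(self):
        self._unconfirmed = set()         # resets sent since the last SETTINGS
        self._pending_settings = deque()  # one entry per un-acked SETTINGS
        self._confirmed = set()           # resets the peer must have processed

    def stream_reset_sent(self, stream_id):
        self._unconfirmed.add(stream_id)

    def settings_sent(self):
        # Every reset sent before this SETTINGS frame is covered by its ACK.
        self._pending_settings.append(frozenset(self._unconfirmed))
        self._unconfirmed.clear()

    def settings_ack_received(self):
        # ACKs arrive in order, so the oldest pending SETTINGS is the one acked.
        if self._pending_settings:
            self._confirmed |= self._pending_settings.popleft()

    def on_frame_for_closed_stream(self, stream_id):
        # Before confirmation the frame may have been in flight: ignore it.
        if stream_id in self._confirmed:
            return "connection_error"
        return "ignore"


barrier = CloseBarrier()
barrier.stream_reset_sent(1)
barrier.settings_sent()
print(barrier.on_frame_for_closed_stream(1))  # ignore
barrier.settings_ack_received()
print(barrier.on_frame_for_closed_stream(1))  # connection_error
```

No timers are involved: the decision changes only when a frame arrives that proves the peer has processed the reset.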
