Inconsistent Error Handling for Graceful Server Shutdown during Keepalive Ping #2624
Comments
This ping cancellation you are observing is a session-level event, and a description of how the ping specifically ended. It is not supposed to have any direct correspondence to how an individual request on that session ends. We interpret any ping error we see to mean that the server is no longer available, so the UNAVAILABLE error code is appropriate. When you say that you "Gracefully stop the server", it's not clear what that means. In the context of gRPC, that would generally involve allowing existing requests to end on their own while not accepting new requests, but that is clearly not what is happening here. If you want to have more control over the status code the client sees, you should control that explicitly on the server. Most gRPC server implementations have a function that cancels all open requests (generally while shutting down). In grpc-js, that is |
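To make the distinction between a session-level event and a per-request outcome concrete, here is a minimal TypeScript model. It is not the grpc-js source: `SessionModel`, `startCall`, and `onPingError` are hypothetical names, and only the numeric status codes (CANCELLED = 1, UNAVAILABLE = 14) come from the gRPC status code list. It shows every open call on a session ending with UNAVAILABLE when a ping fails, regardless of how the ping itself ended.

```typescript
// gRPC status codes referenced in this thread.
const Status = { CANCELLED: 1, UNAVAILABLE: 14 } as const;

interface CallResult {
  code: number;
  details: string;
}

type EndHandler = (result: CallResult) => void;

// Hypothetical model of one client session carrying several open calls.
class SessionModel {
  private openCalls: EndHandler[] = [];

  // Register an open call; onEnd fires once when the call finishes.
  startCall(onEnd: EndHandler): void {
    this.openCalls.push(onEnd);
  }

  // Any ping error is read as "the server is no longer available",
  // so every open call ends with UNAVAILABLE, whatever the ping error was.
  onPingError(pingError: Error): void {
    const calls = this.openCalls;
    this.openCalls = [];
    for (const onEnd of calls) {
      onEnd({
        code: Status.UNAVAILABLE,
        details: `Keepalive ping failed: ${pingError.message}`,
      });
    }
  }
}
```

In this model the text of the ping error only ends up in `details`; it never changes the status code the calls see.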
@murgatroid99 Thanks for the explanation! I'm running into a strange situation where I get "Unavailable" when connecting to my server in CI and "Cancelled" in the 'Latest' server environment.
By graceful shutdown, I mean stopping docker-compose (my server) using Ctrl + C. However, with previous versions (<=1.9.1) of grpc-js, before this modification in transport.ts was added, I got the same error message (Cancelled) in both environments (CI and Latest). I tend to agree that 'Unavailable' is the right message. However, I still cannot figure out why I get different error messages in different server environments after upgrading grpc-js. |
Well, what is different between your CI and Latest server environments? |
@murgatroid99 They are the same server but running in different environments. |
The exact error you are seeing is directly related to how you are running the server (in a docker container) and how you are stopping it (sending SIGINT to docker-compose). If you are not doing those same things in both environments, it is unsurprising that you would see different errors, and even if you are, identical handling is not guaranteed on the client. The expectation in gRPC is that a server gracefully shuts down by closing the listening port, sending each open session a GOAWAY with code 0, and then waiting for existing requests to end on their own before ending the process. These actions are generally encapsulated in a single operation in the gRPC API, like A client will make its best effort to handle any other event that causes a connection to become unusable in the middle of processing a request, but specific outcomes are not guaranteed. |
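Because the exact status after a non-graceful shutdown is not guaranteed, a client may prefer not to depend on which of the two codes it receives. The following is a minimal sketch of that idea, not a grpc-js API: `looksLikeServerGone` and `StatusLike` are hypothetical names, and the sketch assumes the error object exposes a numeric `code` field, as grpc-js status objects do.

```typescript
// gRPC status codes referenced in this thread.
const Status = { CANCELLED: 1, UNAVAILABLE: 14 } as const;

// Minimal shape of an error carrying a gRPC status code.
interface StatusLike {
  code?: number;
}

// Returns true when a call error most likely means the connection was lost.
// CANCELLED is also produced when the client cancels the call itself, so it
// only counts as a connection loss if the client did not cancel.
function looksLikeServerGone(err: StatusLike, cancelledByClient: boolean): boolean {
  if (err.code === Status.UNAVAILABLE) {
    return true;
  }
  return err.code === Status.CANCELLED && !cancelledByClient;
}
```

A stream error handler could call this helper to decide whether to reconnect, which makes the CI and Latest environments behave the same from the application's point of view even if the status codes differ.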
In my case, receiving a |
In your log, |
I ran the test in another scenario to better illustrate the problem:
In this test, I set the
In server version 23.10.0: At t=10s, since the session had already closed, it returned CANCELLED to the client because the keepalive ping could not be sent.
In server version 24.0.0: The client managed to send the keepalive ping to the server before the server closed, and it returned UNAVAILABLE (via the ping error handler, i.e. handleDisconnect) to the client.
I realized that the difference between the two servers is that in version 24.0.0, they upgraded to .NET 7, but nothing else changed. The changelog for .NET 7 can be found here.
Conclusion
For both servers, the server was shut down at the same time (t=1s), yet different error messages were returned. In version 23.10.0, the session closed prematurely at t=6s, making it impossible for the client to send the keepalive ping, resulting in a CANCELLED error. In contrast, in version 24.0.0, with the .NET 7 upgrade, the session remained open until t=10s. This allowed the client enough time to send the keepalive ping before the server shut down, leading to an UNAVAILABLE error instead. So it appears that the discrepancy might have been introduced by the .NET 7 upgrade after all, but nothing obvious is indicated in the changelog. |
The Node client does not currently end a call with the CANCELLED status because the keepalive ping could not be sent. The PR you sent would have introduced that behavior, but I didn't merge that PR. Either the CANCELLED status is coming from a different code path, or you are using your branch with your proposed change. |
It seems that in .NET 7 the default value of the graceful ShutdownTimeout has changed, although this was not listed in the changelog. This setting affects how much time the server allows ongoing requests to complete before forcefully terminating them during shutdown. I am closing this issue. |
I was wondering if this is the same issue that I am facing. I want to use tryShutdown() to gracefully process in-flight or pending requests that came in before SIGTERM or SIGINT and are still processing, but the client side receives an error saying "Connection dropped" with code 14. If we were only concerned about the server itself, then the client would not be able to process anything anyway. I was wondering if we could add functionality that does not kill the connection abruptly but gives the server a chance to keep the connection alive for a while after processing is done, like how hapi does it (https://hapi.dev/api/?v=21.3.3#-await-serverstopoptions): it provides a timeout within which we can finish processing, and otherwise it forces shutdown. Then we give the client some power to process the requests as well. like /// callback called after that 10000 ms pause after the pending request processed. This way we will only close the connection 10 seconds after the pending requests have been processed. |
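A hapi-style "graceful, then forced after a timeout" shutdown can be sketched on top of the two shutdown methods a grpc-js server exposes, `tryShutdown(callback)` and `forceShutdown()`. This is only an illustration under assumptions: `shutdownWithTimeout`, `ShutdownCapable`, and `ShutdownOutcome` are hypothetical names, the interface is a structural stand-in so the block does not import grpc-js, and falling back to a forced shutdown when `tryShutdown` reports an error is a design choice of this sketch rather than documented library behavior.

```typescript
// Structural stand-in for a server offering graceful and forced shutdown.
interface ShutdownCapable {
  tryShutdown(callback: (error?: Error) => void): void;
  forceShutdown(): void;
}

type ShutdownOutcome = "graceful" | "forced";

// Try a graceful shutdown; if it has not completed within timeoutMs,
// force the shutdown. onDone is called exactly once with the outcome.
function shutdownWithTimeout(
  server: ShutdownCapable,
  timeoutMs: number,
  onDone: (outcome: ShutdownOutcome) => void,
): void {
  let finished = false;

  // Fallback: cancel whatever is still open once the grace period expires.
  const timer = setTimeout(() => {
    if (finished) return;
    finished = true;
    server.forceShutdown();
    onDone("forced");
  }, timeoutMs);

  server.tryShutdown((error?: Error) => {
    if (finished) return; // the timeout already forced the shutdown
    finished = true;
    clearTimeout(timer);
    if (error) {
      // Assumption of this sketch: treat a failed graceful attempt as forced.
      server.forceShutdown();
      onDone("forced");
    } else {
      onDone("graceful");
    }
  });
}
```

A SIGTERM or SIGINT handler could call `shutdownWithTimeout(server, 10000, ...)` and exit the process in `onDone`, which gives in-flight requests up to 10 seconds to finish.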
@manish-sharma-resolver To be clear, the functionality you want is how the function is supposed to work. I don't recognize that particular error in this situation, so it might make sense to file a separate issue about it. |
Problem Description
I am experiencing a discrepancy in the error messages returned when reading from a gRPC stream and gracefully shutting down the connection midstream. Specifically, the grpc-js client incorrectly interprets a cancelled keepalive ping as an "Unavailable" error instead of the expected "Cancelled" error during such a shutdown. This behavior is related to my PR #2622. While the client logs Error [ERR_HTTP2_PING_CANCEL]: HTTP2 ping cancelled, it eventually throws an "Unavailable" error, which seems inconsistent with the actual server state.
Reproduction Steps
yarn to install dependencies.
./server.sh to start the server.
node client.js to start the client.
Relevant code snippet handling ping failures:
transport.ts:429
The log displays Error [ERR_HTTP2_PING_CANCEL]: HTTP2 ping cancelled, indicating a correct cancellation due to server shutdown. However, the final error reported to the client is "Unavailable".
Full log:
Environment
@grpc/[email protected]
Additional Context
N/A