Cancellation of Socket.ConnectAsync intermittently hangs #42198
Comments
Tagging subscribers to this area: @dotnet/ncl
In AttemptConnection, when ConnectAsync completes synchronously, we call InternalConnectCallback directly. This is bad because the caller of AttemptConnection should be holding the lock at that point. We should instead return to the caller and make the call to InternalConnectCallback outside the lock. Additionally, wherever we take the lock, and especially in callbacks like InternalConnectCallback, we should assert that the lock is not already held by this thread. And if a method assumes the caller is holding the lock, as AttemptConnection does, we should assert that the lock is actually held. This is valuable not just for detecting bugs, but also for documenting the expectations of the function.
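The suggestion above can be sketched roughly as follows. This is only an illustration, not the actual `Socket` code: `ConnectStateSketch`, `Start`, `CallbackCount`, and the simplified signatures of `AttemptConnection` and `InternalConnectCallback` are assumptions made for the example. It relies on `Monitor.IsEntered`, which reports whether the current thread holds a given lock.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for the multi-connect state; not the real type.
sealed class ConnectStateSketch
{
    private readonly object _lock = new object();

    public int CallbackCount { get; private set; }

    // Takes the lock, attempts the connection, and runs the completion
    // callback only after the lock has been released.
    public void Start(bool completesSynchronously)
    {
        Debug.Assert(!Monitor.IsEntered(_lock), "Lock must not already be held by this thread.");

        bool completedSynchronously;
        lock (_lock)
        {
            completedSynchronously = AttemptConnection(completesSynchronously);
        }

        if (completedSynchronously)
        {
            // Called outside the lock, as proposed in the comment above.
            InternalConnectCallback();
        }
    }

    // Assumes the caller holds the lock. Reports synchronous completion
    // to the caller instead of invoking the callback itself.
    private bool AttemptConnection(bool completesSynchronously)
    {
        Debug.Assert(Monitor.IsEntered(_lock), "Caller must hold the lock.");
        return completesSynchronously;
    }

    // Takes the lock itself, so it asserts the lock is not already held.
    private void InternalConnectCallback()
    {
        Debug.Assert(!Monitor.IsEntered(_lock), "Callback must not run under the lock.");
        lock (_lock)
        {
            CallbackCount++;
        }
    }
}
```

The asserts are compiled only into debug builds, so they document the locking contract without adding cost to release builds.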
Fixes:
- stress client double read of content fixed
- fixed stress client hangs at start and stop
- leveraged HttpVersionPolicy
- increased pipeline timeout since we doubled the runs
- fixed base docker images to avoid missing IO.Pipelines Kestrel exception.

Re-hauled tracing:
- added server file logging
- added log file rotation

Minor renames.

Contributes to: #42211 and #42198
Is there a workaround? I have the same problem and I don't know how to solve it from outside of the Socket class.
The only workaround is to avoid cancellation during socket connect. I don't know your exact problem so I cannot help much further at this point.
When `Socket.ConnectAsync` is cancelled it might lead to a deadlock.

The cancellation code for `MultipleConnectAsync` tries to acquire a lock which is held by the `connectAsync` continuation (triggered by `DoDnsCallback`). The continuation then tries to dispose of the cancellation registration, which will wait for all the callbacks if they've already been triggered.

This is easily reproducible with our HTTP stress suite (for HTTP 1.1 runs longer than 30 mins).
The biggest impact of this issue is that it prevents us from getting reasonable results from HTTP 1.1 stress pipeline on master. The pipeline gets cancelled on timeout and no results are available (AzDO empty logs).
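The shape of the deadlock described above can be reproduced in isolation. The sketch below is not the `Socket` implementation: `CancellationDeadlockSketch`, `CallbackAcquiresLock`, and `gate` are hypothetical names, and the 500 ms timeout in the callback exists only so the demo terminates (a plain `lock` there would hang forever). It contrasts `CancellationTokenRegistration.Dispose()`, which waits for a callback that is already running, with `Unregister()` (.NET Core 3.0 and later), which does not wait. `Unregister()` is shown as one general way to break the cycle, not necessarily the fix adopted for this issue.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancellationDeadlockSketch
{
    // Returns true if the cancellation callback managed to take the lock.
    public static bool CallbackAcquiresLock(bool waitForCallback)
    {
        object gate = new object();
        var cts = new CancellationTokenSource();
        var lockHeld = new ManualResetEventSlim();
        var callbackStarted = new ManualResetEventSlim();
        bool acquired = false;

        // Cancellation path: needs the lock, like the cancellation code in the issue.
        CancellationTokenRegistration registration = cts.Token.Register(() =>
        {
            callbackStarted.Set();
            // Timeout is demo-only; with a plain lock this would never return
            // while the other thread is blocked in Dispose().
            if (Monitor.TryEnter(gate, TimeSpan.FromMilliseconds(500)))
            {
                acquired = true;
                Monitor.Exit(gate);
            }
        });

        // Continuation path: holds the lock while tearing down the registration.
        Task continuation = Task.Run(() =>
        {
            lock (gate)
            {
                lockHeld.Set();
                callbackStarted.Wait();
                if (waitForCallback)
                    registration.Dispose();     // blocks until the running callback returns
                else
                    registration.Unregister();  // returns without waiting for the callback
            }
        });

        lockHeld.Wait();
        cts.Cancel();           // runs the registered callback on this thread
        continuation.Wait();
        cts.Dispose();
        return acquired;
    }
}
```

With `waitForCallback: true` the callback cannot get the lock, because the lock holder is itself waiting for the callback to finish. With `waitForCallback: false` the holder releases the lock promptly and the callback proceeds.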