[🐛 Bug]: Selenium driver (C#) doesn't clean up session if timing out when creating session? #14743

Closed
genne opened this issue Nov 11, 2024 · 13 comments · Fixed by #14756

genne commented Nov 11, 2024

What happened?

I often encounter timeout errors when attempting to create sessions:

OpenQA.Selenium.WebDriverException: The HTTP request to the remote WebDriver server for URL http://4.207.73.132:4444/wd/hub/session timed out after 60 seconds. 
  ---> System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 60 seconds elapsing. 
  ---> System.TimeoutException: The operation was canceled. 
  ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled. 
  ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. 
  ---> System.Net.Sockets.SocketException (125): Operation canceled 
    --- End of inner exception stack trace --- 
    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) 
    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token) 
    at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async) 
    at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
    --- End of inner exception stack trace --- 
    at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
    at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken) 
    at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
    at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) 
    --- End of inner exception stack trace --- 
    --- End of inner exception stack trace --- 
    at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts) 
    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) 
    at OpenQA.Selenium.Remote.HttpCommandExecutor.MakeHttpRequest(HttpRequestInfo requestInfo) 
    at OpenQA.Selenium.Remote.HttpCommandExecutor.ExecuteAsync(Command commandToExecute) 
    --- End of inner exception stack trace --- 
    at OpenQA.Selenium.Remote.HttpCommandExecutor.ExecuteAsync(Command commandToExecute) 
    at OpenQA.Selenium.WebDriver.ExecuteAsync(String driverCommandToExecute, Dictionary`2 parameters) 
    at OpenQA.Selenium.WebDriver.StartSession(ICapabilities capabilities) 
    at OpenQA.Selenium.WebDriver..ctor(ICommandExecutor executor, ICapabilities capabilities) 

At the same time, I notice the Grid queue building up, and the sessions seem to run for a couple of minutes before being closed.

From analyzing the code, it appears that StartSession doesn’t clean up properly when it fails:

Response response = this.Execute(DriverCommand.NewSession, parameters);

If this throws an exception, the sessionId is never set:

this.sessionId = new SessionId(response.SessionId);

As a result, Dispose does nothing:

if (this.sessionId != null)
    this.Execute(DriverCommand.Quit, (Dictionary<string, object>) null);

The session is still created, but since the session ID is never returned to the client, it remains stuck until it’s automatically terminated after a few minutes. This also causes subsequent sessions to time out as they wait for the hung session, which further compounds the issue by adding more stuck sessions to the queue.
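The failure mode described above can be reproduced without a Grid. The following is a minimal sketch using toy classes (`FakeGrid`, `FakeDriver` and `Demo` are hypothetical names for illustration, not Selenium types): the remote end creates the session, the client gives up before the session ID arrives, and nothing is left on the client side that could quit the session.

```csharp
using System;

// Toy stand-in for the remote end. Hypothetical; not a Selenium class.
sealed class FakeGrid
{
    public int OpenSessions { get; private set; }

    // The session is created server-side first. When clientTimesOut is true,
    // the client gives up before the response (and its session ID) arrives.
    public string NewSession(bool clientTimesOut)
    {
        OpenSessions++;
        if (clientTimesOut)
            throw new TimeoutException("simulated client timeout");
        return Guid.NewGuid().ToString();
    }

    public void Quit(string sessionId) => OpenSessions--;
}

// Toy stand-in for the driver, mirroring the StartSession/Dispose logic quoted above.
sealed class FakeDriver : IDisposable
{
    private readonly FakeGrid grid;
    private string? sessionId;

    public FakeDriver(FakeGrid grid, bool clientTimesOut)
    {
        this.grid = grid;
        // If NewSession throws, sessionId is never assigned.
        this.sessionId = grid.NewSession(clientTimesOut);
    }

    public void Dispose()
    {
        // Only a known session ID can be quit.
        if (this.sessionId != null)
        {
            this.grid.Quit(this.sessionId);
            this.sessionId = null;
        }
    }
}

static class Demo
{
    // Returns how many sessions are still open on the fake grid afterwards.
    public static int OpenSessionsAfter(bool clientTimesOut)
    {
        var grid = new FakeGrid();
        try
        {
            using var driver = new FakeDriver(grid, clientTimesOut);
        }
        catch (TimeoutException)
        {
            // The constructor threw: there is no driver object and no session ID to quit.
        }
        return grid.OpenSessions;
    }
}
```

In this sketch, `Demo.OpenSessionsAfter(clientTimesOut: true)` returns 1 (the orphaned session), while `Demo.OpenSessionsAfter(clientTimesOut: false)` returns 0.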

How can we reproduce the issue?

    [Test]
    public void Timeout()
    {
        var seleniumRemoteUrl = "http://localhost:4444/wd/hub";

        {
            // Create new session
            using var existingSession = new RemoteWebDriver(new(seleniumRemoteUrl), new ChromeOptions());
            
            // Second session will time out as the first session is still running
            Assert.Throws<WebDriverException>(() =>
                new RemoteWebDriver(new(seleniumRemoteUrl), new ChromeOptions())
            );
            
            // Scope ends, session is closed
        }

        // This now fails as the second session wasn't closed properly
        Assert.DoesNotThrow(() =>
        {
            using var newSession = new RemoteWebDriver(new(seleniumRemoteUrl), new ChromeOptions());
        });
    }

Relevant log output

Expected: No Exception to be thrown
  But was:  <OpenQA.Selenium.WebDriverException: The HTTP request to the remote WebDriver server for URL http://localhost:4444/wd/hub/session timed out after 60 seconds.
 ---> System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 60 seconds elapsing.
 ---> System.TimeoutException: The operation was canceled.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
 ---> System.Net.Sockets.SocketException (89): Operation canceled
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
   at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
   at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at OpenQA.Selenium.Remote.HttpCommandExecutor.MakeHttpRequest(HttpRequestInfo requestInfo)
   at OpenQA.Selenium.Remote.HttpCommandExecutor.ExecuteAsync(Command commandToExecute)
   --- End of inner exception stack trace ---
   at OpenQA.Selenium.Remote.HttpCommandExecutor.ExecuteAsync(Command commandToExecute)
   at OpenQA.Selenium.WebDriver.ExecuteAsync(String driverCommandToExecute, Dictionary`2 parameters)
   at OpenQA.Selenium.WebDriver.Execute(String driverCommandToExecute, Dictionary`2 parameters)
   at OpenQA.Selenium.WebDriver.StartSession(ICapabilities capabilities)
   at OpenQA.Selenium.WebDriver..ctor(ICommandExecutor executor, ICapabilities capabilities)
   at OpenQA.Selenium.Remote.RemoteWebDriver..ctor(ICommandExecutor commandExecutor, ICapabilities capabilities)
   at OpenQA.Selenium.Remote.RemoteWebDriver..ctor(Uri remoteAddress, ICapabilities capabilities, TimeSpan commandTimeout)
   at OpenQA.Selenium.Remote.RemoteWebDriver..ctor(Uri remoteAddress, ICapabilities capabilities)
   at OpenQA.Selenium.Remote.RemoteWebDriver..ctor(Uri remoteAddress, DriverOptions options)

Operating System

Both macOS 14.6.1 and Linux

Selenium version

C# 4.24.0

What are the browser(s) and version(s) where you see this issue?

Chrome

What are the browser driver(s) and version(s) where you see this issue?

Selenium.WebDriver.ChromeDriver 130.0.6723.11600

Are you using Selenium Grid?

4.26.0 (revision 69f9e5e)

@genne, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@nvborisenko
Member

It is impossible to dispose a session that doesn't exist. Why the session could not be created in time is what should be resolved. @genne, any ideas?

@genne
Author

genne commented Nov 12, 2024

> It is impossible to dispose a session that doesn't exist. Why the session could not be created in time is what should be resolved. @genne, any ideas?

@nvborisenko perhaps instead of relying on the session ID returned by the endpoint, you could include a request ID (GUID) when creating the session? Then you could use that to kill the session:

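// Proposal only: "RequestId" is a suggested parameter, not something the endpoint supports today.
var requestId = Guid.NewGuid();
parameters["RequestId"] = requestId;
try
{
    Response response = this.Execute(DriverCommand.NewSession, parameters);
}
catch
{
    // No session ID was returned, so identify the session by the request ID instead.
    var quitParameters = new Dictionary<string, object>();
    quitParameters["RequestId"] = requestId;
    this.Execute(DriverCommand.Quit, quitParameters);
}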

@nvborisenko
Member

@diemol from one point of view it works as expected, so it would be normal to just close this issue.

But from another point of view this does look like a real issue, and it seems to be valid.

@joerg1985
Member

@nvborisenko Could this be a case of misaligned timeouts?
The client timeout should be larger than the --session-timeout used by the Grid to avoid this.
The default timeout of the C# client is 60s and the default session-timeout of the Grid is 300s.

In general it is a good idea to have a client timeout >300s, see #12368 (comment)
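As a rough illustration of aligning the two timeouts, the sketch below passes a command timeout above 300s to the `RemoteWebDriver(Uri, ICapabilities, TimeSpan)` constructor, which also appears in the stack trace above. The `GridClient` helper and the 30s margin are assumptions made for this example, and the 300s value is the Grid default quoted in this comment; adjust it if your Grid runs with a different --session-timeout.

```csharp
using System;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

// Hypothetical helper for illustration; not part of Selenium.
static class GridClient
{
    // Grid default --session-timeout as quoted in the comment above (assumption for your setup).
    private static readonly TimeSpan GridSessionTimeout = TimeSpan.FromSeconds(300);

    // Client command timeout = Grid session timeout plus a safety margin.
    public static TimeSpan CommandTimeout(TimeSpan margin) => GridSessionTimeout + margin;

    public static RemoteWebDriver Create(Uri gridUrl)
    {
        var options = new ChromeOptions();
        // The 30s margin is an arbitrary choice so the client outlasts the Grid timeout.
        return new RemoteWebDriver(
            gridUrl,
            options.ToCapabilities(),
            CommandTimeout(TimeSpan.FromSeconds(30)));
    }
}
```

With these assumptions, `GridClient.Create(new Uri("http://localhost:4444/wd/hub"))` waits up to 330s for a command instead of the 60s default. As noted later in the thread, this only works around the symptom.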

@nvborisenko
Member

We can increase the default timeout, but that is not a solution :( The issue here is how to determine that the client is offline, so that the server can clean up all resources potentially allocated for that client.

@joerg1985
Member

I recently added cancellation of the upstream request in 7175349, so the Grid should be able to stop the new session request without a request ID.

@nvborisenko
Member

So it should be implicitly fixed in the next release of Grid?

@joerg1985
Member

Maybe. It depends on how the webdriver handles the client closing the socket before the browser has been started.

If the webdriver does not handle this properly, the Grid could handle it.

@joerg1985
Member

I have created DistributedTest#clientTimeoutDoesNotLeakARunningBrowser and added handling for this case to the Grid.
As soon as #14756 is merged, this issue is fixed.

@genne
Author

genne commented Nov 18, 2024

@joerg1985 @nvborisenko thanks for the quick fix ⭐

@genne
Author

genne commented Nov 25, 2024

Hi @joerg1985 @nvborisenko, the latest version doesn't work for me anymore. I get a timeout error every time I try to connect.
Could this be related to this change?

@genne
Author

genne commented Nov 25, 2024

> Hi @joerg1985 @nvborisenko, the latest version doesn't work for me anymore. I get a timeout error every time I try to connect. Could this be related to this change?

Just restarting the container seems to have fixed it.
