Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[client-v2] Connection Disruption Issues (Timeout, Reset by Peer, No Response) #1994

Closed
chernser opened this issue Dec 3, 2024 · 1 comment · Fixed by #2104
Closed

[client-v2] Connection Disruption Issues (Timeout, Reset by Peer, No Response) #1994

chernser opened this issue Dec 3, 2024 · 1 comment · Fixed by #2104
Assignees
Labels
bug client-api-v2 network network and IO related issues

Comments

@chernser
Copy link
Contributor

chernser commented Dec 3, 2024

Describe the bug

There is set of issues while different stages of request. This issue may be not covers them all, but aimed to investigate and understand these issues.

Unknown

No response tests causes broken pipe exception when run locally:

[ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 67.20 s <<< FAILURE! -- in TestSuite
[ERROR] com.clickhouse.client.http.ApacheHttpConnectionImplTest.testNoHttpResponseExceptionWithValidation[100](2) -- Time elapsed: 0 s <<< FAILURE!
java.lang.AssertionError: Unexpected exception
        at org.testng.Assert.fail(Assert.java:98)
        at com.clickhouse.client.http.ApacheHttpConnectionImplTest.testNoHttpResponseExceptionWithValidation(ApacheHttpConnectionImplTest.java:268)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
      ....
  at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:137)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
        at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
        at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)
Caused by: com.clickhouse.client.ClickHouseException: Broken pipe
        at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:149)
        at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.net.ConnectException: Broken pipe
        at com.clickhouse.client.http.ApacheHttpConnectionImpl.post(ApacheHttpConnectionImpl.java:307)
        at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:195)
        at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161)
        at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273)
        ... 4 more

Connection Request

Connection pool may be source of timeouts, too. There is connection request timeout that limits for how long client would wait for connection from a pool. There is connection establish timeout that is for time spent to create TCP connection after connection is leased from the connection pool.

Socket timeout

There is a socket timeout that affects write/read operations. It seems not set by default or set implicitly. This should be investigated. Socket timeout may be affected by request timeout if Apache HTTP Client is used.
Socket timeout is measured by the wall clock so it is affected by GC pause when application is under significant load. We need to document this to make users aware of the problem. Another question - can we detect it? May be if we have two timers for wall clock and ticker counter.

Fixed:

Example stack trace:
Pay attention that test elapsed time is 3.772 s. This means that somewhere very short timeout was set like around 1s
For tests it may be a common problem when run against the cloud.

Error:  com.clickhouse.client.insert.InsertClientContentCompressionTests.testLogComment[](5) -- Time elapsed: 3.772 s <<< FAILURE!
com.clickhouse.client.api.ClientException: Failed to execute request
	at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:404)
	at com.clickhouse.client.api.Client.lambda$query$11(Client.java:1706)
	at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:2116)
	at com.clickhouse.client.api.Client.query(Client.java:1782)
	at com.clickhouse.client.api.Client.query(Client.java:1632)
	at com.clickhouse.client.api.Client.execute(Client.java:2036)
	at com.clickhouse.client.insert.InsertTests.testLogComment(InsertTests.java:369)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:136)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:658)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:219)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:923)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:192)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at org.testng.TestRunner.privateRun(TestRunner.java:808)
	at org.testng.TestRunner.run(TestRunner.java:603)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:429)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:423)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:383)
	at org.testng.SuiteRunner.run(SuiteRunner.java:326)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1249)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
	at org.testng.TestNG.runSuites(TestNG.java:1092)
	at org.testng.TestNG.run(TestNG.java:1060)
	at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:155)
	at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:169)
	at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:88)
	at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:137)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:278)
	at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:304)
	at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)
	at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)
	at java.base/java.net.Socket$SocketInputStream.read(Socket.java:1099)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:489)
	at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:483)
	at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1461)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1066)
	at org.apache.hc.core5.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:149)
	at org.apache.hc.core5.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
	at org.apache.hc.core5.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:247)
	at org.apache.hc.core5.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:54)
	at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:299)
	at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:175)
	at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:218)
	at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:717)
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.execute(InternalExecRuntime.java:216)
	at org.apache.hc.client5.http.impl.classic.MainClientExec.execute(MainClientExec.java:116)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:188)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
	at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:377)
	... 37 more

Connection Reset by Peer

Most probably timeout happens on a server side. Client may see it as a connection reset because server close socket without sending any error.

There is one of examples: #2070
We need to test long running queries.

Here is native code that causes exception https://github.com/openjdk/jdk17/blob/master/src/java.base/unix/native/libnio/ch/SocketDispatcher.c
It can be:

  • SocketException: Connection reset by peer
  • ConnectionException: Connection reset

Reset exceptions are reaction to a TCP reset package, but read and write operation has different execution branches so exceptions may vary.

Example stack trace:

Caused by: java.net.SocketException: Connection reset by peer
	at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:425)
	at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:445)
	at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:831)
	at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
	at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:345)
	at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1308)
	at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection$1.write(DefaultBHttpClientConnection.java:232)
	at org.apache.hc.core5.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:117)
	at org.apache.hc.core5.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:150)
	at org.apache.hc.core5.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:143)
	Suppressed: java.net.SocketException: Broken pipe
		at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:425)
		at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:445)
		at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:831)
		at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
		at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:345)
		at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1308)
		at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection$1.write(DefaultBHttpClientConnection.java:232)
		at org.apache.hc.core5.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:117)
		at org.apache.hc.core5.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:150)
		at org.apache.hc.core5.http.impl.io.ChunkedOutputStream.flushCache(ChunkedOutputStream.java:127)

@chernser chernser added this to the 0.7.2 milestone Dec 3, 2024
@chernser chernser self-assigned this Dec 3, 2024
@chernser chernser modified the milestones: 0.7.2, Priority Backlog Dec 29, 2024
@chernser chernser added the network network and IO related issues label Jan 10, 2025
@chernser chernser removed their assignment Jan 10, 2025
@chernser chernser changed the title [client-v2] Timeout on write/read randomly [client-v2] Connection Disruption Issues (Timeout, Reset by Peer, No Response) Jan 13, 2025
@mshustov mshustov modified the milestones: Priority Backlog, 0.8.0 Jan 14, 2025
@Paultagoras
Copy link
Contributor

Paultagoras commented Jan 14, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug client-api-v2 network network and IO related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants