Unable to reconnect to hazelcast due timeout #274

Closed
cujo opened this issue Jul 23, 2021 · 1 comment · Fixed by #277

cujo commented Jul 23, 2021

v2.0.1-SNAPSHOT

Losing the connection to Hazelcast causes an infinite loop of com.hazelcast.client.HazelcastClientNotActiveException: Client is shutting down exceptions once the connectionTimeout has elapsed. This makes the application unavailable and causes high CPU usage.

A possible workaround is to increase the Hazelcast connectionTimeout from the default of 30 seconds to a larger value, but this does not resolve the problem completely.

See the log for details:
2021-07-23 10:51:43.547  WARN 1 --- [nt_1.internal-7] c.h.c.i.c.ClientConnectionManager        : hz.client_1 [dev] [4.2.1] Exception during initial connection to Member [10.10.213.40]:5701 - 07825ec3-5703-4a08-bd58-2841898880a1: com.hazelcast.core.HazelcastException: java.io.IOException: null to address /10.10.213.40:5701
2021-07-23 10:51:48.553  WARN 1 --- [nt_1.internal-7] c.h.c.i.c.ClientConnectionManager        : hz.client_1 [dev] [4.2.1] Exception during initial connection to [zeebe]:5701: com.hazelcast.core.HazelcastException: java.io.IOException: null to address zeebe/10.10.213.40:5701
2021-07-23 10:51:48.555  WARN 1 --- [nt_1.internal-7] c.h.c.i.c.ClientConnectionManager        : hz.client_1 [dev] [4.2.1] Unable to get live cluster connection, cluster connect timeout (30000 ms) is reached. Attempt 12.
2021-07-23 10:51:48.556  WARN 1 --- [nt_1.internal-7] c.h.c.i.c.ClientConnectionManager        : hz.client_1 [dev] [4.2.1] Could not connect to any cluster, shutting down the client: Unable to connect to any cluster.
2021-07-23 10:51:48.583  WARN 1 --- [nt_1.internal-2] c.h.c.i.c.ClientConnectionManager        : hz.client_1 [dev] [4.2.1] Could not connect to member 07825ec3-5703-4a08-bd58-2841898880a1, reason com.hazelcast.core.HazelcastException: java.io.IOException: null to address /10.10.213.40:5701

2021-07-23 10:51:49.379 ERROR 1 --- [pool-1-thread-1] i.z.h.connect.java.ZeebeHazelcast        : Fail to read from ring-buffer at sequence '206'. Will try again.

com.hazelcast.client.HazelcastClientNotActiveException: Client is shutting down
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyExceptionWithOwnedPermission(ClientInvocation.java:318) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:211) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.retry(ClientInvocation.java:237) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.run(ClientInvocation.java:218) ~[hazelcast-4.2.1.jar:4.2.1]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
        at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[hazelcast-4.2.1.jar:4.2.1]
        at ------ submitted from ------.() ~[na:na]
        at com.hazelcast.internal.util.ExceptionUtil.cloneExceptionWithFixedAsyncStackTrace(ExceptionUtil.java:279) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapRuntimeException(AbstractInvocationFuture.java:1968) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapOrPeel(AbstractInvocationFuture.java:1949) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.spi.impl.AbstractInvocationFuture$ExceptionalResult.wrapForJoinInternal(AbstractInvocationFuture.java:1431) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.resolveAndThrowForJoinInternal(AbstractInvocationFuture.java:600) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.joinInternal(AbstractInvocationFuture.java:584) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.proxy.ClientRingbufferProxy.invoke(ClientRingbufferProxy.java:220) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.proxy.ClientRingbufferProxy.readOne(ClientRingbufferProxy.java:156) ~[hazelcast-4.2.1.jar:4.2.1]
        at io.zeebe.hazelcast.connect.java.ZeebeHazelcast.readNext(ZeebeHazelcast.java:109) ~[zeebe-hazelcast-connector-1.0.0.jar:1.0.0]
        at io.zeebe.hazelcast.connect.java.ZeebeHazelcast.readFromBuffer(ZeebeHazelcast.java:101) ~[zeebe-hazelcast-connector-1.0.0.jar:1.0.0]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
Caused by: java.io.IOException: No connection found to cluster.
        at com.hazelcast.client.impl.connection.tcp.TcpClientConnectionManager.checkInvocationAllowed(TcpClientConnectionManager.java:542) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocationServiceImpl.checkInvocationAllowed(ClientInvocationServiceImpl.java:294) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:180) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.retry(ClientInvocation.java:237) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.client.impl.spi.impl.ClientInvocation.run(ClientInvocation.java:218) ~[hazelcast-4.2.1.jar:4.2.1]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
        at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-4.2.1.jar:4.2.1]
        at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[hazelcast-4.2.1.jar:4.2.1]

The expected behavior is either to stop the application with an exception or to keep attempting to reconnect.
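
To illustrate the two acceptable outcomes described above, here is a minimal sketch of a read loop that retries recoverable failures with a growing delay and stops immediately on a terminal error, instead of retrying in a tight loop. It is not the connector's actual code or the eventual fix. The names BackoffReader, TerminalException, and readWithBackoff are hypothetical, and TerminalException only stands in for an error such as HazelcastClientNotActiveException.

```java
import java.util.function.Supplier;

final class BackoffReader {

    /** Hypothetical stand-in for a terminal error, e.g. a client that has shut down. */
    static final class TerminalException extends RuntimeException {
        TerminalException(String message) {
            super(message);
        }
    }

    /**
     * Calls readOnce until it returns a value. Recoverable failures are retried
     * with exponential backoff; a TerminalException is rethrown at once so the
     * caller can stop instead of spinning.
     */
    static <T> T readWithBackoff(Supplier<T> readOnce,
                                 long initialDelayMs,
                                 long maxDelayMs,
                                 int maxAttempts) {
        long delayMs = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return readOnce.get();
            } catch (TerminalException e) {
                throw e; // retrying cannot help, so fail fast
            } catch (RuntimeException e) {
                last = e; // assumed recoverable, retry after a pause
            }
            if (attempt < maxAttempts) {
                sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated read that fails twice before succeeding.
        String value = readWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("no connection");
            }
            return "record";
        }, 10, 100, 5);
        System.out.println(value + " after " + calls[0] + " calls");
    }
}
```

The pause between attempts is what keeps CPU usage low while the cluster is unreachable. Rethrowing the terminal error lets the application shut down or rebuild the client explicitly.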

Contributor

saig0 commented Jul 30, 2021

@cujo thank you for reporting 👍 I can confirm the behavior.

I opened an issue against the Hazelcast client to get this fixed: camunda-community-hub/zeebe-hazelcast-exporter#122

For the simple monitor, the connection timeout can be configured with the following property:

zeebe:
  client:
    worker:
      hazelcast:
        connectionTimeout: PT30S
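
The value PT30S has the shape of an ISO-8601 duration. The following sketch assumes the property is read as a java.time.Duration, which this thread does not confirm, and shows how such a value maps to milliseconds. The class and method names are illustrative only.

```java
import java.time.Duration;

final class ConnectionTimeoutExample {

    // Converts an ISO-8601 duration string such as "PT30S" to milliseconds.
    static long toMillis(String isoDuration) {
        return Duration.parse(isoDuration).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(toMillis("PT30S")); // 30000
        System.out.println(toMillis("PT5M"));  // 300000
    }
}
```

Under that assumption, PT30S corresponds to the "cluster connect timeout (30000 ms)" reported in the log above, and a value such as PT5M would extend the window to five minutes.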
