Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebSocket sendBinary() is blocking infinite #11362

Closed
Horcrux7 opened this issue Feb 1, 2024 · 11 comments
Closed

WebSocket sendBinary() is blocking infinite #11362

Horcrux7 opened this issue Feb 1, 2024 · 11 comments
Labels
Bug For general bugs on Jetty side

Comments

@Horcrux7
Copy link

Horcrux7 commented Feb 1, 2024

Jetty version(s)
10.0.19

Java version/vendor
17.0.8.1 (amd64) Eclipse Adoptium

OS type/version
Linux 5.15.0-52-generic

Description
Under certain rare conditions, a call to RemoteEndpoint.Basic.sendBinary(ByteBuffer) may infinitely block. Refer to the stack trace provided below. This issue typically occurs around lunchtime. I suspect that some clients or browsers with open WebSocket connections may go into standby or become unresponsive. Only a server restart resolves the problem because there are no available threads to send WebSocket events.

The problem seems to me the follow code snippet in JavaxWebSocketBasicRemote.sendBinary

        FutureCallback b = new FutureCallback();
        sendFrame(new Frame(OpCode.BINARY).setPayload(data), b, false);
        b.block();

The code seems synchron for me. After the call of sendFrame() there occur nothing. The block is only calling for throwing a possible exception. If the callback was not called then it block infinite. A possible fix can be to use block(1, TimeUnit.MILLISECONDS)or equals. Of course the better fix would be to find the point where the callback was not called.

"ForkJoinPool.commonPool-worker-1" #13 java.util.concurrent.ForkJoinWorkerThread@795a0c0e daemon prio=5 cpu=1763ms elapsed=8194.712s WAITING on java.util.concurrent.CountDownLatch$Sync@6bdb7286
	at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.CountDownLatch$Sync@6bdb7286
	at [email protected]/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
	at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
	at [email protected]/java.util.concurrent.CountDownLatch.await(Unknown Source)
	at org.eclipse.jetty.util.FutureCallback.get(FutureCallback.java:114)
	at org.eclipse.jetty.util.FutureCallback.block(FutureCallback.java:149)
	at org.eclipse.jetty.util.FutureCallback.block(FutureCallback.java:139)
	at org.eclipse.jetty.websocket.javax.common.JavaxWebSocketBasicRemote.sendBinary(JavaxWebSocketBasicRemote.java:66)

threaddump22.txt
threaddump21.txt

@Horcrux7 Horcrux7 added the Bug For general bugs on Jetty side label Feb 1, 2024
@joakime
Copy link
Contributor

joakime commented Feb 1, 2024

Using javax.websocket.RemoteEndpoint.Basic means we must block, by spec, without timeout on that API call.

Make sure you have a sane value for your connector idle timeout.
That is how the idle timeout in this situation is handled.

Your other option is to use the javax.websocket.RemoteEndpoint.Async and do the blocking / timeout on the API call yourself.

@Horcrux7
Copy link
Author

Horcrux7 commented Feb 1, 2024

Unfortunately, we cannot use the async endpoint due to a bug in Tomcat: https://bz.apache.org/bugzilla/show_bug.cgi?id=56026

Yes, it is right that you must block until completed. But this occur already in the call of sendFrame(). If I understand the code right then sendFrame() is already blocking. If you reach the line b.block() there is nothing to wait. If the callback was not called sendFrame() then you wait for ever. --> dead lock

What you means with connector idle timeout? How can connector Idle timeout have any effect on dead lock?

@joakime
Copy link
Contributor

joakime commented Feb 1, 2024

The connector idle timeout will close the connection.
That closure will trigger errors / exceptions in anything working with that network connection.
Such as a that websocket write.
The callbacks associated with those writes fail (aka Callback.failed(Throwable))
That results in the FutureCallback.get() failing.
That results in the FutureCallback.block() failing.
The call to javax.websocket.RemoteEndpoint.Basic.sendBinary() completes.

@joakime
Copy link
Contributor

joakime commented Feb 1, 2024

Also note, that Jetty 10 is now at End of Community Support.

You should be using Jetty 12 at this point in time.
For your use of javax.websocket that means you use the ee8 environment on Jetty 12 (as that still supports that old namespace).

@sbordet
Copy link
Contributor

sbordet commented Feb 1, 2024

But this occur already in the call of sendFrame(). If I understand the code right then sendFrame() is already blocking.

No, it is not. It takes a callback and notifies the callback when finished. That is why we need to block after the call to sendFrame().

@Horcrux7
Copy link
Author

Horcrux7 commented Feb 1, 2024

@sbordet If I debug in the code then the callback is ever call synchron. This means if sendFrame() returns then it was already called.

@Horcrux7
Copy link
Author

Horcrux7 commented Feb 1, 2024

@joakime Thanks for the info. I thought that Jetty 10.x was the last version with the old package. I have try to migrate to 12.0.6-ee8 but without luck. But this is another problem.

@sbordet
Copy link
Contributor

sbordet commented Feb 1, 2024

If I debug in the code then the callback is ever call synchron. This means if sendFrame() returns then it was already called.

No, it means that in that particular case it was called synchronously, but it may not be, for example when the flusher is already sending other content, or when the TCP connection is congested, etc.

If you have a case where you debug, sendFrame() returns and you wait forever in block() then we would like to know.

Do you have the permessage-deflate extension enabled?

Are you able to reproduce the issue, and take a server dump as explained here:
https://eclipse.dev/jetty/documentation/jetty-12/programming-guide/index.html#pg-troubleshooting-component-dump
https://eclipse.dev/jetty/documentation/jetty-12/operations-guide/index.html#og-troubleshooting-dump

@lachlan-roberts
Copy link
Contributor

You also have this method jakarta.websocket.WebSocketContainer#setAsyncSendTimeout which will limit the amount of time the WebSocket frame can sit in the flusher waiting to be sent. After the timeout expires it should fail the callback.

@Horcrux7
Copy link
Author

Horcrux7 commented Feb 2, 2024

We have migrate to version 12.0.6 now. We have add the option to create an dump from the Jetty server. But I am unclear what the dump will help. We have also set a setAsyncSendTimeout value now. We need to deploy it in production next week. Then we will see the effect. Thanks for all the suggestions.

PS: We does not have permessage-deflate enabled because our WebSocket events are typical 20-50 bytes large.

@Horcrux7
Copy link
Author

We have migrated to version 12.0.6 and use setAsyncSendTimeout( 3000 ). We have also set a timer to monitor the call which will interrupt it after 5000 milliseconds.

In case of problems only approx 5% of the cases the async send timeout is reacting. In 95% our timeout is cancel the thread.

We have also isolated that the problem occur only with clients that use Safari on MacOS.

The workaround with the timer and a larger thread pool solves our problem with the hanging server and I am closing this ticket. But I think a default of an unlimited send timeout and not working in the specified time period is bad behavior. Thanks for all the suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
None yet
Development

No branches or pull requests

4 participants