Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency issues with AFSocketServerConnector #142

Closed
kevink-sq opened this issue Oct 25, 2023 · 2 comments
Closed

Concurrency issues with AFSocketServerConnector #142

kevink-sq opened this issue Oct 25, 2023 · 2 comments
Assignees
Labels
bug The issue describes a bug in the code
Milestone

Comments

@kevink-sq
Copy link
Contributor

kevink-sq commented Oct 25, 2023

Describe the bug
We found our production Jetty servers would stop processing requests after traffic spikes due to junixsocket throwing ConcurrentModificationException. This bug may have been a regression introduced in optimizing the key removal: a7c3106

java.util.ConcurrentModificationException
        at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
        at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1620)
        at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.updateKeys(ManagedSelector.java:709)
        at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:539)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:455)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:248)
        at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

A PR has been raised to propose a fix: #141

To Reproduce
Refer to https://github.com/kevink-sq/jetty-concurrency-issue-poc

Notes
Other issues were also detected during isolated load tests in the poc and was fixed by introducing an atomic check to AFSelectorKey#cancel. This issue appeared when running vegeta load tests and forcibly killing the vegeta load:

2023-10-25 01:39:50.857:DEBUG:oejs.AbstractConnector:etp827671026-30-acceptor-0@5ee135f3-AFSocketServerConnector@6cec2a14[org.newsclub.net.unix.AFUNIXSocketAddress[path=junix-ingress.sock]]: Accept Closed by Interrupt
java.nio.channels.ClosedByInterruptException
        at org.newsclub.net.unix.jetty.AFSocketServerConnector.accept(AFSocketServerConnector.java:352)
        at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:748)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by:
java.net.SocketException: Too many open files
        at org.newsclub.net.unix.NativeUnixSocket.accept(Native Method)
        at org.newsclub.net.unix.AFSocketImpl.accept0(AFSocketImpl.java:262)
        at org.newsclub.net.unix.AFServerSocket.accept1(AFServerSocket.java:307)
        at org.newsclub.net.unix.AFServerSocketChannel.accept(AFServerSocketChannel.java:108)
        at org.newsclub.net.unix.AFUNIXServerSocketChannel.accept(AFUNIXServerSocketChannel.java:44)
        at org.newsclub.net.unix.AFUNIXServerSocketChannel.accept(AFUNIXServerSocketChannel.java:27)
        at org.newsclub.net.unix.jetty.AFSocketServerConnector.accept(AFSocketServerConnector.java:340)
        at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:748)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
@kohlschuetter kohlschuetter added this to the 2.8.2 milestone Oct 25, 2023
@kohlschuetter kohlschuetter self-assigned this Oct 25, 2023
@kohlschuetter kohlschuetter added the bug The issue describes a bug in the code label Oct 25, 2023
@kohlschuetter
Copy link
Member

Fixing with #141

@kohlschuetter
Copy link
Member

Fixed in junixsocket 2.8.2. Thanks again for reporting and providing a patch, @kevink-sq !

kohlschuetter added a commit that referenced this issue Nov 9, 2023
Clients may iterate upon AFSelector.selectedKeys() while the selected
keys are modified in another thread.

Change the underlying datastructure to be thread-safe (use
ConcurrentHashMap), and add all "ready" selectors even if they're marked
invalid (they will be removed later).

We reimplement the fix as the original patch caused a FileDescriptor
leak and TIPC test failures.

#142
kohlschuetter added a commit that referenced this issue Nov 9, 2023
Clients may iterate upon AFSelector.selectedKeys() while the selected
keys are modified in another thread.

Change the underlying datastructure to be thread-safe (use
ConcurrentHashMap), and add all "ready" selectors even if they're marked
invalid (they will be removed later).

We reimplement the fix as the original patch caused a FileDescriptor
leak and TIPC test failures.

#142
#145
kohlschuetter added a commit that referenced this issue Nov 12, 2023
Clients may iterate upon AFSelector.selectedKeys() while the selected
keys are modified in another thread.

Change the underlying datastructure to be thread-safe (use
ConcurrentHashMap), and add all "ready" selectors even if they're marked
invalid (they will be removed later).

We reimplement the fix as the original patch caused a FileDescriptor
leak and TIPC test failures.

#142
#145
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue describes a bug in the code
Projects
None yet
Development

No branches or pull requests

2 participants