Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] Remove blocking calls from Subscription.getStats #23088

Merged
merged 4 commits into from
Jul 29, 2024

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Jul 29, 2024

Motivation

A deadlock occurs.

"metadata-store-13-1" #16 prio=5 os_prio=0 cpu=621052.68ms elapsed=249537.73s tid=0x00007f31415b8570 nid=0x6a waiting on condition  [0x00007f3118ffd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x00000000e50e8200> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3465)
	at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3436)
	at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:2072)
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription.getStats(PersistentSubscription.java:1209)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$asyncGetStats$90(PersistentTopic.java:2391)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1043/0x00007f30cc9c1028.accept(Unknown Source)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:563)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:277)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.asyncGetStats(PersistentTopic.java:2389)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.getStats(PersistentTopic.java:2349)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$167(PersistentTopicsBase.java:1513)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2195/0x00007f30ccc3d408.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApplyNow([email protected]/CompletableFuture.java:684)
	at java.util.concurrent.CompletableFuture.uniApplyStage([email protected]/CompletableFuture.java:662)
	at java.util.concurrent.CompletableFuture.thenApply([email protected]/CompletableFuture.java:2168)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$168(PersistentTopicsBase.java:1512)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2194/0x00007f30ccc3d1c0.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage([email protected]/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose([email protected]/CompletableFuture.java:2309)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$170(PersistentTopicsBase.java:1509)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2192/0x00007f30ccc3cd40.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire([email protected]/CompletableFuture.java:718)
	at java.util.concurrent.CompletableFuture.postComplete([email protected]/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete([email protected]/CompletableFuture.java:2147)
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:288)
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:232)
	at org.apache.pulsar.metadata.impl.ZKMetadataStore$$Lambda$225/0x00007f30cc34b5d0.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:539)
	at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:840)

"pulsar-load-manager-1-1" #51 prio=5 os_prio=0 cpu=5248.25ms elapsed=249536.68s tid=0x00007f314191a310 nid=0x8b waiting on condition  [0x00007f30c5ffe000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x00000000e50e8a28> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3465)
	at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3436)
	at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.join([email protected]/CompletableFuture.java:2117)
	at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.writeBundleDataOnZooKeeper(ModularLoadManagerImpl.java:1170)
	at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerWrapper.writeResourceQuotasToZooKeeper(ModularLoadManagerWrapper.java:118)
	at org.apache.pulsar.broker.loadbalance.LoadResourceQuotaUpdaterTask.run(LoadResourceQuotaUpdaterTask.java:39)
	at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:539)
	at java.util.concurrent.FutureTask.runAndReset([email protected]/FutureTask.java:305)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:305)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:840)

"pulsar-web-42-3" #67 prio=5 os_prio=0 cpu=11269.15ms elapsed=249532.09s tid=0x00007f3141a5a580 nid=0x9a waiting on condition  [0x00007f30c50fd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x00000000e4e87000> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3465)
	at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3436)
	at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:2072)
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription.getStats(PersistentSubscription.java:1209)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$asyncGetStats$90(PersistentTopic.java:2391)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1043/0x00007f30cc9c1028.accept(Unknown Source)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:563)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:277)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.asyncGetStats(PersistentTopic.java:2389)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.getStats(PersistentTopic.java:2349)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$167(PersistentTopicsBase.java:1513)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2195/0x00007f30ccc3d408.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApplyNow([email protected]/CompletableFuture.java:684)
	at java.util.concurrent.CompletableFuture.uniApplyStage([email protected]/CompletableFuture.java:662)
	at java.util.concurrent.CompletableFuture.thenApply([email protected]/CompletableFuture.java:2168)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$168(PersistentTopicsBase.java:1512)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2194/0x00007f30ccc3d1c0.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage([email protected]/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose([email protected]/CompletableFuture.java:2309)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$170(PersistentTopicsBase.java:1509)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2192/0x00007f30ccc3cd40.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniAcceptNow([email protected]/CompletableFuture.java:757)
	at java.util.concurrent.CompletableFuture.uniAcceptStage([email protected]/CompletableFuture.java:735)
	at java.util.concurrent.CompletableFuture.thenAccept([email protected]/CompletableFuture.java:2182)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalGetPartitionedStats(PersistentTopicsBase.java:1496)
	at org.apache.pulsar.broker.admin.v2.PersistentTopics.getPartitionedStats(PersistentTopics.java:1305)
	at jdk.internal.reflect.GeneratedMethodAccessor396.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke([email protected]/Method.java:568)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$561/0x00007f30cc5b92e0.invoke(Unknown Source)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
	at org.apache.pulsar.broker.web.ResponseHandlerFilter.doFilter(ResponseHandlerFilter.java:66)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.apache.pulsar.broker.web.AuthenticationFilter.doFilter(AuthenticationFilter.java:65)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.apache.pulsar.broker.intercept.BrokerInterceptor.onFilter(BrokerInterceptor.java:224)
	at org.apache.pulsar.broker.web.ProcessHandlerFilter.doFilter(ProcessHandlerFilter.java:46)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.apache.pulsar.broker.web.PreInterceptFilter.doFilter(PreInterceptFilter.java:73)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.eclipse.jetty.servlets.QoSFilter.doFilter(QoSFilter.java:202)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
	at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel$$Lambda$811/0x00007f30cc7ab120.dispatch(Unknown Source)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:840)

Modifications

Avoid blocking calls from Subscription.getStats

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 3.4.0 milestone Jul 29, 2024
@poorbarcode poorbarcode self-assigned this Jul 29, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 29, 2024
@poorbarcode poorbarcode changed the title [fix] [broker] Fix deadlock when calling GetPartitionedStats [fix] [broker] Remove blocking calls from Subscription.getStats Jul 29, 2024
@poorbarcode poorbarcode force-pushed the fix_deadlock/get_partitioned_stats branch from 4612237 to 9da7cbd Compare July 29, 2024 02:01
@codelipenghui
Copy link
Contributor

@poorbarcode Do you have the stack information for another thread which hold the lock?

@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@poorbarcode poorbarcode requested a review from nodece July 29, 2024 03:39
@poorbarcode poorbarcode requested a review from nodece July 29, 2024 03:44
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.

Project coverage is 73.44%. Comparing base (bbc6224) to head (d5449c4).
Report is 470 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23088      +/-   ##
============================================
- Coverage     73.57%   73.44%   -0.13%     
- Complexity    32624    33516     +892     
============================================
  Files          1877     1917      +40     
  Lines        139502   144074    +4572     
  Branches      15299    15741     +442     
============================================
+ Hits         102638   105822    +3184     
- Misses        28908    30133    +1225     
- Partials       7956     8119     +163     
Flag Coverage Δ
inttests 27.79% <45.09%> (+3.20%) ⬆️
systests 24.78% <45.09%> (+0.45%) ⬆️
unittests 72.50% <88.23%> (-0.35%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...sar/broker/service/persistent/PersistentTopic.java 79.26% <94.44%> (+0.80%) ⬆️
...ker/service/persistent/PersistentSubscription.java 77.10% <73.33%> (+0.41%) ⬆️

... and 508 files with indirect coverage changes

@nodece nodece requested a review from lhotari July 29, 2024 07:16
@nodece nodece merged commit e59cd05 into apache:master Jul 29, 2024
51 checks passed
poorbarcode added a commit that referenced this pull request Jul 29, 2024
poorbarcode added a commit that referenced this pull request Jul 29, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 30, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants