[improve][broker] Close protocol handlers before unloading namespace bundles #22728

Merged

BewareMyPower
Contributor

Motivation

When the extensible load manager is configured,
NamespaceService#unloadNamespaceBundle can get stuck until the 30-second timeout expires.

20:19:13.746 [main:org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl@962] INFO  org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - localhost:52138 is waiting for owner for serviceUnit:public/__kafka/0xf0000000_0xffffffff
20:19:13.751 [ForkJoinPool.commonPool-worker-3:org.apache.pulsar.broker.resources.MetadataStoreCacheLoader@68] INFO  org.apache.pulsar.broker.resources.MetadataStoreCacheLoader - Successfully updated broker info []
20:19:13.751 [metadata-store-229-1:org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl@447] ERROR org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - There is no channel owner now.
20:19:13.751 [pulsar-load-manager-222-1:org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl@459] ERROR org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - Failed to get the channel owner.
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: There is no channel owner now.
20:19:43.750 [CompletableFutureDelayScheduler:org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl@953] WARN  org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - localhost:52138 failed to wait for owner for serviceUnit:public/__kafka/0xf0000000_0xffffffff; Trying to return the current owner:Optional[localhost:52138]

This happens because lookup requests from Pulsar clients cause Assigning and Owned events to be sent to the service unit channel. While the last broker is closing, however, the state should be Free. If the protocol handler still has active producers or consumers, the state is changed from Free to Owned, and getOwnerAsync then calls getActiveOwnerAsync. Since no more Owned events will be written to the channel, the pending request in getOwnerRequests is never removed, so the caller waits until the timeout fires.
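The stuck request described above can be illustrated with a minimal sketch. It is a simplified model and not the actual ServiceUnitStateChannelImpl code: the class PendingOwnerRequestsSketch and the methods waitForOwner, onOwnedEvent, and pendingCount are hypothetical names, and the map is only loosely modeled on getOwnerRequests. A pending request completes only when an Owned event arrives; if none is ever written, the caller is released only by the timeout.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified model; it does not mirror the real Pulsar classes.
class PendingOwnerRequestsSketch {
    // Pending owner lookups keyed by service unit (assumption: one future per unit).
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Registers a request that completes when an Owned event arrives,
    // or fails with a TimeoutException once the timeout elapses.
    CompletableFuture<String> waitForOwner(String serviceUnit, long timeoutMillis) {
        CompletableFuture<String> request =
                pending.computeIfAbsent(serviceUnit, k -> new CompletableFuture<>());
        return request
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // Remove the entry whether the request succeeded or timed out.
                .whenComplete((owner, error) -> pending.remove(serviceUnit, request));
    }

    // Simulates an Owned event being read from the channel.
    void onOwnedEvent(String serviceUnit, String broker) {
        CompletableFuture<String> request = pending.get(serviceUnit);
        if (request != null) {
            request.complete(broker);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    // Returns true if a request with no Owned event ends in a timeout.
    static boolean timesOutWithoutOwnedEvent() {
        PendingOwnerRequestsSketch channel = new PendingOwnerRequestsSketch();
        try {
            channel.waitForOwner("public/__kafka/0xf0000000_0xffffffff", 100).join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }
}
```

In this model, a lookup issued after the last Owned event has been written behaves like timesOutWithoutOwnedEvent: nothing completes the future, so the caller blocks for the whole timeout.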

Modifications

Close protocol handlers before unloading namespace bundles, so that no further lookup requests are sent once the namespaces start being unloaded.

Add PulsarClientBasedHandlerTest to verify this change works.
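The ordering change can be sketched as follows. This is a hypothetical illustration, not the real broker shutdown code: ShutdownOrderSketch, tryLookup, closeProtocolHandlers, and unloadNamespaceBundles are assumed names chosen for the example. The point is only that once handlers are closed first, no lookup can arrive while bundles are being unloaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the shutdown ordering; names do not mirror the real broker API.
class ShutdownOrderSketch {
    private final List<String> steps = new ArrayList<>();
    private boolean handlersOpen = true;

    // A protocol handler can only trigger a lookup while it is still open.
    boolean tryLookup(String serviceUnit) {
        if (!handlersOpen) {
            return false;
        }
        steps.add("lookup:" + serviceUnit);
        return true;
    }

    void closeProtocolHandlers() {
        handlersOpen = false;
        steps.add("closeProtocolHandlers");
    }

    void unloadNamespaceBundles() {
        steps.add("unloadNamespaceBundles");
    }

    // Ordering described in this PR: close handlers first, then unload bundles.
    List<String> close() {
        closeProtocolHandlers();
        unloadNamespaceBundles();
        return steps;
    }
}
```

With the reverse order, tryLookup could still succeed between the two steps, which is the window in which the stuck owner request appears.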

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@BewareMyPower added the type/enhancement and area/broker labels on May 17, 2024
@BewareMyPower added this to the 3.4.0 milestone on May 17, 2024
@BewareMyPower self-assigned this on May 17, 2024
@github-actions bot added the doc-not-needed label on May 17, 2024
Member

@lhotari left a comment

LGTM

@Technoboy- merged commit a66ff17 into apache:master on May 21, 2024
55 checks passed
@BewareMyPower deleted the bewaremypower/lb-zk-thread-exception branch on May 22, 2024 01:59
Labels
area/broker, cherry-picked/branch-3.2, cherry-picked/branch-3.3, doc-not-needed, ready-to-test, release/3.2.4, release/3.3.1, type/enhancement
6 participants