[fix][broker][branch-3.1] Avoid PublishRateLimiter use an already closed RateLimiter #22011
Thanks @coderzc. When we get this error, does it mean that rate limiting is working or that it is not working?
I think it's risky to add yet another synchronized method. We have seen these introduce performance regressions and deadlocks in the past. For the shutdown case, an easy solution would be to add an explicit RuntimeException class for the shutdown and simply catch it if it occurs. That would be a low-risk change.
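To make the suggestion above concrete, here is a minimal sketch of the "dedicated exception, caught by the caller" approach. It is not Pulsar code: `RateLimiterClosedException`, `SimpleLimiter`, and `tryAcquireOrAllow` are hypothetical names, the limiter has a fixed budget with no refill, and treating a closed limiter as "allow the publish" in the catch block is an assumption made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

final class ShutdownExceptionSketch {
    /** Hypothetical dedicated exception type; not an existing Pulsar class. */
    static final class RateLimiterClosedException extends RuntimeException {
        RateLimiterClosedException(String message) {
            super(message);
        }
    }

    /** Hypothetical, simplified limiter: fixed permit budget, no refill, no locks. */
    static final class SimpleLimiter {
        private final AtomicLong permits;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        SimpleLimiter(long permits) {
            this.permits = new AtomicLong(permits);
        }

        boolean tryAcquire(long n) {
            if (closed.get()) {
                // A specific type lets callers catch only the shutdown case.
                throw new RateLimiterClosedException("Rate limiter is already shutdown");
            }
            while (true) {
                long available = permits.get();
                if (available < n) {
                    return false;
                }
                if (permits.compareAndSet(available, available - n)) {
                    return true;
                }
            }
        }

        void close() {
            closed.set(true);
        }
    }

    /** Caller side: catch only the shutdown exception (assumption: allow the publish). */
    static boolean tryAcquireOrAllow(SimpleLimiter limiter, long n) {
        try {
            return limiter.tryAcquire(n);
        } catch (RateLimiterClosedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SimpleLimiter limiter = new SimpleLimiter(1);
        System.out.println("before close: " + tryAcquireOrAllow(limiter, 1));
        limiter.close();
        System.out.println("after close: " + tryAcquireOrAllow(limiter, 1));
    }
}
```

The sketch avoids `synchronized` entirely, which is the point of the suggestion: the shutdown case is handled by exception type rather than by extra locking.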
@coderzc A better fix is to remove the exception that is thrown when the rate limiter is closed.
Simply return if it is closed. This could be logged at info level.
pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/util/RateLimiter.java
Line 178 in 82237d3
checkArgument(!isClosed(), "Rate limiter is already shutdown");
@frankjkelly When this error is logged, the send operation has failed temporarily due to a timeout. At that moment the rate limiter is being updated. Once the update completes, it continues to work normally.
…` and print info log
@lhotari Good idea. If the current RateLimiter is already shut down, then we only return
Thanks for the clarification. Does that mean the client will retry, and if so, is that within milliseconds, seconds, or something else?
@frankjkelly I think the client does not retry automatically, and the user needs to resend the message manually if sending fails.
Hmmm, @merlimat or @lhotari, can you confirm? If this error requires the caller to catch and retry (as opposed to the client doing it internally), then that's a concern for adoption of the rate limiter. If the error occurs and the client retries as best it can, that's OK.
I don't see anything special about rate limiters in message delivery and retries. The Pulsar client is designed to continue attempting to send messages until a potential send timeout occurs. It's also possible to set up an unlimited send timeout, allowing the client to retry indefinitely. This feature is detailed in the Pulsar documentation, available at https://pulsar.apache.org/docs/3.1.x/cookbooks-deduplication/#pulsar-clients (it's explained in the context of message deduplication). You can refer to the Javadocs for sendTimeout on ProducerBuilder.
It's crucial for messaging applications to be equipped to handle potential failures in message delivery, especially when data consistency is a key concern. Once the Pulsar client has acknowledged the message as sent by returning the message id, the responsibility for maintaining and ensuring the delivery of the message shifts to Pulsar. It's also necessary to verify that the message id is returned when using the asynchronous API (sendAsync). If sending results in an error or the messaging application never receives a message id from the Pulsar client, it's the messaging application's responsibility to retry.
@frankjkelly, did I answer your question?
You did. Thanks @lhotari
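As an illustration of the application-side responsibility described above, here is a minimal, dependency-free sketch of retrying an asynchronous send until a message id is returned. It does not use the Pulsar client: the `sender` function is a hypothetical stand-in for a call such as `producer.sendAsync(...)`, message ids are plain strings, and `sendWithRetry` and `maxAttempts` are names invented for this example. The retry policy (immediate retry, fixed attempt count) is an assumption. A real application would likely add backoff and rely on the client's own send timeout as well.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class SendRetrySketch {
    /**
     * Calls the (hypothetical) async sender and retries on failure, up to maxAttempts.
     * The returned future completes with the message id only once a send succeeded.
     */
    static <T> CompletableFuture<String> sendWithRetry(
            Function<T, CompletableFuture<String>> sender, T payload, int maxAttempts) {
        CompletableFuture<String> result = new CompletableFuture<>();
        attempt(sender, payload, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(
            Function<T, CompletableFuture<String>> sender, T payload,
            int remaining, CompletableFuture<String> result) {
        sender.apply(payload).whenComplete((messageId, error) -> {
            if (error == null) {
                result.complete(messageId);          // message id received: delivery is now the broker's job
            } else if (remaining <= 1) {
                result.completeExceptionally(error); // out of attempts: surface the failure to the caller
            } else {
                attempt(sender, payload, remaining - 1, result);
            }
        });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated sender: fails once (for example, a send timeout), then succeeds.
        Function<String, CompletableFuture<String>> flakySender = payload -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (calls.incrementAndGet() == 1) {
                f.completeExceptionally(new RuntimeException("simulated send timeout"));
            } else {
                f.complete("id-for-" + payload);
            }
            return f;
        };
        System.out.println(sendWithRetry(flakySender, "hello", 3).join());
    }
}
```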
…sed RateLimiter (apache#22011) (cherry picked from commit 8cce14c)
Motivation
We found error logs on the broker when we used ResourceGroupPublishLimiter. The root cause is that the `tryAcquire` method has a race condition with the `replaceLimiters` method, leading to publishRateLimiter using an already closed RateLimiter. PrecisePublishLimiter has the same issue.
Modifications
If the current RateLimiter is already shut down, we simply return true and print an info log. Because PIP-322 refactored Pulsar rate limiting in version 3.2, only versions before 3.2 need this fix.
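The race and the fix described above can be sketched as follows. This is a simplified model, not the actual Pulsar classes: `SimpleRateLimiter`, `PublishLimiterHolder`, and `replaceLimiter` are hypothetical names, the limiter has a fixed budget with no refill, and the info log is reduced to a comment. The key point is that a publisher thread can read the current limiter just before another thread swaps and closes it, so a closed limiter returns `true` instead of throwing.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ClosedLimiterSketch {
    /** Hypothetical, simplified limiter: fixed permit budget, no refill. */
    static final class SimpleRateLimiter {
        private final AtomicLong permits;
        private volatile boolean closed;

        SimpleRateLimiter(long permits) {
            this.permits = new AtomicLong(permits);
        }

        boolean tryAcquire(long n) {
            if (closed) {
                // Fix sketched here: do not throw; let the publish through.
                // (The real change also prints an info-level log at this point.)
                return true;
            }
            while (true) {
                long available = permits.get();
                if (available < n) {
                    return false;
                }
                if (permits.compareAndSet(available, available - n)) {
                    return true;
                }
            }
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical holder that swaps limiters when the configured rate changes. */
    static final class PublishLimiterHolder {
        private volatile SimpleRateLimiter current;

        PublishLimiterHolder(SimpleRateLimiter initial) {
            this.current = initial;
        }

        boolean tryAcquire(long n) {
            SimpleRateLimiter limiter = current; // may be closed right after this read
            return limiter.tryAcquire(n);
        }

        void replaceLimiter(SimpleRateLimiter next) {
            SimpleRateLimiter old = current;
            current = next;
            old.close(); // a concurrent tryAcquire may still hold a reference to 'old'
        }
    }

    public static void main(String[] args) {
        SimpleRateLimiter stale = new SimpleRateLimiter(0);
        PublishLimiterHolder holder = new PublishLimiterHolder(stale);
        holder.replaceLimiter(new SimpleRateLimiter(5));
        // Simulates a thread that read 'stale' before the swap and uses it afterwards.
        System.out.println("stale limiter after close: " + stale.tryAcquire(1));
    }
}
```

One consequence of this design is that publishes racing with a limiter replacement are briefly not rate limited. The sketch accepts that in exchange for avoiding both the exception and additional synchronization.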