
Delayed message delivery implementation #4062

Merged · 24 commits · May 29, 2019

Conversation

merlimat
Contributor

@merlimat merlimat commented Apr 17, 2019

Motivation

Fixes #2375

Allow the option to mark messages for delayed delivery.

Notes:

  • If delayed delivery is disabled, messages are always delivered immediately and there's no tracking overhead.
  • Messages are only delayed on shared subscriptions. Other subscription types will deliver them immediately.
  • The tracking of delayed messages is lazily initialized; if a message has no delay, it adds no overhead.

Implementation

  • The tracking is ephemeral and implemented in the Pulsar broker. The main reason is to avoid a client re-fetching messages multiple times when there are multiple consumer reconnections.
  • The broker keeps a priority queue as a buffer in direct memory.
  • A single Netty HashedWheelTimer drives the checking of topics that have messages ready to be scheduled (see the sketch below).
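
For context, a minimal sketch (illustrative class and bodies, not the PR's actual code) of how a single shared HashedWheelTimer can drive readiness checks:

import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;
import java.util.concurrent.TimeUnit;

public class DeliveryTimerSketch {
    // One shared timer per broker; the tick duration trades precision for overhead.
    private static final HashedWheelTimer TIMER = new HashedWheelTimer(1, TimeUnit.SECONDS);

    public static void main(String[] args) {
        long delayMillis = 5_000;
        // Arm a timeout for when the earliest delayed message becomes deliverable.
        TIMER.newTimeout((Timeout t) ->
                System.out.println("Earliest delayed message is ready; trigger a dispatcher read"),
                delayMillis, TimeUnit.MILLISECONDS);
    }
}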

Possible improvements

The goal of this PR is to have a simple working solution that can efficiently apply delays to tens of millions of messages at any given time.

There are several improvements that could be considered, based on real-world usage feedback.
For example:

  • Compress the priority queue, either by collapsing id ranges or by just passing the buffer through gzip.
  • Allow batching of messages with very close target time.

@merlimat merlimat added the type/feature label Apr 17, 2019
@merlimat merlimat added this to the 2.4.0 milestone Apr 17, 2019
@merlimat merlimat self-assigned this Apr 17, 2019
@sijie
Member

sijie commented Apr 17, 2019

@merlimat : how is this different from #3155?

Also, I think there was a long thread discussion about the delayed message implementation. There was pushback on implementing delayed messages on brokers, and a lot of effort was postponed because you and a bunch of other people had concerns about the solutions in PIP-26 and #3155. But the approach here seems to take the broker-side approach again. I am wondering what the thought behind this is. How is this different from the other proposals?

Besides the implementation: since there was already a long discussion about delayed messages, and I have spent time pushing that discussion and other people's efforts forward, wouldn't it be better to first get an agreement (or at least update the discussion thread) before starting a new PR?

@sijie
Member

sijie commented Apr 17, 2019

nvm, I saw the email thread now.

Member

@sijie sijie left a comment

I have looked into the pull request. This is actually a simpler implementation of PIP-26.

The DelayedDeliveryTracker in this pull request is what is called delayed message index in PIP-26.

In this pull request, the tracker is a priority queue, all in memory, and rebuilt by replaying the messages after a broker crash.

In PIP-26, the tracker is a hash-wheel, time-partitioned index. It can be all in memory and rebuilt by replaying the messages after a broker crash; or the time-partitioned index can be stored in ledgers to avoid replaying the messages to rebuild it.

In theory, I don't see any technical differences between PIP-26 and #4062. In fact, I think #4062 is a simpler implementation of PIP-26, with the delayed message index implemented as a priority queue. If so, how does this PR address the concerns raised when PIP-26 was started (i.e. making changes to the dispatcher)? FYI, PIP-26 was postponed because there were concerns about adding changes to the dispatcher.

@merlimat
Contributor Author

I have looked into the pull request. This is actually a simpler implementation of PIP-26.

Is that a bad thing? Is there any limitation in this approach?

If so, how does this PR address the concerns raised when PIP-26 was started (i.e. making changes to the dispatcher)?

The changes to the dispatcher itself have been isolated to a very few specific points. It should be easy to review and verify that, with the feature turned off, there's zero impact on current behavior.

The biggest difference with this PR is that the tracking happens entirely off-heap, in direct memory. There are no objects created and retained for extended amounts of time, which is the pattern that kills GC performance.

A topic will have a ByteBuf in direct memory where the priority queue is stored. On the data path, no other allocations are required.
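
To make that concrete, here is a purely illustrative sketch (hypothetical class, not the PR's actual code) of a min-heap of (deliverAtTime, ledgerId, entryId) triples packed into one direct ByteBuffer, so that no per-message objects live on the JVM heap:

import java.nio.ByteBuffer;

public class DirectMemoryTripleHeap {
    private static final int TRIPLE_BYTES = 3 * Long.BYTES;
    private ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * TRIPLE_BYTES);
    private int size = 0;

    public void add(long deliverAt, long ledgerId, long entryId) {
        if ((size + 1) * TRIPLE_BYTES > buffer.capacity()) {
            grow();
        }
        put(size, deliverAt, ledgerId, entryId);
        siftUp(size++);
    }

    // Timestamp of the earliest scheduled message (the heap root); assumes size > 0.
    public long peekDeliverAt() {
        return buffer.getLong(0);
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (buffer.getLong(parent * TRIPLE_BYTES) <= buffer.getLong(i * TRIPLE_BYTES)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    private void put(int slot, long a, long b, long c) {
        int base = slot * TRIPLE_BYTES;
        buffer.putLong(base, a).putLong(base + 8, b).putLong(base + 16, c);
    }

    private void swap(int i, int j) {
        for (int k = 0; k < 3; k++) {
            long tmp = buffer.getLong(i * TRIPLE_BYTES + k * 8);
            buffer.putLong(i * TRIPLE_BYTES + k * 8, buffer.getLong(j * TRIPLE_BYTES + k * 8));
            buffer.putLong(j * TRIPLE_BYTES + k * 8, tmp);
        }
    }

    private void grow() {
        ByteBuffer bigger = ByteBuffer.allocateDirect(buffer.capacity() * 2);
        buffer.rewind();
        bigger.put(buffer);
        buffer = bigger;
    }
}

Removal (sift-down) is omitted for brevity; the point is that add/peek/grow all operate on a single off-heap buffer.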

@joefk
Contributor

joefk commented Apr 17, 2019

The changes to the dispatcher itself have been isolated to a very few specific points. It should be easy to review and verify that, with the feature turned off, there's zero impact on current behavior.

Nice to see that change. Is there a config option to turn it ON per namespace?

A topic will have a ByteBuf in direct memory where the priority queue is stored. On the data path, no other allocations are required.

How does this work with load balancing? Does the load balancer know which topics are going to need this allocation?

Is there a limit on the delay, the number of delayed messages pending, etc.? What are the limits?

How is inversion handled? What happens if a message to be delayed by, e.g., a few days is at the head? Will that halt the advance of the delete cursor? What if a few messages of that kind are randomly spread around? Is this taking for granted that on a broker restart, everything spanning the period of the largest delay will potentially be read through again? Is there a checkpoint for a polite shutdown/unload?

I prefer configurable limits and deterministic performance, so that system behavior can be predicted during rolling upgrades and failures. Pulsar handles rolling upgrades and failures way better than other systems, and it would be preferable to maintain that.

@lovelle
Contributor

lovelle commented Apr 17, 2019

Is that a bad thing? Is there any limitation in this approach?

To me this is one of the best things about this pull request, and absolutely not a bad thing.

I still haven't taken a deep look, but my only concern would be: how will it behave when very different ranges of delay arrive? Users sometimes make abusive use of this type of feature.

The improvement I really like is that both features (this and #4062) use a priority queue, but this pull request keeps the buffer in direct memory 👍

Compress the priority queue, either by collapsing id ranges or by just passing the buffer through gzip.

Since each adjacent message could have an arbitrary delay, I can't see how collapsing by id range could be done.

@sijie
Member

sijie commented Apr 17, 2019

Is that a bad thing? Is there any limitation in this approach?

It is not a bad thing. I am actually super happy to see this happen, because I have been a supporter of broker-side approaches from the beginning (if you have followed the email discussion).

The changes to the dispatcher itself have been isolated to a very few specific points.

If you take a look at my comment, PIP-26 also isolates the changes in a structure called DelayedMessageIndex (which is the structure you call DelayedDeliveryTracker here). So technically there are no fundamental differences between this PR and PIP-26 regarding the concerns around changes touching the dispatcher. I am just trying to figure out why, and to make sure the authors of PIP-26 also understand your thoughts behind this. IMO that is an important thing for building a healthy community.

The biggest difference with this PR is that the tracking happens entirely off-heap, in direct memory. There are no objects created and retained for extended amounts of time, which is the pattern that kills GC performance.

I don't think the biggest difference between this PR and PIP-26 is the direct-memory approach you mentioned for implementing DelayedDeliveryTracker. The delayed message index in PIP-26 can also be implemented using direct memory without allocation.

IMO the difference between this PR and PIP-26 is: the DelayedDeliveryTracker in this PR is a pure in-memory structure which cannot hold a "delayed index" beyond memory, while the DelayedMessageIndex in PIP-26 is a time-partitioned structure which can spool the index back to ledgers. The DelayedDeliveryTracker is limited in the delay ranges it can support. The DelayedMessageIndex is a more generic approach, supporting arbitrary delays or scheduled messages.

DelayedDeliveryTracker and DelayedMessageIndex are just two different implementations of the same thing. If the current implementation of DelayedDeliveryTracker is acceptable, why is the proposal of a time-partitioned DelayedMessageIndex not acceptable? People could choose which implementation to use via a broker configuration setting.

Lastly, PIP-26 already presents changes regarding the API, the protocol, namespace policies and many other things around this area. Shall we just pick up the proposed changes there instead of starting a new effort?

@merlimat
Contributor Author

DelayedDeliveryTracker and DelayedMessageIndex are just two different implementations of the same thing. If the current implementation of DelayedDeliveryTracker is acceptable, why is the proposal of a time-partitioned DelayedMessageIndex not acceptable? People could choose which implementation to use via a broker configuration setting.

That's a very good point. It would be good to have DelayedDeliveryTracker as an interface so that we can have different implementations.

That will help:

  1. Accommodate different scenarios
  2. Easily experiment with different implementation approaches

I'll update this PR to make the interface configurable.
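
For reference, a rough sketch of what such a pluggable contract could look like; the method set is approximated from the code snippets quoted later in this thread, not a verbatim copy of the final API:

import java.util.Set;
import org.apache.bookkeeper.mledger.impl.PositionImpl;

public interface DelayedDeliveryTracker extends AutoCloseable {
    // Record a message to be held back until deliverAt (epoch millis).
    boolean addMessage(long ledgerId, long entryId, long deliverAt);

    // True if at least one tracked message has reached its delivery time.
    boolean hasMessageAvailable();

    // Drain up to maxMessages positions whose delivery time has passed.
    Set<PositionImpl> getScheduledMessages(int maxMessages);

    long getNumberOfDelayedMessages();

    @Override
    void close();
}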

Lastly, PIP-26 already presents changes regarding the API, the protocol, namespace policies and many other things around this area. Shall we just pick up the proposed changes there instead of starting a new effort?

In PIP-26 the proposed API methods were:

// message to be delivered after the configured delay interval
producer.newMessage().delayAt(3L, TimeUnit.MINUTES).value("Hello Pulsar!").send();

// message to be delivered at the configured time
producer.newMessage().scheduleAt(new Date(2018, 10, 31, 23, 00, 00))

In this PR I'm proposing:

producer.newMessage().deliverAfter(3, TimeUnit.MINUTES).value("hello").send();

producer.newMessage().deliverAt(timestamp).value("hello").send();

My reasons are:

  • delayAt() seems confusing, because in timing APIs "at" is used for absolute positioning.
  • I'd rather keep the same prefix, deliverAt() / deliverAfter(), to make it visually clear these are 2 alternative ways to configure the same feature.
  • Date vs timestamp: I have no strong opinion, since the two are basically interchangeable (e.g. new Date(timestamp) and date.getTime()). I was using a timestamp since that is what we're already using for publishTime and eventTime.

For the protobuf metadata change, PIP-26 had:

// the message will be delayed at delivery by `delayed_ms` milliseconds.
optional int64 delayed_ms = 18;

Though that won't support specifying an absolute scheduling time.

Instead, I propose to start with:

// Mark the message to be delivered at or after the specified timestamp
optional uint64 deliver_at_time = 18;

Initially, with relative delays, the client will just apply the delay based on its current time. Once we have a broker-assigned timestamp (stored within the message metadata), we could add a second field.
Alternatively, we could start with 2 fields (absolute and relative) and have the broker do the math based on the producer-assigned publish time.
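
A minimal sketch of that client-side reduction (illustrative helper, not the actual client code):

import java.util.concurrent.TimeUnit;

public class DeliverAtConversion {
    // deliverAfter(delay, unit) collapses to an absolute deliver_at_time
    // computed against the client's clock.
    static long toDeliverAtTime(long delay, TimeUnit unit) {
        return System.currentTimeMillis() + unit.toMillis(delay);
    }

    public static void main(String[] args) {
        // deliverAfter(3, TimeUnit.MINUTES) becomes roughly:
        System.out.println("deliver_at_time = " + toDeliverAtTime(3, TimeUnit.MINUTES));
    }
}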

@sijie
Member

sijie commented Apr 18, 2019

@merlimat: great! These comments around API and protocol changes would have been great to have when PIP-26 was sent out. A DelayedDeliveryTracker interface would definitely help as well.

@sijie
Member

sijie commented Apr 18, 2019

Also, can you provide a namespace policy to enable and disable this feature per namespace, as PIP-26 proposed? It doesn't have to be in this PR, but filing an issue to track it would be good.
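
For what it's worth, the per-namespace policy that later landed in Pulsar is along these lines (pulsar-admin flag names from memory; treat the exact syntax as approximate for your version):

# Enable delayed delivery for a namespace, with a 1s tick time
bin/pulsar-admin namespaces set-delayed-delivery my-tenant/my-ns --enable --time 1s

# Inspect the current policy
bin/pulsar-admin namespaces get-delayed-delivery my-tenant/my-ns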

@merlimat
Contributor Author

merlimat commented May 7, 2019

run java8 tests

@merlimat
Contributor Author

run java8 tests

1 similar comment
@merlimat
Contributor Author

run java8 tests

@Override
public Set<PositionImpl> getScheduledMessages(int maxMessages) {
    int n = maxMessages;
    Set<PositionImpl> positions = new TreeSet<>();
Contributor

If we already know the life cycle of PositionImpl, can we use PositionImplRecyclable instead?

Contributor Author

I wanted to keep it simple for now. We can iteratively improve and optimize.

if (log.isDebugEnabled()) {
    log.debug("[{}] Get scheduled messages - found {}", dispatcher.getName(), positions.size());
}
updateTimer();
Contributor

Why are we updating the timer here?

Contributor Author

We took items out of the queue, so we need to adjust the timer for the next scheduled message.
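
In other words, the re-arm amounts to something like the following (hypothetical names, not the PR's exact code):

import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;
import io.netty.util.Timer;
import io.netty.util.TimerTask;
import java.time.Clock;
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;

public class RearmSketch implements TimerTask {
    private final Timer timer = new HashedWheelTimer();
    private final Clock clock = Clock.systemUTC();
    private final PriorityQueue<Long> deliverAtQueue = new PriorityQueue<>();
    private Timeout currentTimeout;

    synchronized void updateTimer() {
        if (currentTimeout != null) {
            currentTimeout.cancel(); // the old deadline may be stale after a drain
            currentTimeout = null;
        }
        Long nextDeliverAt = deliverAtQueue.peek();
        if (nextDeliverAt != null) {
            long delay = Math.max(0, nextDeliverAt - clock.millis());
            currentTimeout = timer.newTimeout(this, delay, TimeUnit.MILLISECONDS);
        }
    }

    @Override
    public void run(Timeout timeout) {
        // A real tracker would trigger a dispatcher read here, then re-arm.
        updateTimer();
    }
}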

@merlimat
Contributor Author

@rdhabalia @ivankelly Please take another look.

@merlimat
Contributor Author

run java8 tests
run integration tests

@sijie
Member

sijie commented May 27, 2019

@rdhabalia @ivankelly Please take another look, so that we can wrap up the features for 2.4.0.

@Slf4j
public class InMemoryDelayedDeliveryTracker implements DelayedDeliveryTracker, TimerTask {

    private final TripleLongPriorityQueue priorityQueue = new TripleLongPriorityQueue();
Contributor

This queue is unbounded. It could potentially allow someone to DoS the broker by sending a bunch of messages with a delivery date far in the future. We should degrade gracefully from this, though I'm not sure what the nicest behaviour would be for the user. Maybe if the queue is full, force delivery from the head of the queue, or something.

Contributor Author

Yes, the idea was to start with a simple implementation and iterate from that, based on observed issues/weaknesses.

Also, there are 2 ways to address that:

  1. The feature can be disabled on the server side (see the config sketch below)
  2. The tracker implementation is pluggable, so one could either extend the current one or provide an alternative implementation
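
As a sketch of point 1, the broker.conf knobs added around this change look like the following (names believed correct for this PR's version; verify against your broker):

# Disable the feature entirely; delayed messages are then delivered immediately
delayedDeliveryEnabled=false

# How often the shared timer ticks when scheduling delayed deliveries
delayedDeliveryTickTimeMillis=1000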

Contributor

OK, I'll +1 this one, but this DoS should be dealt with ASAP.

Contributor Author

The cap on the memory size will need to be applied per broker, though, rather than per topic.
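
One illustrative way such a per-broker cap could be shared across all topic trackers (hypothetical class, not part of this PR):

import java.util.concurrent.atomic.AtomicLong;

public class DelayedIndexMemoryBudget {
    // One budget shared by every tracker in the broker process.
    private final AtomicLong remainingBytes;

    public DelayedIndexMemoryBudget(long capBytes) {
        this.remainingBytes = new AtomicLong(capBytes);
    }

    // Try to reserve space, e.g. 24 bytes per (timestamp, ledgerId, entryId) triple.
    public boolean tryReserve(long bytes) {
        long prev = remainingBytes.getAndAdd(-bytes);
        if (prev < bytes) {
            remainingBytes.addAndGet(bytes); // roll back; caller must degrade gracefully
            return false;
        }
        return true;
    }

    public void release(long bytes) {
        remainingBytes.addAndGet(bytes);
    }
}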

@merlimat
Contributor Author

run java8 tests

1 similar comment
@ivankelly
Contributor

run java8 tests

@Geal
Contributor

Geal commented Jul 20, 2021

@merlimat why was the uint64 timestamp replaced with an int64 in 52832fe? I can't imagine a use case for negative deliver_at_time timestamps.

Successfully merging this pull request may close these issues.

Support for delayed message delivery
7 participants