[PIP-74] Support auto scaled consumer receiver queue #14494

Merged (4 commits) on Apr 18, 2022

@Jason918 Jason918 commented Feb 28, 2022

Motivation

  1. Picking a proper receiver queue size is not easy. If the value is too small, it limits throughput; if it is too large, it consumes too much memory. With the default setup, the queue size is 1000 and the max message size is 5MB, which means the queue can occupy up to 5GB of memory.

  2. This is part of the work for PIP 74. We need to auto-scale currentReceiverQueue to control client memory.
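
The 5GB figure in item 1 is the queue size multiplied by the max message size. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and exist only for illustration, and 5MB is taken here as 5 * 1024 * 1024 bytes, which is an assumption about the unit.

```java
public class ReceiverQueueMemory {

    // Worst-case bytes buffered if every queue slot holds a message of the maximum size.
    // Uses long arithmetic: 1000 * 5MB does not fit in an int.
    public static long worstCaseBytes(int receiverQueueSize, long maxMessageBytes) {
        return Math.multiplyExact((long) receiverQueueSize, maxMessageBytes);
    }

    public static void main(String[] args) {
        long maxMessageBytes = 5L * 1024 * 1024; // 5MB max message size, per the description above
        for (int size : new int[] {10, 100, 1000}) {
            long bytes = worstCaseBytes(size, maxMessageBytes);
            System.out.printf("queue=%d -> %.2f GiB worst case%n",
                    size, bytes / (1024.0 * 1024 * 1024));
        }
    }
}
```

The smaller queue sizes in the loop show why shrinking the queue for low-traffic consumers bounds client memory so effectively: the worst case scales linearly with the queue size.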

Modifications

Add an optional autoScaledReceiverQueueSizeEnabled setting for the consumer client.

When it is enabled, the existing receiverQueueSize becomes the maximum value of the auto-scaled queue size.

The client tries to double the size of currentReceiverQueue whenever the queue size limits message throughput. Currently, this is determined by the following two conditions, which must occur in this exact order:
A) The current receiver queue (ConsumerBase#incomingMessages) is full after we put a message into it (marked by setting scaleReceiverQueueHint to true).
B) The application wants to process more messages but the receiver queue is empty (in this PR, expectMoreIncomingMessages is called at this point).

The queue size won't grow if new messages arrive between A and B. For example, assuming the current receiver queue size is 10, the timeline would look like this:

  1. The consumer sends a flow command to the broker and the client receives 10 messages, so scaleReceiverQueueHint is set to true.
  2. The application calls receive() repeatedly and processes all 10 messages. In the meantime, no new message is sent to the client[1].
  3. The receiver queue size is doubled to 20.
  4. ...
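
The timeline above can be sketched as a small single-threaded model. This is a simplified illustration under stated assumptions, not the code in this PR: the class AutoScaledQueueSketch and the methods onMessageReceived, receive and currentSize are hypothetical, and only the names scaleReceiverQueueHint and expectMoreIncomingMessages are borrowed from the description. The model assumes the hint is recomputed on every arrival, so a message that arrives while the queue is not full clears it.

```java
public class AutoScaledQueueSketch {
    private final int maxSize;                      // plays the role of receiverQueueSize (upper bound)
    private int currentSize;                        // plays the role of the current receiver queue size
    private int queued = 0;                         // messages currently buffered
    private boolean scaleReceiverQueueHint = false; // set when the queue fills up (condition A)

    public AutoScaledQueueSketch(int initialSize, int maxSize) {
        this.currentSize = initialSize;
        this.maxSize = maxSize;
    }

    // A message arrives from the broker. Condition A holds if the queue is full afterwards;
    // an arrival that leaves the queue non-full clears the hint (assumption of this model).
    public void onMessageReceived() {
        queued++;
        scaleReceiverQueueHint = queued >= currentSize;
    }

    // The application asks for a message. Returns false if the queue was empty (condition B).
    public boolean receive() {
        if (queued == 0) {
            expectMoreIncomingMessages();
            return false;
        }
        queued--;
        return true;
    }

    // Doubles the queue size, capped at maxSize, only if A happened before B.
    private void expectMoreIncomingMessages() {
        if (scaleReceiverQueueHint) {
            scaleReceiverQueueHint = false;
            currentSize = Math.min(maxSize, currentSize * 2);
        }
    }

    public int currentSize() {
        return currentSize;
    }

    public static void main(String[] args) {
        AutoScaledQueueSketch q = new AutoScaledQueueSketch(10, 1000);
        for (int i = 0; i < 10; i++) q.onMessageReceived(); // step 1: queue fills, hint set
        for (int i = 0; i < 10; i++) q.receive();           // step 2: application drains the queue
        q.receive();                                        // queue is empty: condition B
        System.out.println(q.currentSize());                // step 3: prints 20
    }
}
```

If a message arrives between the two conditions while the queue is not full, the hint is cleared in this model and the size stays at 10, which corresponds to the "won't grow" case described above.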

NOTE:
Condition A here is slightly different from the original design in PIP-74, which describes it as "there are messages pending to be sent for this consumer" (let's refer to it as Condition X). Condition A is proposed with these considerations:
a) We can assume that when the receiver queue is full, the chance that the broker has no more messages is insignificant in practice.
b) If we accept a), then Condition A implies Condition X, since new messages are pending because the local queue is full. The converse does NOT hold.
c) One fact we should accept is that we don't need to expand the queue size every time it limits throughput, since slight and occasional delays are acceptable and won't affect overall throughput. By replacing Condition X with Condition A, we reduce the sensitivity of the expansion.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • org.apache.pulsar.client.impl.AutoScaledReceiverQueueSizeTest

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (yes, add a config for consumer builder)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 28, 2022

Jason918 commented Mar 1, 2022

/pulsarbot run-failure-checks

@codelipenghui codelipenghui added this to the 2.11.0 milestone Mar 1, 2022

Jason918 commented Mar 19, 2022

Here is a test with bin/pulsar-perf on a local standalone server to confirm the effect of this PR.

Non-partitioned topic consumer

bin/pulsar-perf consume -aq -q 1000000 persistent://public/default/test
bin/pulsar-perf produce -r $RATE -s 128 -bm 0 -time 60 persistent://public/default/test
RATE     Final receiver queue size
1        2
10       2
100      4
1000     32
10000    512

3-partitioned topic consumer

bin/pulsar-perf consume -aq -q 1000000 persistent://public/default/multi-partitions
bin/pulsar-perf produce -r $RATE -s 128 -bm 0 -time 60 -np 3 persistent://public/default/multi-partitions
RATE     Final receiver queue size (MultiTopicConsumer)    Sub-consumers receiver queue size
1        3                                                 1,1,1
10       3                                                 1,1,1
100      6                                                 2,1,1
1000     12                                                16,32,16
5000     48                                                128,128,128
10000    96                                                1024,512,512
20000    192                                               1024,512,512
40000    384                                               1024,1024,1024

@Jason918

/pulsarbot run-failure-checks

Jason918 commented Apr 1, 2022

/pulsarbot run-failure-checks

@Jason918 Jason918 closed this Apr 1, 2022
@Jason918 Jason918 reopened this Apr 1, 2022

Jason918 commented Apr 1, 2022

/pulsarbot run-failure-checks

@Jason918 Jason918 closed this Apr 2, 2022
@Jason918 Jason918 reopened this Apr 2, 2022

Jason918 commented Apr 4, 2022

/pulsarbot run-failure-checks

Jason918 commented Apr 7, 2022

/pulsarbot run-failure-checks

@Jason918 Jason918 requested a review from merlimat April 7, 2022 03:08

Jason918 commented Apr 7, 2022

Jason918 commented Apr 9, 2022

/pulsarbot run-failure-checks


HQebupt commented Apr 12, 2022

👍

@Jason918 Jason918 requested a review from gaozhangmin April 12, 2022 02:12
@StevenLuMT StevenLuMT left a comment

Good job

@gaozhangmin

LGTM

@Jason918

/pulsarbot run-failure-checks

@lordcheng10 lordcheng10 left a comment

LGTM

@codelipenghui codelipenghui merged commit bb0e0f2 into apache:master Apr 18, 2022
aparajita89 pushed a commit to aparajita89/pulsar that referenced this pull request Apr 18, 2022
* Add autoScaledReceiverQueueSize

* Add autoScaledReceiverQueueSize for PerformanceConsumer

* remove memory limit code

* fix typo
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
* Add autoScaledReceiverQueueSize

* Add autoScaledReceiverQueueSize for PerformanceConsumer

* remove memory limit code

* fix typo
@Jason918 Jason918 deleted the pip-74-consumer branch October 20, 2022 09:20