
[improve][broker] Optimize high CPU usage when consuming from topics with ongoing txn #23189

Merged
merged 2 commits into apache:master on Aug 20, 2024

Conversation

coderzc
Member

@coderzc coderzc commented Aug 17, 2024

Motivation

We found the broker's CPU busy calling ManagedLedgerImpl.internalReadFromLedger: the broker checked readPosition > maxPosition and then triggered readMoreEntries again, so it kept looping on readMoreEntries. However, ManagedCursorImpl.asyncReadEntriesWithSkipOrWait already checks via hasMoreEntries() whether there is more data to read. I think this case may be caused by maxReadPosition < lastConfirmedPosition when the topic has an ongoing txn. So when maxPosition <= readPosition we should not read entries immediately; instead, we should delay the readEntries call.


Modifications

If maxPosition < readPosition, trigger readEntries with a delay instead of immediately.
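
The following is a minimal, self-contained sketch of the delayed-retry idea only, not the actual patch: the class, field, and method names are illustrative, positions are simplified to longs, and the delay constant is a hypothetical stand-in for managedLedgerNewEntriesCheckDelayInMillis.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch: when readPosition has passed maxPosition (e.g. because
    // maxReadPosition is held back by an ongoing transaction), re-schedule the read
    // after a short delay instead of retrying immediately, which is what spins the CPU.
    public class DelayedReadSketch {
        // Hypothetical stand-in for managedLedgerNewEntriesCheckDelayInMillis.
        private static final long NEW_ENTRIES_CHECK_DELAY_MS = 10;

        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        void readEntries(long readPosition, long maxPosition) {
            if (readPosition > maxPosition) {
                // Nothing is readable yet: retry later instead of calling readMoreEntries right away.
                scheduler.schedule(() -> readEntries(readPosition, maxPosition),
                        NEW_ENTRIES_CHECK_DELAY_MS, TimeUnit.MILLISECONDS);
                return;
            }
            // ... read entries up to maxPosition from the ledger and dispatch them ...
        }
    }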

Test Code:

    @Test
    public void testSlowTxn() throws Exception {
        String topic = NAMESPACE1 + "/testSlowTxn";
        @Cleanup
        ProducerImpl<byte[]> producer = (ProducerImpl<byte[]>) pulsarClient.newProducer()
                .topic(topic)
                .sendTimeout(1, TimeUnit.SECONDS)
                .create();

        @Cleanup
        Consumer<byte[]> consumer = pulsarClient.newConsumer()
                .topic(topic)
                .subscriptionName("test")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Transaction transaction = pulsarClient.newTransaction().withTransactionTimeout(10, TimeUnit.MINUTES)
                .build().get();

        // Publish one transactional message but leave the transaction open,
        // so maxReadPosition stays behind the last confirmed entry.
        producer.newMessage(transaction).value("Hello Pulsar!".getBytes()).send();

        // Keep the transaction open for 10 minutes to observe broker CPU usage
        // while the consumer waits for the message.
        Thread.sleep(10 * 60 * 1000);

        transaction.commit().get();
        producer.close();
        admin.topics().delete(topic, true);
    }

CPU usage before applying this change:

flamegraph: https://drive.google.com/file/d/1nNb4MOdbZB7mO4fWitts2UpjzutdT22O/view?usp=sharing

CPU usage after applying this change:

flamegraph: https://drive.google.com/file/d/1AndMJuSMXhOImf3T0hg_E7YeCyslTaNI/view?usp=sharing

Verifying this change

  • Make sure that the change passes the CI checks.


Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete


@coderzc coderzc marked this pull request as draft August 17, 2024 11:25
@github-actions github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Aug 17, 2024
@coderzc coderzc marked this pull request as ready for review August 19, 2024 02:26
@coderzc coderzc requested review from lhotari and shibd August 19, 2024 02:38
@thetumbled
Member

Same problem as #22944?

@coderzc
Member Author

coderzc commented Aug 19, 2024

Same problem as #22944?

Looks like yes, I will review #22944

@lhotari
Member

@lhotari lhotari left a comment

LGTM. Good catch @coderzc!

@coderzc
Member Author

coderzc commented Aug 20, 2024

This is a quick fix; it is only effective when managedLedgerNewEntriesCheckDelayInMillis > 0. We can merge this PR first and continue to review #22944.
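
For reference, this is roughly how that setting appears in broker.conf; the value shown is illustrative (the shipped default is believed to be 10), not a recommendation:

    # Delay, in milliseconds, before the broker re-checks for new entries.
    # The quick fix above only takes effect when this value is greater than 0.
    managedLedgerNewEntriesCheckDelayInMillis=10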

@coderzc coderzc merged commit 94e1341 into apache:master Aug 20, 2024
54 of 57 checks passed
@coderzc coderzc added the type/bug (The PR fixed a bug or issue reported a bug), area/broker, release/3.0.7, and release/3.3.2 labels Aug 20, 2024
coderzc added a commit that referenced this pull request Aug 21, 2024
coderzc added a commit that referenced this pull request Aug 21, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 22, 2024
…with ongoing txn (apache#23189)

(cherry picked from commit 94e1341)
(cherry picked from commit b7ffa73)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 23, 2024
…with ongoing txn (apache#23189)

(cherry picked from commit 94e1341)
(cherry picked from commit b7ffa73)
grssam pushed a commit to grssam/pulsar that referenced this pull request Sep 4, 2024
@lhotari lhotari added this to the 4.0.0 milestone Oct 14, 2024