
[fix][broker] Optimize /metrics, fix unbounded request queue issue and fix race conditions in metricsBufferResponse mode #22494

Merged
17 commits merged into apache:master on Apr 13, 2024

Conversation

lhotari
Member

@lhotari lhotari commented Apr 12, 2024

Fixes #22477

Motivation

There are multiple problems in the /metrics endpoint:

  • requests are handled one-by-one and added to a queue
    • when requests have timed out, they still get processed; timeouts are only detected in the later phases of processing. Instead, timed-out requests should be short-circuited at the beginning of processing (see the sketch after this list).
  • processing is single-threaded and therefore the throughput is low
  • the metricsBufferResponse mode added in Improve /metrics endpoint performance #14453 improves things, but it contains race conditions where a buffer can get released while it is still in use.
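
As a rough illustration, a timed-out request could be rejected before any stats generation work starts. This is only a minimal sketch with hypothetical names (TimedRequest, rejectIfTimedOut), not the PR's actual classes:

import java.util.concurrent.TimeUnit;
import javax.servlet.AsyncContext;
import javax.servlet.http.HttpServletResponse;

class TimedRequest {
    final AsyncContext asyncContext;
    final long enqueuedAtNanos = System.nanoTime();

    TimedRequest(AsyncContext asyncContext) {
        this.asyncContext = asyncContext;
    }

    // Returns true if the request was rejected because it already waited longer than the timeout.
    boolean rejectIfTimedOut(long timeoutMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - enqueuedAtNanos);
        if (elapsedMillis > timeoutMillis) {
            HttpServletResponse response = (HttpServletResponse) asyncContext.getResponse();
            response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            asyncContext.complete();
            return true;
        }
        return false;
    }
}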

Modifications

  • fix the request timeout issue
  • enable multithreading; currently fixed to 4 threads, since higher concurrency requires more memory and leads to OOM issues
    • by default, combine concurrent requests so that they reuse the same result. This improves throughput even when metricsBufferResponse mode isn't enabled (see the sketch after this list).
  • remove the previous metricsBufferResponse implementation, since the TimeWindow and WindowWrap classes aren't needed at all. The concurrent request combining solution also covers the metricsBufferResponse mode.
  • optimize some details of stats generation that were allocating a lot of objects
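
The request combining can be pictured as all concurrent callers joining a single in-flight generation and sharing its result. A minimal sketch, assuming a generic generator supplier (the class and method names below are illustrative, not the PR's actual code):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class SharedResultSupplier<T> {
    private final Supplier<CompletableFuture<T>> generator;
    private final AtomicReference<CompletableFuture<T>> inFlight = new AtomicReference<>();

    SharedResultSupplier(Supplier<CompletableFuture<T>> generator) {
        this.generator = generator;
    }

    CompletableFuture<T> get() {
        while (true) {
            CompletableFuture<T> existing = inFlight.get();
            if (existing != null) {
                return existing; // join the generation that is already running
            }
            CompletableFuture<T> attempt = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, attempt)) {
                generator.get().whenComplete((value, error) -> {
                    inFlight.set(null); // let the next caller trigger a fresh generation
                    if (error != null) {
                        attempt.completeExceptionally(error);
                    } else {
                        attempt.complete(value);
                    }
                });
                return attempt;
            }
            // lost the race to another caller; loop and join the winner
        }
    }
}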

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 3.3.0 milestone Apr 12, 2024
@lhotari lhotari self-assigned this Apr 12, 2024
@github-actions github-actions bot added the doc-not-needed (Your PR changes do not impact docs) label Apr 12, 2024
@lhotari
Member Author

lhotari commented Apr 12, 2024

Performance test results with the changes.

Reproducing the results:

gh pr checkout 22494
mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true
rm -rf data
PULSAR_MEM="-Xms2g -Xmx4g -XX:MaxDirectMemorySize=6g" PULSAR_GC="-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch" PULSAR_EXTRA_OPTS="-Djute.maxbuffer=20000000" PULSAR_STANDALONE_USE_ZOOKEEPER=1 bin/pulsar standalone -nss -nfw 2>&1 | tee standalone.log
git clone https://github.com/lhotari/pulsar-playground
cd pulsar-playground
./gradlew shadowJar
# create the topics
java -cp build/libs/pulsar-playground-all.jar com.github.lhotari.pulsar.playground.TestScenarioCreateLongNamedTopics
# run the k6 load test from the experiments directory
cd experiments/metrics-performance

❯ k6 run load_test.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load_test.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 10000 iterations shared among 100 VUs (maxDuration: 10m0s, gracefulStop: 30s)


     data_received..................: 3.8 TB 8.1 GB/s
     data_sent......................: 880 kB 1.9 kB/s
     http_req_blocked...............: avg=35.04µs  min=1µs      med=3µs     max=3.92ms   p(90)=6µs      p(95)=8µs
     http_req_connecting............: avg=25.17µs  min=0s       med=0s      max=3.02ms   p(90)=0s       p(95)=0s
     http_req_duration..............: avg=4.7s     min=457.09ms med=4.68s   max=9.71s    p(90)=8.15s    p(95)=8.58s
       { expected_response:true }...: avg=4.7s     min=457.09ms med=4.68s   max=9.71s    p(90)=8.15s    p(95)=8.58s
     http_req_failed................: 0.00%  ✓ 0         ✗ 10000
     http_req_receiving.............: avg=185.35ms min=86.9ms   med=183.9ms max=469.76ms p(90)=214.08ms p(95)=224.08ms
     http_req_sending...............: avg=58.44µs  min=3µs      med=15µs    max=41.26ms  p(90)=28µs     p(95)=38µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s      max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=4.51s    min=302.02ms med=4.5s    max=9.56s    p(90)=7.96s    p(95)=8.4s
     http_reqs......................: 10000  21.179615/s
     iteration_duration.............: avg=4.7s     min=457.16ms med=4.69s   max=9.71s    p(90)=8.15s    p(95)=8.58s
     iterations.....................: 10000  21.179615/s
     vus............................: 5      min=5       max=100
     vus_max........................: 100    min=100     max=100


running (07m52.2s), 000/100 VUs, 10000 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  07m52.2s/10m0s  10000/10000 shared iters

@lhotari
Member Author

lhotari commented Apr 12, 2024

One interesting detail is that in the load test, the system couldn't keep up with the load without making the optimization to direct buffer allocation in commit eb53342.

@lhotari
Member Author

lhotari commented Apr 12, 2024

Most recent test run without turning on the caching:

❯ k6 run load_test.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load_test.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 10000 iterations shared among 100 VUs (maxDuration: 10m0s, gracefulStop: 30s)


     data_received..................: 3.8 TB 8.5 GB/s
     data_sent......................: 880 kB 2.0 kB/s
     http_req_blocked...............: avg=42.58µs  min=1µs      med=3µs      max=4.93ms   p(90)=5µs      p(95)=7µs
     http_req_connecting............: avg=33.7µs   min=0s       med=0s       max=4ms      p(90)=0s       p(95)=0s
     http_req_duration..............: avg=4.46s    min=298.14ms med=4.44s    max=8.7s     p(90)=7.8s     p(95)=8.2s
       { expected_response:true }...: avg=4.46s    min=298.14ms med=4.44s    max=8.7s     p(90)=7.8s     p(95)=8.2s
     http_req_failed................: 0.00%  ✓ 0         ✗ 10000
     http_req_receiving.............: avg=178.21ms min=75.75ms  med=176.97ms max=359.87ms p(90)=204.48ms p(95)=213.44ms
     http_req_sending...............: avg=37.42µs  min=4µs      med=13µs     max=25.6ms   p(90)=25µs     p(95)=33µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=4.29s    min=143.93ms med=4.26s    max=8.56s    p(90)=7.62s    p(95)=8.03s
     http_reqs......................: 10000  22.281335/s
     iteration_duration.............: avg=4.46s    min=298.19ms med=4.44s    max=8.7s     p(90)=7.8s     p(95)=8.2s
     iterations.....................: 10000  22.281335/s
     vus............................: 22     min=22      max=100
     vus_max........................: 100    min=100     max=100


running (07m28.8s), 000/100 VUs, 10000 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  07m28.8s/10m0s  10000/10000 shared iters

@lhotari
Member Author

lhotari commented Apr 12, 2024

I did allocation profiling with async-profiler, using commands similar to the ones in #22494 (comment), but with async-profiler activated via OPTS.

OPTS="-agentpath:$HOME/tools/async-profiler/lib/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=$PWD/profile.jfr" PULSAR_MEM="-Xms2g -Xmx4g -XX:MaxDirectMemorySize=6g" PULSAR_GC="-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch" PULSAR_EXTRA_OPTS="-Djute.maxbuffer=20000000" PULSAR_STANDALONE_USE_ZOOKEEPER=1 bin/pulsar standalone -nss -nfw 2>&1 | tee standalone.log

With OPTS="-agentpath:$HOME/tools/async-profiler/lib/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=$PWD/profile.jfr", it's possible to do CPU, allocation, and lock profiling all at once. I run async-profiler on a Linux box to get the best accuracy.

After profiling, I use this shell script function to generate multiple flamegraph HTMLs from the JFR file:
https://github.com/lhotari/pulsar-contributor-toolbox/blob/c150c3d9afc23d4865c2e3283c087e1c1261b4ee/functions/pulsar-contributor-toolbox-functions.sh#L1438-L1458

@lhotari lhotari requested a review from asafm April 13, 2024 10:36
@dao-jun
Member

dao-jun commented Apr 13, 2024

One interesting detail is that in the load test, the system couldn't keep up with the load without making the optimization to direct buffer allocation in commit eb53342.

I guess it is related to CompositeByteBuf#consolidateIfNeeded: once the number of components exceeds MAX_COMPONENT, it combines all the components into one, and a memory copy happens.

@lhotari
Member Author

lhotari commented Apr 13, 2024

One interesting detail is that in the load test, the system couldn't keep up with the load without making the optimization to direct buffer allocation in commit eb53342.

I guess it is related to CompositeByteBuf#consolidateIfNeeded: once the number of components exceeds MAX_COMPONENT, it combines all the components into one, and a memory copy happens.

Memory copying isn't the biggest problem. A bigger problem is the direct memory OOM that seems to happen when the memory space is so fragmented that there is no free space left for the allocation requests. The Netty pool has a maximum chunk size (8 MB by default), and all larger allocations are "huge" allocations that aren't pooled. That's why I think the added logic helps: it pre-allocates chunks of up to 8 MB (see the sketch below).
Memory copying will still happen, but that seems to be fine.
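
A minimal sketch of the pre-allocation idea, assuming chunks are capped at Netty's default 8 MB maximum pooled chunk size (the class below is illustrative, not the PR's actual code):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.CompositeByteBuf;

class ChunkedBufferWriter {
    // Assumption: 8 MB matches Netty's default maximum pooled chunk size, so every
    // allocation can be served from the pool instead of becoming an unpooled "huge" allocation.
    private static final int CHUNK_SIZE = 8 * 1024 * 1024;

    private final ByteBufAllocator allocator;
    private final CompositeByteBuf composite;
    private ByteBuf current;

    ChunkedBufferWriter(ByteBufAllocator allocator) {
        this.allocator = allocator;
        this.composite = allocator.compositeDirectBuffer(Integer.MAX_VALUE);
    }

    void writeBytes(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            if (current == null || !current.isWritable()) {
                if (current != null) {
                    composite.addComponent(true, current);
                }
                // allocate a fixed-size pooled direct buffer instead of one huge buffer
                current = allocator.directBuffer(CHUNK_SIZE, CHUNK_SIZE);
            }
            int toWrite = Math.min(current.writableBytes(), data.length - offset);
            current.writeBytes(data, offset, toWrite);
            offset += toWrite;
        }
    }

    ByteBuf finish() {
        if (current != null) {
            composite.addComponent(true, current);
            current = null;
        }
        return composite; // caller is responsible for releasing the composite buffer
    }
}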

Contributor

@eolivelli eolivelli left a comment

Great work

@merlimat merlimat merged commit 7009071 into apache:master Apr 13, 2024
49 of 50 checks passed
lhotari added a commit to lhotari/pulsar that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
lhotari added a commit to lhotari/pulsar that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
lhotari added a commit that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (#22494)

(cherry picked from commit 7009071)

# Conflicts:
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/BucketDelayedDeliveryTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/PersistentTopicTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/SchemaServiceTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/stats/PrometheusMetricsTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/transaction/buffer/TransactionBufferClientTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/transaction/pendingack/PendingAckPersistentTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/web/WebServiceTest.java
lhotari added a commit to lhotari/pulsar that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
@lhotari
Member Author

lhotari commented Apr 15, 2024

With an 'Accept-Encoding: gzip' header in a k6 test, the performance isn't great; the system stays available, but about 25% of the requests timed out under load.
It would be possible to optimize the solution further by caching the gzipped response, but that seems like premature optimization.

mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 16, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 19, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
+ "the connection due to a timeout ({} ms elapsed): {}", time, e + "");
} else {
log.error("Failed to generate prometheus stats, {} ms elapsed", time, e);
// set hard timeout to 2 * timeout
Contributor

Seeing this makes me happy that we're moving to consolidated, maintained exporters in OTel :)

pgier pushed a commit to pgier/pulsar that referenced this pull request Aug 23, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)