Flaky-test: C++ ClientTest.testReferenceCount #14848

Closed
BewareMyPower opened this issue Mar 24, 2022 · 11 comments · Fixed by #14854, #15271, #17325 or #17645

Comments

@BewareMyPower
Contributor

BewareMyPower commented Mar 24, 2022

ClientTest.testReferenceCount is flaky. It fails sporadically.

example failure

2022-03-23 08:18:48.488 INFO  [140121845906176] ClientConnection:182 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-03-23 08:18:48.488 INFO  [140121845906176] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-03-23 08:18:48.492 INFO  [140121726842624] ClientConnection:368 | [127.0.0.1:32784 -> 127.0.0.1:6650] Connected to broker
2022-03-23 08:18:48.494 INFO  [140121726842624] HandlerBase:64 | [persistent://public/default/client-test-reference-count-1648023528, ] Getting connection from pool
2022-03-23 08:18:48.504 INFO  [140121726842624] ProducerImpl:188 | [persistent://public/default/client-test-reference-count-1648023528, ] Created producer on broker [127.0.0.1:32784 -> 127.0.0.1:6650]
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:192: Failure
Value of: 1
Expected: producers[0].use_count()
Which is: 7
2022-03-23 08:18:48.525 INFO  [140385848068864] ClientConnection:182 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-03-23 08:18:48.525 INFO  [140385848068864] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-03-23 08:18:48.527 INFO  [140385729005312] ClientConnection:368 | [127.0.0.1:32786 -> 127.0.0.1:6650] Connected to broker
2022-03-23 08:18:48.530 INFO  [140385729005312] HandlerBase:64 | [persistent://public/default/client-test-reference-count-1648023528, ] Getting connection from pool
2022-03-23 08:18:48.536 INFO  [140385729005312] ProducerImpl:188 | [persistent://public/default/client-test-reference-count-1648023528, ] Created producer on broker [127.0.0.1:32786 -> 127.0.0.1:6650]
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:192: Failure
Value of: 1
Expected: producers[0].use_count()
Which is: 7
BewareMyPower self-assigned this Mar 24, 2022
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Mar 24, 2022
Fixes apache#14848

Fixes apache#14719

### Motivation

apache#7793 introduced a `testReferenceLeak` to avoid a cyclic reference of the
reader. However, it added an unused field `readerImplWeakPtr_` only for
tests. Access to this field is not thread safe: the write happens in
`handleConsumerCreated` while the read can happen anywhere via the getter.
So there is a small chance that `readerPtr` in `testReferenceLeak` doesn't
point to the right object.

In addition, we should only guarantee that the reference count becomes 0
after the producer, consumer or reader goes out of its scope. apache#14797
added a `ClientTest.testReferenceCount`, but it's also flaky. This is because
the shared pointer of `ProducerImpl` is published to another thread via
`shared_from_this()`, while the test has a strong expectation that the
reference count is exactly 1.

### Modifications

- Remove `readerImplWeakPtr_` from `ReaderImpl` and get the weak pointer
  from `Reader` directly by adding a method to `PulsarFriend`.
- Add a check of the reader's reference count to `testReferenceCount` and
  remove the redundant `testReferenceLeak`.
- Instead of asserting that the reference count of the producer/consumer/reader
  is 1, just assert that it's greater than 0 (see the sketch below).
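
For illustration, here is a minimal standalone sketch (the type and function names below are hypothetical, not the actual client or test code) of why asserting `use_count() == 1` is racy while `use_count() > 0` is not:

```
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for ProducerImpl, which derives from
// std::enable_shared_from_this in the real client.
struct ProducerImplSketch : std::enable_shared_from_this<ProducerImplSketch> {
    void publishToAnotherThread() {
        // The client copies the shared_ptr into a callback running on an
        // internal executor thread, so use_count() is temporarily > 1.
        auto self = shared_from_this();
        std::thread([self] {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }).detach();
    }
};

int main() {
    auto producer = std::make_shared<ProducerImplSketch>();
    producer->publishToAnotherThread();
    // Flaky form: the detached thread may still hold a copy at this point.
    // assert(producer.use_count() == 1);
    // Relaxed form, as described in the modifications above:
    assert(producer.use_count() > 0);
    return 0;
}
```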
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Mar 31, 2022
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Apr 1, 2022
lhotari pushed a commit that referenced this issue Apr 1, 2022
BewareMyPower added a commit that referenced this issue Apr 2, 2022
(cherry picked from commit f84ff57)
BewareMyPower added a commit that referenced this issue Apr 2, 2022
(cherry picked from commit f84ff57)
BewareMyPower added a commit that referenced this issue Apr 2, 2022
(cherry picked from commit f84ff57)
@RobertIndie
Member

After #14854 got merged, I still get this error.

example failure

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ClientTest
[ RUN      ] ClientTest.testReferenceCount
2022-04-11 04:43:29.498 INFO  [139942018497280] ClientConnection:189 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-04-11 04:43:29.498 INFO  [139942018497280] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-04-11 04:43:29.500 INFO  [139941896357632] ClientConnection:375 | [127.0.0.1:45924 -> 127.0.0.1:6650] Connected to broker
2022-04-11 04:43:29.503 INFO  [139941896357632] HandlerBase:64 | [persistent://public/default/client-test-reference-count-1649652209, ] Getting connection from pool
2022-04-11 04:43:29.538 INFO  [139941896357632] ProducerImpl:189 | [persistent://public/default/client-test-reference-count-1649652209, ] Created producer on broker [127.0.0.1:45924 -> 127.0.0.1:6650]
2022-04-11 04:43:29.538 INFO  [139942018497280] ClientTest:210 | Reference count of the producer: 1
2022-04-11 04:43:29.538 INFO  [139942018497280] Client:88 | Subscribing on Topic :client-test-reference-count-1649652209
2022-04-11 04:43:29.540 INFO  [139941896357632] HandlerBase:64 | [persistent://public/default/client-test-reference-count-1649652209, my-sub, 0] Getting connection from pool
2022-04-11 04:43:29.558 INFO  [139941896357632] ConsumerImpl:224 | [persistent://public/default/client-test-reference-count-1649652209, my-sub, 0] Created consumer on broker [127.0.0.1:45924 -> 127.0.0.1:6650]
2022-04-11 04:43:29.560 INFO  [139942018497280] ClientTest:216 | Reference count of the consumer: 7
2022-04-11 04:43:29.562 INFO  [139941896357632] HandlerBase:64 | [persistent://public/default/client-test-reference-count-1649652209-reader, reader-0fd8e7ba92, 1] Getting connection from pool
2022-04-11 04:43:29.597 INFO  [139941896357632] ConsumerImpl:224 | [persistent://public/default/client-test-reference-count-1649652209-reader, reader-0fd8e7ba92, 1] Created consumer on broker [127.0.0.1:45924 -> 127.0.0.1:6650]
2022-04-11 04:43:29.598 INFO  [139942018497280] ClientTest:224 | Reference count of the reader's underlying consumer: 3
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:227: Failure
Value of: 1
Expected: readerWeakPtr.use_count()
Which is: 5
2022-04-11 04:43:29.598 WARN  [139942018497280] ConsumerImpl:116 | [persistent://public/default/client-test-reference-count-1649652209, my-sub, 0] Destroyed consumer which was not properly closed
2022-04-11 04:43:29.598 INFO  [139942018497280] ConsumerImpl:124 | [persistent://public/default/client-test-reference-count-1649652209, my-sub, 0] Closed consumer for race condition: 0
2022-04-11 04:43:29.598 INFO  [139942018497280] ProducerImpl:628 | Producer - [persistent://public/default/client-test-reference-count-1649652209, standalone-0-203] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/client-test-reference-count-1649652209] [numberOfBatchesSent_ = 0] [averageBatchSize_ = 0] }]
2022-04-11 04:43:29.598 WARN  [139942018497280] ProducerImpl:114 | [persistent://public/default/client-test-reference-count-1649652209, standalone-0-203] Destroyed producer which was not properly closed
2022-04-11 04:43:29.598 INFO  [139942018497280] ClientConnection:1560 | [127.0.0.1:45924 -> 127.0.0.1:6650] Connection closed
[  FAILED  ] ClientTest.testReferenceCount (101 ms)
[----------] 1 test from ClientTest (101 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (101 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ClientTest.testReferenceCount

PTAL @BewareMyPower

RobertIndie reopened this Apr 11, 2022
@BewareMyPower
Contributor Author

@RobertIndie Good point. I'll fix it soon.

Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this issue Apr 20, 2022
@BewareMyPower
Contributor Author

Another failure

2022-04-21 07:18:25.262 INFO  [140554697825024] ClientTest:211 | Reference count of the reader's underlying consumer: 3
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:214: Failure
Value of: 1
Expected: readerWeakPtr.use_count()
Which is: 5

BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Apr 22, 2022
Fixes apache#13849
Fixes apache#14848

### Motivation

apache#11570 added a `testSendAsyncCloseAsyncConcurrentlyWithLazyProducers` for
the case that `sendAsync` calls invoked after `closeAsync` is called in
another thread must complete with `ResultAlreadyClosed`. It's flaky because
the synchronization between the two threads is not strict. This test uses
`sendStartLatch` to order `sendAsync` and `closeAsync`:

```
sendAsync 0,1,...,9 -> sendStartLatch is done -> closeAsync
```

However, it cannot guarantee that the remaining `sendAsync` calls happen
after `closeAsync` is called. If they happen before, all `sendAsync` calls
will complete with `ResultOk`.

On the other hand, this test is meaningless because it requires strict
synchronization between the two threads, in which case there is no need to
run `sendAsync` and `closeAsync` in two threads at all.

The verification in this test is also wrong, see
apache#13849 (comment).
When `closeAsync` is called, the previous `sendAsync` calls might not have
completed, so all `sendAsync` calls will complete with `ResultAlreadyClosed`,
not only those called after `closeAsync`.

In addition, this PR also tries to fix the flaky `testReferenceCount`,
whose assumptions are too strict.

### Modifications

- Remove `testSendAsyncCloseAsyncConcurrentlyWithLazyProducers`
- Only check the reference count is greater than 0 instead of equal to 1
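
For illustration, a standalone sketch of that race (the `sendAsync`/`closeAsync` functions and the latch below are stand-ins, not the real client API): the latch only guarantees that `closeAsync` starts after the first ten `sendAsync` calls; it says nothing about the remaining ones.

```
#include <future>
#include <thread>

// Stand-ins for the producer operations under test.
void sendAsync(int /* messageIndex */) { /* pretend to enqueue a message */ }
void closeAsync() { /* pretend to start closing the producer */ }

int main() {
    std::promise<void> sendStartLatch;  // completed after the first 10 sends
    auto latchDone = sendStartLatch.get_future();

    std::thread sender([&] {
        for (int i = 0; i < 10; i++) sendAsync(i);
        sendStartLatch.set_value();  // from here on closeAsync may start...
        for (int i = 10; i < 20; i++) {
            sendAsync(i);  // ...but nothing orders these sends against closeAsync
        }
    });

    latchDone.wait();
    closeAsync();  // may run before, between or after the second batch of sends

    sender.join();
    return 0;
}
```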
BewareMyPower added a commit that referenced this issue Apr 22, 2022
codelipenghui pushed a commit that referenced this issue Apr 28, 2022
(cherry picked from commit eeea9ca)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue May 9, 2022
(cherry picked from commit eeea9ca)
(cherry picked from commit 83b6833)
BewareMyPower added a commit that referenced this issue May 24, 2022
(cherry picked from commit eeea9ca)
BewareMyPower added a commit that referenced this issue Jul 27, 2022
(cherry picked from commit eeea9ca)
@codelipenghui
Contributor

@BewareMyPower Looks like the test can still fail: https://github.com/apache/pulsar/runs/7948115233?check_suite_focus=true

codelipenghui reopened this Aug 22, 2022
@BewareMyPower
Contributor Author

BewareMyPower commented Aug 23, 2022

This flaky test can be reproduced by

./tests/main --gtest_filter='ClientTest.testReferenceCount' --gtest_repeat=20

(the repeat count cannot be too large, e.g. 100, otherwise it might fail with ConnectError)

Repeating all tests (iteration 10) . . .

Note: Google Test filter = ClientTest.testReferenceCount
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ClientTest
[ RUN      ] ClientTest.testReferenceCount
2022-08-23 15:58:05.988 INFO  [0x116028600] ClientConnection:189 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-08-23 15:58:05.988 INFO  [0x116028600] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-08-23 15:58:05.989 INFO  [0x70000c17a000] ClientConnection:375 | [127.0.0.1:59906 -> 127.0.0.1:6650] Connected to broker
2022-08-23 15:58:05.990 INFO  [0x70000c17a000] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661241485, ] Getting connection from pool
2022-08-23 15:58:05.993 INFO  [0x70000c17a000] ProducerImpl:186 | [persistent://public/default/client-test-reference-count-1661241485, ] Created producer on broker [127.0.0.1:59906 -> 127.0.0.1:6650] 
2022-08-23 15:58:05.993 INFO  [0x116028600] ClientTest:210 | Reference count of the producer: 1
2022-08-23 15:58:05.993 INFO  [0x116028600] Client:88 | Subscribing on Topic :client-test-reference-count-1661241485
2022-08-23 15:58:05.995 INFO  [0x70000c17a000] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661241485, my-sub, 0] Getting connection from pool
2022-08-23 15:58:05.997 INFO  [0x70000c17a000] ConsumerImpl:220 | [persistent://public/default/client-test-reference-count-1661241485, my-sub, 0] Created consumer on broker [127.0.0.1:59906 -> 127.0.0.1:6650] 
2022-08-23 15:58:05.997 INFO  [0x116028600] ClientTest:216 | Reference count of the consumer: 1
2022-08-23 15:58:05.998 INFO  [0x70000c17a000] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661241485-reader, reader-3414259df0, 1] Getting connection from pool
2022-08-23 15:58:06.000 INFO  [0x70000c17a000] ConsumerImpl:220 | [persistent://public/default/client-test-reference-count-1661241485-reader, reader-3414259df0, 1] Created consumer on broker [127.0.0.1:59906 -> 127.0.0.1:6650] 
/Users/xuyunze/github.com/BewareMyPower/pulsar/pulsar-client-cpp/tests/ClientTest.cc:222: Failure
Expected equality of these values:
  consumers.size()
    Which is: 1
  2

@Demogorgon314
Member

Another failure

2022-08-26 01:19:26.474 INFO  [140275677335232] ClientConnection:189 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-08-26 01:19:26.474 INFO  [140275677335232] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-08-26 01:19:26.477 INFO  [140275674232576] ClientConnection:375 | [127.0.0.1:48584 -> 127.0.0.1:6650] Connected to broker
2022-08-26 01:19:26.478 INFO  [140275674232576] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661476766, ] Getting connection from pool
2022-08-26 01:19:26.486 INFO  [140275674232576] ProducerImpl:186 | [persistent://public/default/client-test-reference-count-1661476766, ] Created producer on broker [127.0.0.1:48584 -> 127.0.0.1:6650] 
2022-08-26 01:19:26.486 INFO  [140275677335232] ClientTest:210 | Reference count of the producer: 7
2022-08-26 01:19:26.486 INFO  [140275677335232] Client:88 | Subscribing on Topic :client-test-reference-count-1661476766
2022-08-26 01:19:26.486 INFO  [140275674232576] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661476766, my-sub, 0] Getting connection from pool
2022-08-26 01:19:26.490 INFO  [140275674232576] ConsumerImpl:224 | [persistent://public/default/client-test-reference-count-1661476766, my-sub, 0] Created consumer on broker [127.0.0.1:48584 -> 127.0.0.1:6650] 
2022-08-26 01:19:26.490 INFO  [140275677335232] ClientTest:216 | Reference count of the consumer: 7
2022-08-26 01:19:26.491 INFO  [140275674232576] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661476766-reader, reader-0dc8501272, 1] Getting connection from pool
2022-08-26 01:19:26.496 INFO  [140275674232576] ConsumerImpl:224 | [persistent://public/default/client-test-reference-count-1661476766-reader, reader-0dc8501272, 1] Created consumer on broker [127.0.0.1:48584 -> 127.0.0.1:6650] 
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:222: Failure
Expected equality of these values:
  consumers.size()
    Which is: 1
  2
2022-08-26 01:19:26.496 WARN  [140275677335232] ConsumerImpl:118 | [persistent://public/default/client-test-reference-count-1661476766, my-sub, 0] Destroyed consumer which was not properly closed
2022-08-26 01:19:26.496 INFO  [140275677335232] ConsumerImpl:126 | [persistent://public/default/client-test-reference-count-1661476766, my-sub, 0] Closed consumer for race condition: 0
2022-08-26 01:19:26.496 INFO  [140275677335232] ProducerImpl:619 | Producer - [persistent://public/default/client-test-reference-count-1661476766, standalone-0-213] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/client-test-reference-count-1661476766] [numberOfBatchesSent_ = 0] [averageBatchSize_ = 0] }]
2022-08-26 01:19:26.496 WARN  [140275677335232] ProducerImpl:114 | [persistent://public/default/client-test-reference-count-1661476766, standalone-0-213] Destroyed producer which was not properly closed

@coderzc
Member

coderzc commented Aug 29, 2022

BewareMyPower pushed a commit that referenced this issue Aug 29, 2022
…CreatedCallback_`. (#17325)

Fixes #14848

### Motivation

We should execute `callback` before executing `readerCreatedCallback_`, otherwise we may get the wrong consumers size. See:

https://github.com/apache/pulsar/blob/e23d312c04da1d82d35f9e2faf8a446f8e8a4eeb/pulsar-client-cpp/lib/ReaderImpl.cc#L84-L92

https://github.com/apache/pulsar/blob/c48a3243287c7d775459b6437d9f4b24ed44cf4c/pulsar-client-cpp/lib/ClientImpl.cc#L250-L254

### Modifications

Execute `callback` before executing `readerCreatedCallback_`.
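
For illustration, a standalone sketch of the ordering problem (the names below are illustrative, not the actual `ClientImpl`/`ReaderImpl` code): if the caller-visible callback runs before the client records the new consumer internally, a test that inspects the consumer list right after `createReader` returns can see the wrong size.

```
#include <cassert>
#include <functional>
#include <vector>

int main() {
    std::vector<int> consumers;  // stands in for the client's internal consumer list

    // Internal completion that registers the newly created consumer.
    std::function<void()> callback = [&consumers] { consumers.push_back(1); };

    // Completion that unblocks the caller of createReader().
    bool readerCreated = false;
    std::function<void()> readerCreatedCallback = [&readerCreated] { readerCreated = true; };

    // Buggy order: the caller is unblocked before the bookkeeping happens, so
    // it can observe a stale consumers.size() right after createReader returns:
    //   readerCreatedCallback(); callback();

    // Fixed order, as described in the modifications above:
    callback();
    readerCreatedCallback();

    assert(readerCreated && consumers.size() == 1);
    return 0;
}
```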
@RobertIndie
Member

This issue still exists when I use the latest master branch code:
https://github.com/apache/pulsar/runs/8086575635?check_suite_focus=true#step:11:256

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ClientTest
[ RUN      ] ClientTest.testReferenceCount
2022-08-30 08:17:10.954 INFO  [140089060101824] ClientConnection:189 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-08-30 08:17:10.954 INFO  [140089060101824] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-08-30 08:17:10.956 INFO  [140089056999168] ClientConnection:375 | [127.0.0.1:41008 -> 127.0.0.1:6650] Connected to broker
2022-08-30 08:17:10.957 INFO  [140089056999168] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661847430, ] Getting connection from pool
2022-08-30 08:17:10.960 INFO  [140089056999168] ProducerImpl:186 | [persistent://public/default/client-test-reference-count-1661847430, ] Created producer on broker [127.0.0.1:41008 -> 127.0.0.1:6650]
2022-08-30 08:17:10.960 INFO  [140089060101824] ClientTest:210 | Reference count of the producer: 7
2022-08-30 08:17:10.960 INFO  [140089060101824] Client:88 | Subscribing on Topic :client-test-reference-count-1661847430
2022-08-30 08:17:10.961 INFO  [140089056999168] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661847430, my-sub, 0] Getting connection from pool
2022-08-30 08:17:10.962 INFO  [140089056999168] ConsumerImpl:220 | [persistent://public/default/client-test-reference-count-1661847430, my-sub, 0] Created consumer on broker [127.0.0.1:41008 -> 127.0.0.1:6650]
2022-08-30 08:17:10.962 INFO  [140089060101824] ClientTest:216 | Reference count of the consumer: 7
2022-08-30 08:17:10.962 INFO  [140089056999168] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1661847430-reader, reader-a635288687, 1] Getting connection from pool
2022-08-30 08:17:10.964 INFO  [140089056999168] ConsumerImpl:220 | [persistent://public/default/client-test-reference-count-1661847430-reader, reader-a635288687, 1] Created consumer on broker [127.0.0.1:41008 -> 127.0.0.1:6650]
2022-08-30 08:17:10.964 INFO  [140089060101824] ClientTest:224 | Reference count of the reader's underlying consumer: 3
2022-08-30 08:17:10.964 INFO  [140089060101824] ClientTest:228 | Reference count of the reader: 4
2022-08-30 08:17:10.964 WARN  [140089060101824] ConsumerImpl:116 | [persistent://public/default/client-test-reference-count-1661847430, my-sub, 0] Destroyed consumer which was not properly closed
2022-08-30 08:17:10.964 INFO  [140089060101824] ConsumerImpl:124 | [persistent://public/default/client-test-reference-count-1661847430, my-sub, 0] Closed consumer for race condition: 0
2022-08-30 08:17:10.964 INFO  [140089060101824] ProducerImpl:619 | Producer - [persistent://public/default/client-test-reference-count-1661847430, standalone-0-213] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/client-test-reference-count-1661847430] [numberOfBatchesSent_ = 0] [averageBatchSize_ = 0] }]
2022-08-30 08:17:10.964 WARN  [140089060101824] ProducerImpl:114 | [persistent://public/default/client-test-reference-count-1661847430, standalone-0-213] Destroyed producer which was not properly closed
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:235: Failure
Expected equality of these values:
  consumers[1].use_count()
    Which is: 3
  0
[  FAILED  ] ClientTest.testReferenceCount (10 ms)
[----------] 1 test from ClientTest (10 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (10 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ClientTest.testReferenceCount

@BewareMyPower
Contributor Author

Another failure

2022-09-05 15:55:00.990 INFO  [139735811178176] ClientConnection:189 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2022-09-05 15:55:00.990 INFO  [139735811178176] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2022-09-05 15:55:00.991 INFO  [139735808075520] ClientConnection:375 | [127.0.0.1:54416 -> 127.0.0.1:6650] Connected to broker
2022-09-05 15:55:00.992 INFO  [139735808075520] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1662393300, ] Getting connection from pool
2022-09-05 15:55:01.000 INFO  [139735808075520] ProducerImpl:186 | [persistent://public/default/client-test-reference-count-1662393300, ] Created producer on broker [127.0.0.1:54416 -> 127.0.0.1:6650]
2022-09-05 15:55:01.000 INFO  [139735811178176] ClientTest:198 | Reference count of the producer: 7
2022-09-05 15:55:01.000 INFO  [139735811178176] Client:88 | Subscribing on Topic :client-test-reference-count-1662393300
2022-09-05 15:55:01.001 INFO  [139735808075520] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1662393300, my-sub, 0] Getting connection from pool
2022-09-05 15:55:01.002 INFO  [139735808075520] ConsumerImpl:220 | [persistent://public/default/client-test-reference-count-1662393300, my-sub, 0] Created consumer on broker [127.0.0.1:54416 -> 127.0.0.1:6650]
2022-09-05 15:55:01.002 INFO  [139735811178176] ClientTest:204 | Reference count of the consumer: 7
2022-09-05 15:55:01.003 INFO  [139735808075520] HandlerBase:61 | [persistent://public/default/client-test-reference-count-1662393300-reader, reader-462cf2b281, 1] Getting connection from pool
2022-09-05 15:55:01.007 INFO  [139735808075520] ConsumerImpl:220 | [persistent://public/default/client-test-reference-count-1662393300-reader, reader-462cf2b281, 1] Created consumer on broker [127.0.0.1:54416 -> 127.0.0.1:6650]
2022-09-05 15:55:01.007 INFO  [139735811178176] ClientTest:212 | Reference count of the reader's underlying consumer: 3
2022-09-05 15:55:01.007 INFO  [139735811178176] ClientTest:216 | Reference count of the reader: 4
2022-09-05 15:55:01.007 WARN  [139735811178176] ConsumerImpl:116 | [persistent://public/default/client-test-reference-count-1662393300, my-sub, 0] Destroyed consumer which was not properly closed
2022-09-05 15:55:01.007 INFO  [139735811178176] ConsumerImpl:124 | [persistent://public/default/client-test-reference-count-1662393300, my-sub, 0] Closed consumer for race condition: 0
2022-09-05 15:55:01.007 INFO  [139735811178176] ProducerImpl:619 | Producer - [persistent://public/default/client-test-reference-count-1662393300, standalone-0-204] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/client-test-reference-count-1662393300] [numberOfBatchesSent_ = 0] [averageBatchSize_ = 0] }]
2022-09-05 15:55:01.007 WARN  [139735811178176] ProducerImpl:114 | [persistent://public/default/client-test-reference-count-1662393300, standalone-0-204] Destroyed producer which was not properly closed
/pulsar/pulsar-client-cpp/tests/ClientTest.cc:223: Failure
Expected equality of these values:
  consumers[1].use_count()
    Which is: 3
  0

BewareMyPower self-assigned this Sep 8, 2022
@BewareMyPower
Contributor Author

Now it cannot be reproduced in my local env (macOS 12.3.1, Apple clang version 13.1.6), but it can be reproduced in an Ubuntu 20.04 container (Ubuntu 9.4.0-1ubuntu1~20.04.1). All the error logs are the same: the reference counts of the producer, consumer and reader are 7, 7 and 4.

I will investigate this flaky test further.

coderzc added a commit to coderzc/pulsar that referenced this issue Sep 14, 2022
…CreatedCallback_`. (apache#17325)

(cherry picked from commit 3bc50a4)
@BewareMyPower
Contributor Author

Steps to reproduce

First, start a container:

mvn clean install -DskipTests -Pcore-modules,-main
docker run -v $PWD:/pulsar -it apachepulsar/pulsar-build:ubuntu-20.04 /bin/bash

Then, run the following commands inside the container:

# Build the unit tests (increase the Docker memory if a segfault happens, or remove the -j4 option)
cd /pulsar/pulsar-client-cpp/
cmake -B _builds -DBUILD_DYNAMIC_LIB=OFF -DBUILD_PERF_TOOLS=OFF -DBUILD_PYTHON_WRAPPER=OFF
cmake --build _builds -j4

# Run the standalone
cd /pulsar/distribution/server/target/
tar zxf apache-pulsar-*.tar.gz
cd apache-pulsar-*-SNAPSHOT
./bin/pulsar-daemon start standalone -nss -nfw

# Run the test for multiple times
cd /pulsar/pulsar-client-cpp/_builds
./tests/main --gtest_filter='ClientTest.testReferenceCount' --gtest_repeat=10

BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Sep 14, 2022
Fixes apache#14848

### Motivation

There were several fixes on `ClientTest.testReferenceCount` but it's
still very flaky.

The root cause is that even after a `Reader` went out of scope and was
destructed, there was still a `Reader` object alive in the event loop
thread. See
https://github.com/apache/pulsar/blob/845daf5cac23a4dda4a209d91c85804a0bcaf28a/pulsar-client-cpp/lib/ReaderImpl.cc#L88

To verify this point, I added some logs and saw:

```
2022-09-14 03:52:28.427 INFO  [140046042864960] Reader:39 | Reader ctor 0x7fffd2a7c110
# ...
2022-09-14 03:52:28.444 INFO  [140046039774976] Reader:42 | Reader ctor 0x7f5f0273d720 ReaderImpl(0x7f5efc00a9d0, 3)
# ...
2022-09-14 03:52:28.445 INFO  [140046042864960] ClientTest:217 | Reference count of the reader: 4
# ...
2022-09-14 03:52:28.445 INFO  [140046042864960] ClientImpl:490 | Closing Pulsar client with 1 producers and 2 consumers
2022-09-14 03:52:28.445 INFO  [140046039774976] Reader:55 | Reader dtor 0x7f5f0273d720 ReaderImpl(0x7f5efc00a9d0, 3)
```

The first `Reader` object 0x7fffd2a7c110 was constructed in main thread
140046042864960. However, it was destructed before another `Reader` object,
0x7f5f0273d720, which was constructed in event loop thread 140046039774976.
When the callback passed to `createReaderAsync` completed the promise,
`createReader` returned immediately, while the `Reader` object in the
callback was still in scope and not yet destructed.

Since `Reader` holds a `shared_ptr<ReaderImpl>` and `ReaderImpl` holds
a `shared_ptr<ConsumerImpl>`, if we check the reference count too
quickly, the reference count of the underlying consumer is still
positive because that `Reader` has not been destructed yet.

### Modifications

Since we cannot determine the precise destruction time of that `Reader`
object in the event loop thread, we have to wait for a while. This PR adds a
`waitUntil` utility function that waits, up to a given timeout, until a
condition is met, and uses it to wait until the reference count becomes 0
after the `Reader` object goes out of scope.

Replace `ASSERT_EQ` with `EXPECT_EQ` to let the test continue if an
expectation fails.
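
For illustration, a minimal sketch of such a wait helper (the exact signature added by the PR may differ): it polls a predicate until it returns true or the timeout elapses.

```
#include <chrono>
#include <functional>
#include <thread>

// Poll `condition` every 10 ms until it holds or `timeout` elapses.
// Returns whether the condition was eventually met.
template <typename Rep, typename Period>
bool waitUntil(std::chrono::duration<Rep, Period> timeout, std::function<bool()> condition) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (condition()) {
            return true;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return condition();
}
```

The test can then wait with something like `waitUntil(std::chrono::seconds(5), [&] { return consumerWeakPtr.use_count() == 0; })` after the `Reader` goes out of scope, instead of asserting the count immediately.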

### Verifying this change

Follow the steps here to reproduce:
apache#14848 (comment)

The test never failed even with `--gtest_repeat=100`.
merlimat pushed a commit that referenced this issue Sep 14, 2022
BewareMyPower pushed a commit that referenced this issue Sep 16, 2022
…CreatedCallback_`. (#17325) (#17629)

(cherry picked from commit 3bc50a4)
congbobo184 pushed a commit that referenced this issue Nov 8, 2022
(cherry picked from commit 4ef8dc5)
Technoboy- pushed a commit that referenced this issue Nov 15, 2022
congbobo184 pushed a commit that referenced this issue Nov 30, 2022
(cherry picked from commit 4ef8dc5)
liangyepianzhou pushed a commit that referenced this issue Dec 5, 2022
(cherry picked from commit 4ef8dc5)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue Dec 6, 2022
(cherry picked from commit 4ef8dc5)
(cherry picked from commit 380031d)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue Jan 10, 2023
…CreatedCallback_`. (apache#17325)

(cherry picked from commit 3bc50a4)
(cherry picked from commit 2672446)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue Jan 11, 2023
…CreatedCallback_`. (apache#17325)

(cherry picked from commit 3bc50a4)
(cherry picked from commit 2672446)