[fix][broker] Fix seeking by timestamp possibly resetting the cursor position to earliest #23919
Thanks for investigating the issue and providing a fix, @dao-jun . I wonder if there would be a way to add a failing test case which would then prevent future regressions?
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #23919 +/- ##
============================================
+ Coverage 73.57% 74.27% +0.69%
+ Complexity 32624 31906 -718
============================================
Files 1877 1853 -24
Lines 139502 143815 +4313
Branches 15299 16339 +1040
============================================
+ Hits 102638 106812 +4174
+ Misses 28908 28614 -294
- Partials 7956 8389 +433
Flags with carried forward coverage won't be shown.
@dao-jun Do you have thoughts about adding a failing test case that would prevent future regressions? It seems that we don't currently have proper tests that would test the seeking behavior at the Pulsar client level.
I modified some tests in PersistentMessageFinderTest; they should cover this case. For Pulsar client-level seek tests, maybe I need to add some.
@dao-jun Yes, it covers these changes, but an end-to-end seek test that would reproduce the regression is something I was after. Such tests would be very useful in ensuring that the seek feature really works. I didn't find many tests at the Pulsar client level. There are a few, but they are extremely simple.
@lhotari I've added a client-level seek test, PTAL.
It looks like there is another case that may cause the issue: the entry targeted by the timestamp belongs to a trimmed ledger.
Good work @dao-jun, thanks for handling this.
Thanks for the review, you're welcome.
Fixes #23910
Motivation
Because we don't enable AppendBrokerTimestampMetadataInterceptor by default, entry timestamps are not strictly increasing. The message timestamp is generated by the clients, so messages from different producers may not be globally ordered (because of network delay, backpressure, thread scheduling, etc.).
In a single ledger, they may be arranged like this:
[2, 1, 3, 5, 4, 6, 7, 9, 8, ...]
Overall, the timestamps tend to increase, but locally they may be out of order.
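A minimal, hypothetical Java sketch (not Pulsar code) of how client-assigned timestamps end up out of order in a ledger: each message carries its producer's publish time, but the broker appends entries in arrival order, so a message published earlier but delayed longer lands after a later one. The class LedgerOrderSketch, the Message record, and the delay values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model: the broker appends entries in arrival order, not publish order. */
public final class LedgerOrderSketch {

    /** A message stamped by its producer's clock, plus an assumed network/scheduling delay. */
    record Message(String producer, long publishTime, long delay) {
        long arrivalTime() {
            return publishTime + delay;
        }
    }

    /** Returns the publish timestamps in the order the broker would append them. */
    static List<Long> storedTimestamps(List<Message> messages) {
        List<Message> byArrival = new ArrayList<>(messages);
        // List.sort is stable, so messages arriving at the same time keep their input order.
        byArrival.sort(Comparator.comparingLong(Message::arrivalTime));
        List<Long> result = new ArrayList<>();
        for (Message m : byArrival) {
            result.add(m.publishTime());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Message> messages = List.of(
                new Message("A", 1, 5), // published first, but delayed the most
                new Message("B", 2, 1),
                new Message("A", 3, 4),
                new Message("B", 4, 4),
                new Message("A", 5, 2));
        System.out.println(storedTimestamps(messages)); // prints [2, 1, 3, 5, 4]
    }
}
```

Even with only two producers and small delays, the stored sequence reproduces the "increasing overall, unordered locally" pattern shown above.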
When PersistentMessageFinder.findMessages completes with a null Position, the cursor position is reset to earliest; see: https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java#L804-L824
In 823a55d#diff-1d1f02c0ae1aed67e77512aebe4d7233705490b14960ee428611462e446861e5R133-R142, we optimized the case where the target entry may be in the last open ledger. The optimization is intuitive, but the actual situation is more complicated:
Suppose we want to find the position of the message whose timestamp is 101, the second-to-last ledger's close timestamp is 100, and the entry timestamps in the last open ledger are arranged as [102, 103, 101, 104, ...].
The first entry's timestamp (102) is greater than 101, so it does not meet the condition at https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageFinder.java#L74-L75. The return value is therefore null, and the cursor position is eventually reset to earliest.
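The scenario above can be reproduced with a simplified, hypothetical Java model (not Pulsar's implementation). It assumes the search for "the newest entry whose timestamp is <= target" is a binary search that treats a failing first entry as "no match", as the description of the condition suggests. Positions are modeled as "ledgerId:entryId" strings, EARLIEST stands in for resetting the cursor to earliest, and the names SeekShortcutSketch, seekWithShortcut, and ledger id 3 are invented for illustration.

```java
import java.util.OptionalInt;

/** Hypothetical model of the seek-by-timestamp shortcut; not Pulsar's actual code. */
public final class SeekShortcutSketch {

    static final String EARLIEST = "earliest";

    /**
     * Binary search for the newest entry whose timestamp is <= target. It assumes the
     * predicate holds for a prefix of the entries, so a failing first entry means
     * "no match" even if a later entry would match.
     */
    static OptionalInt findNewestMatching(long[] ts, long target) {
        if (ts.length == 0 || ts[0] > target) {
            return OptionalInt.empty();
        }
        int lo = 0;
        int hi = ts.length - 1; // invariant: ts[lo] <= target
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // round up so lo always advances
            if (ts[mid] <= target) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return OptionalInt.of(lo);
    }

    /** Reference answer: scan backwards for the newest entry whose timestamp is <= target. */
    static OptionalInt linearNewestMatching(long[] ts, long target) {
        for (int i = ts.length - 1; i >= 0; i--) {
            if (ts[i] <= target) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    /**
     * Shortcut path: the previous ledger closed before the target, so only the open
     * last ledger is searched, and an empty result falls back to EARLIEST.
     */
    static String seekWithShortcut(long prevLedgerCloseTs, long lastLedgerId,
                                   long[] lastLedgerTs, long target) {
        if (prevLedgerCloseTs >= target) {
            throw new IllegalArgumentException("shortcut does not apply; full search not modeled");
        }
        OptionalInt idx = findNewestMatching(lastLedgerTs, target);
        return idx.isPresent() ? lastLedgerId + ":" + idx.getAsInt() : EARLIEST;
    }

    public static void main(String[] args) {
        long[] lastLedger = {102, 103, 101, 104};
        // The first entry (102) is already newer than 101, so the search gives up.
        System.out.println(seekWithShortcut(100, 3, lastLedger, 101)); // prints earliest
        // Yet entry 2 (timestamp 101) does match.
        System.out.println(linearNewestMatching(lastLedger, 101)); // prints OptionalInt[2]
    }
}
```

The contrast between the two searches is the point: the prefix assumption behind the binary search only holds for monotonic timestamps, so locally unordered entries at the head of the last ledger make it report "not found" and the caller falls back to earliest.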
Modifications
Verifying this change