Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve the slow consumer, subscription problem #1899

Closed
wants to merge 1 commit into from
Closed

Solve the slow consumer, subscription problem #1899

wants to merge 1 commit into from

Conversation

alok87
Copy link

@alok87 alok87 commented Mar 12, 2021

Fixes #1897

  • Use ticks to flush new subscriptions.
  • Make the tick interval configurable for different use case.

Why is it needed?
Details in #1897

Slow consumers have high MaxProcessingTime. If the subscription to all topics does not happen in the first flush to bc.newSubscriptons then due to high MaxProcessingTime the next subscription will only happen after 2 ticks of MaxProcessingTime. This makes the subscription very slow.

Instead if we flush based on time, and the tick interval is configurable, then this works for both slow and fast consumer use cases. Works for consumer having short and longMaxProcessingTime

Please share a better approach to solve this if there can be.

cc @dim

@alok87 alok87 requested a review from bai as a code owner March 12, 2021 04:26
@ghost ghost added the cla-needed label Mar 12, 2021
@alok87 alok87 changed the title Attemp to solve the slow consumer, subscription problem Attempt to solve the slow consumer, subscription problem Mar 12, 2021
@alok87 alok87 changed the title Attempt to solve the slow consumer, subscription problem Solve the slow consumer, subscription problem Mar 12, 2021
@ghost ghost removed the cla-needed label Mar 12, 2021
alok87 added a commit to practo/tipoca-stream that referenced this pull request Mar 12, 2021
Sarama updated from 1.27.2 to master
Add changes to use ticker instead IBM/sarama#1899 to solve #1897

Fixes IBM/sarama#1897
Fixes #160
@bai bai requested review from d1egoaz and dnwe March 12, 2021 05:23
@alok87
Copy link
Author

alok87 commented Mar 15, 2021

@bai Please suggest if you agree with the solution. If yes, please suggest where can I add the flushTicker config to make it configurable for different use cases...

If you have a better solution in mind, please suggest.

@dnwe
Copy link
Collaborator

dnwe commented Mar 15, 2021

@alok87 hmm, I read back through the linked issue and the debugging output that you posted a few times and looked over the PR, but I still don't understand how the proposed change would be beneficial in your scenario

I'll admit that I'm not familiar with this subscriptionManager aspect of the consumer code and I probably need to spend some time familiarising myself with what is currently there as well.

Can you describe how we might go about reproducing your issue locally?

@alok87
Copy link
Author

alok87 commented Mar 15, 2021

@dnwe hey thanks for the response.

The issue happens only when the MaxProcessingTime for the consumer group is high (10mins for us).
We are using the fork of sarama in Production with this change and things are working for us.

Reproduced

I took out the part of the code to make it easier to debug and reproduce the issue. Here is the code issue.go https://gist.github.com/alok87/5e9f960a7d376d72d8137793bdc9ad12

$ go run issue.go
I0315 17:40:06.656947   83310 issue.go:120] subscriptionConsumer(2) waiting
I0315 17:40:06.656972   83310 issue.go:67] subscriptionManager(2) STARTING FOR LOOP
I0315 17:40:06.657123   83310 issue.go:101] subscriptionManager(2) received ts.topic-0, buffer(1)
I0315 17:40:06.657131   83310 issue.go:91] subscriptionManager(2) bc.wait() stopped
I0315 17:40:06.657136   83310 issue.go:122] subscriptionConsumer(2) resuming...
I0315 17:40:06.657142   83310 issue.go:145] added subscription to 2/ts.topic-0
I0315 17:40:06.657146   83310 issue.go:136] subscriptionConsumer(2) waiting for acks wait()
I0315 17:40:06.657154   83310 issue.go:82] subscriptionManager(2) flushed buffer(1), buffer set nil
I0315 17:40:07.657896   83310 issue.go:80] subscriptionManager(2) received ts.topic-1, buffer(2)
I0315 17:40:08.660548   83310 issue.go:80] subscriptionManager(2) received ts.topic-2, buffer(3)
I0315 17:40:09.662039   83310 issue.go:80] subscriptionManager(2) received ts.topic-3, buffer(4)
I0315 17:40:10.664217   83310 issue.go:80] subscriptionManager(2) received ts.topic-4, buffer(5)
I0315 17:40:11.665924   83310 issue.go:80] subscriptionManager(2) received ts.topic-5, buffer(6)
I0315 17:40:12.668347   83310 issue.go:80] subscriptionManager(2) received ts.topic-6, buffer(7)
I0315 17:40:13.672909   83310 issue.go:80] subscriptionManager(2) received ts.topic-7, buffer(8)
I0315 17:40:14.674409   83310 issue.go:80] subscriptionManager(2) received ts.topic-8, buffer(9)
I0315 17:40:15.675832   83310 issue.go:80] subscriptionManager(2) received ts.topic-9, buffer(10)
I0315 17:40:16.677863   83310 issue.go:62] producing done

In this example 10 topics needed subscription. As soon as the first message(topic) was sent to bc.input the buffer got flushed for this single message.

Then the subscriptionConsumer routine got stuck at bc.acks.Wait since the consumer group is having high MaxProcessingTime. This happens since the feeder loop sends acks.Done only when the ticker tick happens twice (20mins for us).

So till 20mins nothing new gets subscribed even when the buffer has many pending topics ready for subscriptions.

Flushing at 5 second interval started subscribing 200 topics for us in first flush, which works for us. But might need a different value for different use cases. But then I feel there can be a better way around this issue. May be we can use buffered channel for bc.newSubscriptions.

Please let me know if you need more details.

@alok87
Copy link
Author

alok87 commented Mar 31, 2021

Possible to meet over zoom for this or do we have some team meeting for this project?
Wanted to explain the problem and discuss it. It is important to us.

@dnwe
Copy link
Collaborator

dnwe commented May 5, 2021

👋🏻 sorry for not having got back to you on this, it is still on my TODO to revisit this PR soon

alok87 added a commit to practo/tipoca-stream that referenced this pull request Jun 5, 2021
Sarama updated from 1.27.2 to master
Add changes to use ticker instead IBM/sarama#1899 to solve #1897

Fixes IBM/sarama#1897
Fixes #160
alok87 added a commit to practo/tipoca-stream that referenced this pull request Jun 7, 2021
Sarama updated from 1.27.2 to master
Add changes to use ticker instead IBM/sarama#1899 to solve #1897

Fixes IBM/sarama#1897
Fixes #160
alok87 added a commit to practo/tipoca-stream that referenced this pull request Jun 17, 2021
Sarama updated from 1.27.2 to master
Add changes to use ticker instead IBM/sarama#1899 to solve #1897

Fixes IBM/sarama#1897
Fixes #160
@bundleman
Copy link

It's doesn't work. Work this approach #1608

@letian0805
Copy link

Hi, any update?

@bai
Copy link
Contributor

bai commented Jan 12, 2022

@dnwe Do you think it's something we want to get merged? Asking since this PR's been open for a while.

@dnwe
Copy link
Collaborator

dnwe commented Jan 12, 2022

@bai based on the feedback above I think this approach didn't appear to have solved the problem that other users were seeing and I believe has been superceded by the approach mentioned in point 2. of #1608 which seemed to have been more widely confirmed as a solution.

@alok87 shall we close this PR and await the outcome of #1608?

@alok87
Copy link
Author

alok87 commented Jan 12, 2022

Sure.

But the issue is a real problem that we should look at. Not a concern for us now but the problem we found out since we went that path once.

@alok87 alok87 closed this Jan 12, 2022
@dnwe
Copy link
Collaborator

dnwe commented Jan 12, 2022

@alok87 would you able to test the branch on #2109 and confirm if that reworked subscription manager solves the issue for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow consumers, subscription stuck problem
5 participants