Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent data race in balance strategy #2453

Merged
merged 1 commit into from
Mar 27, 2023
Merged

fix: prevent data race in balance strategy #2453

merged 1 commit into from
Mar 27, 2023

Conversation

napallday
Copy link
Contributor

@napallday napallday commented Mar 10, 2023

For every BalanceStrategy, it may be shared across different consumer groups in one instance. Considering every built-in balance strategy has several fields in the struct as shown in the below snippet, data race issues may happen when several consumer group rebalances simultaneously.

type balanceStrategy struct {
	coreFn func(plan BalanceStrategyPlan, memberIDs []string, topic string, partitions []int32)
	name   string
}
type stickyBalanceStrategy struct {
	movements partitionMovements
}

Even worse, I think this issue could cause

  • panic
  • some topics/partitions are not subscribed by the corresponding consumers, because the corresponding fields in the struct may be modified by other consumer groups

To verify this issue, a unit test was written, which fails when -race detection is enabled.

func Test_stickyBalanceStrategy_Plan_data_race(t *testing.T) {
	for i := 0; i < 1000; i++ {
		go func(bs BalanceStrategy) {
			members := map[string]ConsumerGroupMemberMetadata{
				"m1": {
					Version: 3,
					Topics:  []string{"topic"},
				},
			}
			topics := map[string][]int32{
				"topic": {0, 1, 2},
			}
			_, _ = bs.Plan(members, topics)
		}(BalanceStrategySticky)
	}
}

To avoid this problem, it may be best not to use a singleton for each and every of BalanceStrategy. Instead, newly-added functions like NewBalanceStrategyRange() BalanceStrategy are recommended so that every time an exclusive variable is assigned to each consumer group.

Fixes #2422

@napallday
Copy link
Contributor Author

napallday commented Mar 14, 2023

just checked the failed integration tests, it shouldn't be caused by the code changes here.

ProducerTxnStateInError|ProducerTxnStateFatalError
    functional_producer_test.go:140: 
        	Error Trace:	/home/runner/work/sarama/sarama/functional_producer_test.go:140
        	Error:      	Received unexpected error:
        	            	kafka server: The broker is still loading offsets after a leader change for that offset's topic partition
        	Test:       	TestFuncTxnCommitNoMessages

Copy link
Collaborator

@dnwe dnwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the changes LGTM

@dnwe dnwe added the fix label Mar 23, 2023
@dnwe dnwe changed the title fix: data race in balance strategy fix: prevent data race in balance strategy Mar 23, 2023
@dnwe dnwe merged commit 9127f1c into IBM:main Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BalanceStrategySticky panic with concurrent map writes error
2 participants