Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential race/deadlock between broker error and AsyncClose #475

Closed
eapache opened this issue Jun 17, 2015 · 4 comments
Closed

Potential race/deadlock between broker error and AsyncClose #475

eapache opened this issue Jun 17, 2015 · 4 comments
Labels

Comments

@eapache
Copy link
Contributor

eapache commented Jun 17, 2015

I was inspecting the consumer code as part of #474 and I realized that there is a potential deadlock condition in the very rare case where a broker returns an error (such as ErrOffsetOutOfRange) at the same time that the user calls AsyncClose (or just Close).

Both events result in a value being written to child.dying which has a buffer for only one event. If the AsyncClose write succeeds, then the responseFeeder write (triggered by the broker error) will block; since the goroutine responsible for reading from that channel is currently waiting on responseFeeder to unblock it, I believe this deadlocks.

cc @Shopify/kafka very unlikely (and entirely theoretical at this point) but I'd appreciate if somebody could verify my logic

@wvanbergen
Copy link
Contributor

Your sequence of events seems plausible to me. Maybe things like this will be a lot easier to reason about using a sync.Once guard around the shutdown logic? That should also fix this I think.

@eapache
Copy link
Contributor Author

eapache commented Jun 17, 2015

?

the problem is that multiple things can legitimately write to dying, not that one of those writes is illegitimate

@wvanbergen
Copy link
Contributor

Why would we ever want to write more than once to dying? We can only shut down a consumer once.

@wvanbergen
Copy link
Contributor

Never mind; we're not just closeing the channel, but also writing errors to it.

eapache added a commit that referenced this issue Jul 15, 2015
In the case when the user calls `AsyncClose` at the same instant as the broker
leading the partition returns an error.

Fixes bug #475.
eapache added a commit that referenced this issue Jul 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants