Rework how the client connects to brokers. #10
Conversation
Fixes #9.

This ended up being more complicated than I had hoped and touched several different areas. TL;DR is that we now connect to the other brokers in the cluster asynchronously. Errors connecting only show up when somebody tries to use that broker. This is better than the old behaviour since it means that if some brokers in a cluster go down but the topics we care about are still available, we just keep going instead of blowing up for no reason.

The complicated part is that simply calling `go broker.Connect()` doesn't do what we want, so I had to write a `broker.AsyncConnect()`. The problem occurs if you've got code like this:

    go broker.Connect()
    // do some stuff
    broker.SendSomeMessage()

What can happen is that SendSomeMessage can be run before the Connect() goroutine ever gets scheduled, in which case SendSomeMessage will simply return NotConnected. The desired behaviour is that SendSomeMessage waits for Connect() to finish, which means AsyncConnect() has to *synchronously* take the broker lock before it launches the asynchronous connect call. Lots of fun.

And bonus change in this commit: rather than special-casing leader == -1 in `client.cachedLeader` and adding a big long comment to the LEADER_NOT_AVAILABLE case explaining the fallthrough statement, just delete that partition from the hash. So much easier to follow, I must have been on crack when I wrote the old way.
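To make that scheduling hazard and the locking trick concrete, here is a minimal sketch of the idea, assuming a plain TCP dial; the field names, the NotConnected value, and SendSomeMessage are placeholders for illustration, not the real API:

    package kafka

    import (
        "errors"
        "fmt"
        "net"
        "sync"
    )

    var NotConnected = errors.New("kafka: broker not connected")

    type Broker struct {
        host string
        port int32

        lock    sync.Mutex // guards conn and connErr
        conn    net.Conn
        connErr error
    }

    // AsyncConnect takes the lock *before* spawning the goroutine, so anyone
    // who needs the connection blocks on the lock until the dial finishes,
    // instead of racing the goroutine and seeing NotConnected.
    func (b *Broker) AsyncConnect() {
        b.lock.Lock() // synchronous: taken before AsyncConnect returns
        go func() {
            defer b.lock.Unlock()
            b.conn, b.connErr = net.Dial("tcp", fmt.Sprintf("%s:%d", b.host, b.port))
        }()
    }

    // SendSomeMessage stands in for any request method. Because it takes the
    // same lock, it waits for an in-flight AsyncConnect to complete first.
    func (b *Broker) SendSomeMessage() error {
        b.lock.Lock()
        defer b.lock.Unlock()
        switch {
        case b.connErr != nil:
            return b.connErr
        case b.conn == nil:
            return NotConnected
        }
        _, err := b.conn.Write([]byte("wire bytes would go here"))
        return err
    }

With that shape, client code can fire off AsyncConnect for every broker it learns about and only pays for a dead broker if it actually tries to talk to it.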
It looks like a good change. Afaik the db.Sql package works just like this.
    if b.conn != nil {
        return AlreadyConnected
    }
    b.conn_err = nil

    addr, err := net.ResolveIPAddr("ip", b.host)
Consider that you could do addr, b.conn_err = ... here. Just a thought; feel free to ignore if you have some reason not to do that.
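Spelled out, the suggestion is to assign the error straight into the struct field instead of a local err. A hedged fragment against the quoted lines (not the actual diff):

    // Current form (local variable):
    //     addr, err := net.ResolveIPAddr("ip", b.host)

    // Suggested form: assign the error straight into the field. addr has to
    // be declared separately, because := is not allowed when a struct field
    // appears on the left-hand side.
    var addr *net.IPAddr
    addr, b.conn_err = net.ResolveIPAddr("ip", b.host)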
Change looks good to me. Nice work.
@tobi does db.Sql not provide a synchronous open then? That seems an odd choice. Nevermind, I RTFM, that's a pattern worth matching.
Replaces Connect and AsyncConnect with Open and Connected
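For anyone reviewing the rename: this mirrors database/sql, where sql.Open returns immediately and connection errors only surface once the handle is used. A rough sketch of the intended call pattern, assuming a Broker like the one sketched earlier with AsyncConnect renamed to Open and a Connected() (bool, error) accessor added (both are guesses at the new surface, not a quote of the diff):

    // useCluster illustrates the Open/Connected flow: open everything eagerly,
    // surface errors lazily, and only for brokers we actually need.
    func useCluster(brokers []*Broker, leader *Broker) error {
        for _, b := range brokers {
            b.Open() // returns immediately; the dial happens in the background
        }

        // A dead broker only costs us something here, when it is used.
        if ok, err := leader.Connected(); !ok {
            return err
        }
        return leader.SendSomeMessage()
    }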
ping @burke for a quick re-review of the new pattern please?
Looks great. |
Rework how the client connects to brokers.
@burke @fw42 Thorough review please - I want this concurrent locking pattern to make sense to somebody else before I merge it.
CC @tobi
Fixes #9.
This ended up being more complicated than I had hoped and touched several
different areas. TL;DR is that we now connect to the other brokers in the
cluster asynchronously. Errors connecting only show up when somebody tries to
use that broker.
This is better than the old behaviour since it means that if some brokers in a
cluster go down but the topics we care about are still available, we just keep
going instead of blowing up for no reason.
The complicated part is that simply calling go broker.Connect() doesn't do
what we want, so I had to write a broker.AsyncConnect(). The problem occurs if
you've got code like this:

    go broker.Connect()
    // do some stuff
    broker.SendSomeMessage()

What can happen is that SendSomeMessage can be run before the Connect
goroutine ever gets scheduled, in which case SendSomeMessage will simply
return NotConnected. The desired behaviour is that SendSomeMessage waits for
Connect to finish, which means AsyncConnect has to synchronously take the
broker lock before it launches the asynchronous connect goroutine. Lots of fun.

And bonus change in this commit: rather than special-casing leader == -1 in
client.cachedLeader and adding a big long comment to the LEADER_NOT_AVAILABLE
case explaining the fallthrough statement, just delete that partition from the
hash. So much easier to follow, I must have been on crack when I wrote the old
way.
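To make the bonus change concrete, here is a rough before/after sketch, continuing the hypothetical package from the earlier broker sketch; the leaders map layout and the method names are guesses for illustration, not the actual client code:

    type Client struct {
        brokers map[int32]*Broker          // broker ID -> broker
        leaders map[string]map[int32]int32 // topic -> partition -> leader broker ID
    }

    // Old shape: on LEADER_NOT_AVAILABLE, store -1 and then special-case it
    // (plus a long comment about the fallthrough) on every lookup:
    //
    //     if id, ok := partitions[partition]; ok && id != -1 { ... }
    //
    // New shape: just delete the entry, so "no cached leader" and "leader not
    // available" are the same case and cachedLeader needs no special-casing.
    func (c *Client) forgetLeader(topic string, partition int32) {
        if partitions, ok := c.leaders[topic]; ok {
            delete(partitions, partition)
        }
    }

    func (c *Client) cachedLeader(topic string, partition int32) *Broker {
        if partitions, ok := c.leaders[topic]; ok {
            if id, ok := partitions[partition]; ok {
                return c.brokers[id]
            }
        }
        return nil
    }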