Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consumer: don't run redistribute without connections #132

Merged
merged 4 commits into from
Apr 8, 2015

Conversation

mreiferson
Copy link
Member

This fixes #130 and #131.

The scenario is:

  • single connection
  • in backoff
  • resumes with RDY 1
  • connection dies
  • never sends RDY again

We were (correctly) setting the redistribute flag, but it was attempting
to redistribute without having any connections.

This fix ensures that if an external heuristic sets the redistribute flag
it won't run proceed unless there are connections.

RFR @jehiah

/cc @twmb @zhangdaoling

@mreiferson
Copy link
Member Author

FYI the block of code in question that sets needRDYRedistributed is:

https://github.com/bitly/go-nsq/blob/master/consumer.go#L698-L706

Another, perhaps more reliable way to approach this would be to convert those rules into active rules that are continually evaluated when redistributeRDY() runs.

Any suggestions @jehiah?

@@ -114,6 +114,9 @@ type Config struct {
LookupdPollInterval time.Duration `opt:"lookupd_poll_interval" min:"5s" max:"5m" default:"60s"`
LookupdPollJitter float64 `opt:"lookupd_poll_jitter" min:"0" max:"1" default:"0.3"`

// Duration between reconnect attempts when connecting directly to nsqd (i.e not using nsqlookupd)
ReconnectInterval time.Duration `opt:"reconnect_interval" min:"0" max:"1m" default:"15s"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike having separate variables for the reconnect interval using nsqd directly vs using lookupd. It's a shame the var for lookupd is named as such because this is probably a better name.

do you feel comfortable just re-using LookupdPollInterval ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll buy it.

@jehiah
Copy link
Member

jehiah commented Apr 8, 2015

i dig this change. I don't think it's worth trying to rewrite this code to tackle these situations differently (or at least i don't have a sufficiently clear idea of what that would look like).

on the plus side, our test coverage is getting better =)

@mreiferson
Copy link
Member Author

also, FYI this new test failed against at least the last 2 stable releases.... 😦

@mreiferson
Copy link
Member Author

I don't think it's worth trying to rewrite this code to tackle these situations differently (or at least i don't have a sufficiently clear idea of what that would look like).

I think it's worth it if only because it's the only spot where RDY redistribution is enabled outside of the actual function (which is why it was susceptible to this bug in the first place).

Whether we fix it now is a different question.

@mreiferson
Copy link
Member Author

@jehiah updated (sorry for squashing)

jehiah added a commit that referenced this pull request Apr 8, 2015
consumer: don't run redistribute without connections
@jehiah jehiah merged commit a76e331 into nsqio:master Apr 8, 2015
@mreiferson mreiferson deleted the deadlocks_132 branch April 8, 2015 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nsq_to_nsq: producer never reconnects in some cases
2 participants