Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic via nsq.Producer #45

Merged
merged 1 commit into from
Jun 15, 2014
Merged

Panic via nsq.Producer #45

merged 1 commit into from
Jun 15, 2014

Conversation

mreiferson
Copy link
Member

I'm writing a message dispatcher based on nsq. I'm running into a panic that seems to be originating within the go-nsq library (or, so says the stack trace). Any ideas? The panic occurs on this line.
screen shot 2014-06-13 at 3 33 39 pm
The test case in question can be found here.

Try it for yourself:
Start nsqd running (locally), then:

go get github.com/philhofer/fluxlog
cd $GOPATH/github.com/philhofer/fluxlog
go test -v -tags 'nsq'

(I'm compiling with go1.3rc1, because my package depends on sync.Pool.)

The test tries to write 10 messages to a topic called "test" - which seems to work ok. Then the *nsq.Producer code tries to close the tcp connection, and everything goes to hell.

This may be an issue on my end, but if it is, I can't find it.

Thanks in advance.

@mreiferson
Copy link
Member

I'll take a look - thanks for the report @philhofer

@mreiferson mreiferson added the bug label Jun 13, 2014
@mreiferson
Copy link
Member

@philhofer - whats the revision of the checkout for the go-nsq you're compiling against?

@philhofer
Copy link
Contributor Author

@mreiferson I pulled the latest master as of a few hours ago.

@mreiferson
Copy link
Member

I'm trying to reproduce but I'm running into issues compiling fluxlog:

mreiferson@AIRSNAKES ~/dev/src/github.com/philhofer/fluxlog](master)$ go test -v -a
# github.com/philhofer/fluxlog
./init.go:120: cannot use l.list.Read() (type gringo.Payload) as type *capn.Segment in assignment
./init.go:180: cannot use seg (type *capn.Segment) as type gringo.Payload in argument to l.list.Write
./init.go:189: cannot use seg (type *capn.Segment) as type gringo.Payload in argument to l.list.Write
./types_test.go:42: cannot use msg (type *capn.Segment) as type gringo.Payload in argument to g.Write
./types_test.go:49: cannot use g.Read() (type gringo.Payload) as type *capn.Segment in assignment
FAIL    github.com/philhofer/fluxlog [build failed]

@philhofer
Copy link
Contributor Author

@mreiferson My bad - I hadn't pushed changes for a dependency. If you go get -u it should compile ok now.

@mreiferson
Copy link
Member

@philhofer - it compiles, but I can't reproduce.

The test deadlocks because your producer loop sits here - https://github.com/philhofer/fluxlog/blob/master/init.go#L119-L120. When your test goes to Close() the logger, it blocks on https://github.com/philhofer/fluxlog/blob/master/init.go#L195 because the aforementioned loop is sitting in the default case waiting for an entry on the queue.

There may indeed be some issue to resolve with the go-nsq cleanup but I suspect it's related to trying to publish after calling Stop on the Producer.

Can you reliably reproduce the error without changing the code?

@mreiferson
Copy link
Member

@philhofer - OK, I was able to reproduce some issues when removing all the sleep in your test. There was a race between connecting and closing in quick succession. I've attached a commit that fixes it to this issue (RFR @jehiah)

Also, after fixing the go-nsq issue, it uncovered a problem with your ordering in Logger.Close - you need to close the done channel after stopping the producer, not before (because it is still trying to flush pending async publishes, so it fails "sending on a closed channel"). Also, because you're publishing asynchronously, your test obviously needs to account for that (which is probably why you had sleeps in there in the first place).

Thanks again for the report.

@philhofer
Copy link
Contributor Author

@mreiferson Thanks, I appreciate it!

jehiah added a commit that referenced this pull request Jun 15, 2014
@jehiah jehiah merged commit 394ebba into nsqio:master Jun 15, 2014
@mreiferson mreiferson deleted the close_race_45 branch July 24, 2014 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants