Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sporadic disconnects (close events) at start up #96

Open
tikh opened this issue Jun 4, 2016 · 8 comments
Open

Sporadic disconnects (close events) at start up #96

tikh opened this issue Jun 4, 2016 · 8 comments

Comments

@tikh
Copy link

tikh commented Jun 4, 2016

I have an issue with starting up multiple bot clients after application start up. Sometimes, "CLOSE" event is being fired for many clients and regular reconnection logic doesn't help (while it works fine in event of temporary network disconnect). "CLOSE" events have "code" value 1006.
It doesn't happen every time, and it have something to do with time spent in callbacks, I suppose.

A snippet used to reproduce the issue - https://gist.github.com/tikh/50d5cf34481330759a736aa351e3ee2e. Simply adding a "sleep" to "hello" callback makes the issue happen. But even without sleep this happens sometimes.

In the example, faye-websocket gem was used and 300 connections to different Slack teams.

@dblock dblock changed the title Sporadic disconnects at start up Sporadic disconnects (close events) at start up Jun 5, 2016
@dblock
Copy link
Collaborator

dblock commented Jun 5, 2016

This is consistent with the behavior observed with faye-websocket. We try to open a lot of connections, and the client can't process them fast enough, causing these close events, a reconnect and finally a success. With many connections it happens a lot more often, and can throw a service with 300 teams in a loop. It's more frequent if you make the initial connection under EM.defer, but that hits faye/websocket-driver-ruby#38.

You definitely want to handle the close event and try to reconnect, like https://github.com/dblock/slack-ruby-bot-server/blob/8ba27e827cfcc75bf6d015e5117c12c72fa33096/lib/slack-ruby-bot-server/server.rb#L15.

I had switched to using celluloid-io for all of my projects, which doesn't do the rtm.start connection in parallel making it a little slower to start (see #87), but works very consistently.

@tikh
Copy link
Author

tikh commented Jun 6, 2016

The start up issue was resolved after switching to celluloid-io gem. However, I wanted to test a reconnection logic and found out that close event never happens (at least in a few first minutes). I tried two things - turning off wi-fi on my laptop and deleting an integration in Slack. I expected to have close event earlier as WebSockets ping interval is 30 seconds by default. Also, I tried to do active periodic pinging and got this exception:

[2016-06-06 12:03:13.276][PID=21374] ERROR Actor crashed!
Errno::ETIMEDOUT: Operation timed out
    /Users/tikh/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/openssl/buffering.rb:178:in `sysread_nonblock'
    /Users/tikh/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/openssl/buffering.rb:178:in `read_nonblock'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:42:in `block in sysread'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:390:in `synchronize'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:40:in `sysread'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:129:in `readpartial'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:42:in `read'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:36:in `block in run_loop'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:36:in `loop'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:36:in `run_loop'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:29:in `connect!'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
    /Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'

@dblock
Copy link
Collaborator

dblock commented Jun 6, 2016

Hmmm... check out celluloid/celluloid-io#176 and https://github.com/dblock/slack-ruby-client/issues/90, does this explain any of this?

@tikh
Copy link
Author

tikh commented Jun 6, 2016

Actually if Slack integration is deleted, 'close' event happens correctly. When I was trying to simulate connection problems by turning off Wi-Fi and had periodic pings, I got exception above. Probably, Errno::ETIMEDOUT exceptions should be added to the rescue block where you already handle EOFError (https://github.com/dblock/slack-ruby-client/blob/master/lib/slack/real_time/concurrency/celluloid.rb#L38). Does it sound right? Or erroneous exceptions situations are supposed to be handle in different places?

@dblock
Copy link
Collaborator

dblock commented Jun 6, 2016

Yes, that sounds just right! Maybe it should be a whole other range of exceptions, would appreciate some research too in other network conditions. Would love a PR!

@dblock
Copy link
Collaborator

dblock commented Jun 7, 2016

I think another one is Errno:EPIPE and Errno::ECONNRESET:

E, [2016-06-07T19:05:15.066577 #154] ERROR -- : Actor crashed!
Errno::EPIPE: Broken pipe
    /app/vendor/ruby-2.2.4/lib/ruby/2.2.0/openssl/buffering.rb:383:in `syswrite_nonblock'
    /app/vendor/ruby-2.2.4/lib/ruby/2.2.0/openssl/buffering.rb:383:in `write_nonblock'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:62:in `block in syswrite'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:390:in `synchronize'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:59:in `syswrite'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:356:in `do_write'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:250:in `write'
    /app/vendor/bundle/ruby/2.2.0/gems/slack-ruby-client-0.7.3/lib/slack/real_time/concurrency/celluloid.rb:56:in `write'
    /app/vendor/bundle/ruby/2.2.0/gems/websocket-driver-0.6.3/lib/websocket/driver/hybi.rb:232:in `send_frame'
    /app/vendor/bundle/ruby/2.2.0/gems/websocket-driver-0.6.3/lib/websocket/driver/hybi.rb:193:in `frame'
    /app/vendor/bundle/ruby/2.2.0/gems/websocket-driver-0.6.3/lib/websocket/driver/hybi.rb:156:in `close'
    /app/vendor/bundle/ruby/2.2.0/gems/slack-ruby-client-0.7.3/lib/slack/real_time/concurrency/celluloid.rb:45:in `close'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
    /app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'

@dblock
Copy link
Collaborator

dblock commented Jun 14, 2016

@dblock dblock closed this as completed Aug 30, 2016
@dblock dblock reopened this Aug 30, 2016
@dblock
Copy link
Collaborator

dblock commented Aug 30, 2016

Closed the wrong issue, reopening ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants