Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket state errors with production usage #550

Closed
4 of 9 tasks
bmajz opened this issue May 7, 2018 · 14 comments · Fixed by #555
Closed
4 of 9 tasks

Websocket state errors with production usage #550

bmajz opened this issue May 7, 2018 · 14 comments · Fixed by #555
Labels
bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented

Comments

@bmajz
Copy link

bmajz commented May 7, 2018

Description

Using this in production, we often get websocket state errors that lead to us having to take the connection down. Here are is an example (not the frames may not be in order due to how our logs are processed):

[ERROR] @slack/client:RTMClient A websocket error occurred: Unexpected server response: 503 
Error: Unhandled event 'websocket close' in state 'connecting'. 
    at /home/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33 
    at StateMachine.handleUnhandledEvent (/home/node_modules/finity/lib/core/StateMachine.js:76:13) 
    at TaskScheduler.execute (/home/node_modules/finity/lib/core/TaskScheduler.js:29:7) 
    at WebSocket.websocket.addEventListener.event (/home/node_modules/@slack/client/dist/RTMClient.js:391:77) 
    at HierarchicalStateMachine.handle (/home/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24) 
    at WebSocket.emit (events.js:191:7) 
    at TaskScheduler.enqueue (/home/node_modules/finity/lib/core/TaskScheduler.js:19:12) 
    at emitTwo (events.js:106:13) 
    at WebSocket.onClose (/home/node_modules/ws/lib/event-target.js:124:16) 
    at WebSocket.finalize (/home/node_modules/ws/lib/websocket.js:187:12) 

We are also regularly seeing this error appear in logs:
[ERROR] @slack/client:KeepAlive Unhandled error: Failed to send message on websocket: WebSocket is not open: readyState 2 (CLOSING). Please report to @slack/client package maintainers.

What type of issue is this? (place an x in one of the [ ])

  • bug
  • enhancement (feature request)
  • question
  • documentation related
  • testing related
  • discussion

Requirements (place an x in each of the [ ])

  • I've read and understood the Contributing guidelines and have done my best effort to follow them.
  • I've read and agree to the Code of Conduct.
  • I've searched for any related issues and avoided creating a duplicate issue.

Bug Report

Filling out the following details about bugs will help us solve your issue sooner.

Reproducible in:

@slack/client version: 4.2.0

node version: v6.10.3

OS version(s): alpine linux 3.7

Steps to reproduce:

Haven't been able to figure out specific repro steps unfortunately. Just these observations from production logs

Expected result:

No error

Actual result:

Errors

Attachments:

N/A

@aoberoi
Copy link
Contributor

aoberoi commented May 8, 2018

@bmajz thanke for reporting. this seems like a tricky one to reproduce, but the 503 was a really good hint. @shanedewael and i spoke about possible causes and we identified a few things that should be handled better. i'll let him weigh in on the changes.

@shaydewael
Copy link
Contributor

@bmajz We saw the errors you received as two different issues (both of which should be addressed). For the unhandled websocket close event, it seems related to #537. In the state machine, it looks like the websocket close event isn't properly handled in the authenticated substate of connecting. It will (hopefully) be a pretty simple fix.

When we looked into the second error you saw in KeepAlive, it seemed as if the stop() function wasn't clearing state properly. This seems less critical than the state machine bug, but should also be a fairly quick fix.

@bmajz
Copy link
Author

bmajz commented May 9, 2018

Sweet — let me know when we’re good to retest! For now we jumped back to v3

@aoberoi aoberoi added the bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented label May 11, 2018
@anfilat
Copy link

anfilat commented May 12, 2018

it still doesn't work with 4.2.2. I wrote simplest rtm bot

const { RTMClient } = require('@slack/client');

const SLACK_BOT_TOKEN = 'xoxb-...';

const rtm = new RTMClient(SLACK_BOT_TOKEN, {logLevel: 'debug'});
rtm.start();

rtm.on('message', (event) => {
	console.log(event);
});

and it cannot connect to Slack servers. Start log -

anfilat@afcomp:~/slack/node-js-getting-started$ node ind.js 
[DEBUG] @slack/client:WebClient initialized
[DEBUG] @slack/client:RTMClient transitioning to state: disconnected
[DEBUG] @slack/client:RTMClient pausing outgoing event queue
[DEBUG] @slack/client:RTMClient initialized
[DEBUG] @slack/client:RTMClient start()
[DEBUG] @slack/client:RTMClient transitioning to state: connecting
[DEBUG] @slack/client:RTMClient transitioning to state: connecting:authenticating
[DEBUG] @slack/client:WebClient apiCall() start
[DEBUG] @slack/client:WebClient request attempt
[DEBUG] @slack/client:RTMClient transitioning to state: connecting:authenticated
[ERROR] @slack/client:RTMClient A websocket error occurred: connect ETIMEDOUT 52.59.24.122:443
[DEBUG] @slack/client:RTMClient transitioning to state: connecting:reconnecting
anfilat@afcomp:~/slack/node-js-getting-started$ 

There is strange thing in transitioning to state: connecting:reconnecting I think there must be transitioning to state: reconnecting

@aoberoi aoberoi reopened this May 12, 2018
@aoberoi
Copy link
Contributor

aoberoi commented May 12, 2018

@anfilat the logging is definitely strange, good catch! did the application crash? did you terminate the app? i don't see an error in your output.

@aoberoi aoberoi added the needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info label May 12, 2018
@aoberoi
Copy link
Contributor

aoberoi commented May 12, 2018

I think the above PR should actually fix this issue now.

@anfilat
Copy link

anfilat commented May 13, 2018

It's the full log. Application is self-terminated. There is something wrong with transitions from a child to a parent machine.
Yes, PR-560 is usefull fix

@dperez3
Copy link

dperez3 commented Jun 1, 2018

This issue might still exist in some form or another. I'm not sure if there's a better place to put this, but I ran into a similar issue with version 4.2.0.

I had multiple async uses of the following method... (in typescript)

    async awaitResponseAsync():Promise<string> {
        var promise = new Promise<string>(async (resolve, reject) => {
            let rtmClient = new RTMClient(BOT_TOKEN);

            let channelId = await this._communicationEntity.getChannelIdAsync();

            rtmClient.start(null);
            // wait for response
            rtmClient.once(`message`, (event) => {
                console.log(event);
                if(event.channel == channelId) {
                    this._responseText = event.text;
                    resolve(event.text);
                    rtmClient.disconnect();
                }
            });
            rtmClient.once(`error`, (err) => {
                console.error(event);
                reject(err);
                rtmClient.disconnect();
            });
        });

        return promise;
    }

It seemed as though the happenings in 1 instance of RTMClient affected all other instances resulting in the other RTMClient instances no longer invoking subscribers when 1 instance was done with. Changing it to use on subscriptions instead of once fixed it...

    async awaitResponseAsync():Promise<string> {
        var promise = new Promise<string>((resolve, reject) => {
            let rtmClient = new RTMClient(BOT_TOKEN);

            rtmClient.start(null);
            // wait for response
            rtmClient.on(`message`, async (event) => {

                let channelId = await this._communicationEntity.getChannelIdAsync();

                if(event.channel == channelId) {
                    console.log(`Response received for ${this.communicationEntity.name}`, event);
                    this._responseText = event.text;
                    resolve(event.text);
                    rtmClient.disconnect();
                }
            });
            rtmClient.on(`error`, (err) => {
                console.error(event);
                reject(err);
                rtmClient.disconnect();
            });
        });

        return promise;
    }

@aoberoi aoberoi removed the needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info label Jun 3, 2018
@aoberoi
Copy link
Contributor

aoberoi commented Jun 3, 2018

@dperez3 i'm not sure I understand the problem you're describing. the two code snippets you provided seem identical (except for the additional console.log).

could you help me understand what the awaitResponseAsync() function is supposed to do? it seems like you might be trying to wait for a specific message you already sent to be acknowledged by listening to the RTM stream and waiting for it. if that's the case, i can offer a few improvements for you.

  1. if you sent the message using RTM, you should just await the Promise returned from RTMClient.sendMessage(). For example:

    (async () => {
      const rtm = new RTMClient(BOT_TOKEN);
      const response = await rtm.sendMessage('a simple message', channelId);
      // any code here executes after the message is sent
    }());
    

    the problem with how you've written the code currently is that the "message" event will trigger for all incoming messages that the bot can see. it seems like you're looking for a response to a very specific message in a specific channel. changing a .once() to a .on() is just helping your handler run on more messages, because the first one it sees might not be in the right channel.

  2. if you don't use the previous suggestion, it still looks like you can eliminate wrapping your code in a new Promise(). you're already in an async function, which will always return a Promise, even when you just return a string.

  3. in the next version, you don't need to pass in null as an argument to rtm.start(). we had an issue where we accidentally typed that argument as required instead of optional, but this will be resolved in the next release (likely to occur on Monday).

@dperez3
Copy link

dperez3 commented Jun 4, 2018

@aoberoi thank you for the quick response! I apologize, I did paste the wrong code in the second portion and I have updated it. However, you did seem to understand what I was trying to do pretty well anyway. The on vs once point you made makes total sense now. Of course once would result in it not receiving the intended message if it already heard an ignored message first.

This did break again today and ill respond to your other points now...

  1. Again, great point with on vs once. However, I'm not trying to await the http success of a message being sent to a channel. That happens elsewhere via WebClient. Here I'm trying to receive a response to the message from a user on a channel. Or is that what the code you provided would actually do? If so, that'd be great! If not, is there a way to just listen for messages on a specific channel? (I'm gonna try it anyway though).

  2. I suppose this is true if const response = await rtm.sendMessage('a simple message', channelId); returns a user response. I'm not sure how I could do this if I have to filter over the many possible messages I could receive until I get one from a specific channel though. Might be my inexperience with EventEmitters though.

  3. Great! I'll be installing the update asap.

Edit: The fundamental thing I'm trying to do is asynchronously ask channels (dm and otherwise) a question and wait for an answer to the question per channel.

@dperez3
Copy link

dperez3 commented Jun 4, 2018

Update

I've updated to ^4.2.2 and modified my code for a bit of an experiment...

    async awaitResponseAsync():Promise<string> {
        return await new Promise<string>(async (resolve, reject) => {
            let rtmClient = new RTMClient(BOT_TOKEN);

            rtmClient.start(null);

            try {
                for (var i = 0; i < 30; i++) {
                    let rtmClient = new RTMClient(BOT_TOKEN);
                    rtmClient.start(null);
                    rtmClient.disconnect();
                }
            } catch (err) {
                console.error('Error', err);
            }

            // wait for response
            rtmClient.on(`message`, async (event) => {

                let channelId = await this.communicationEntity.getChannelIdAsync();

                if(event.channel == channelId) {
                    console.log(`Response received for ${this.communicationEntity.name}`, event);
                    this._responseText = event.text;
                    resolve(event.text);
                    rtmClient.disconnect();
                }
            });
            rtmClient.on(`error`, (err) => {
                console.error(event);
                reject(err);
                rtmClient.disconnect();
            });
        });
    }

The bit surrounded by a try catch generates this error...

Error Error: Unhandled event 'explicit disconnect' in state 'connecting'.
    at StateMachine.handleUnhandledEvent (~\node_modules\finity\lib\core\StateMachine.js:76:13)
    at ~\node_modules\finity\lib\core\HierarchicalStateMachine.js:79:33
    at TaskScheduler.execute (~\node_modules\finity\lib\core\TaskScheduler.js:29:7)
    at TaskScheduler.enqueue (~\node_modules\finity\lib\core\TaskScheduler.js:19:12)
    at HierarchicalStateMachine.handle (~\node_modules\finity\lib\core\HierarchicalStateMachine.js:72:24)
    at RTMClient.disconnect (~\node_modules\@slack\client\dist\RTMClient.js:284:27)
    at Promise (~\src\standup\meeting\response.js:27:31)
    at new Promise (<anonymous>)
    at Response.awaitResponseAsync (~\--\response.js:20:22)
    at Question.askAsync (~\src\--\question.js:32:49)

@dperez3
Copy link

dperez3 commented Jun 4, 2018

One issue I have found is that I am not waiting for RTMClient to be connected before listening. Another issue is... since I am spawning off multiple instances, instances start taking almost a minute to connect. A disconnect during this time would yield an error like the above. However, a disconnect would only happen if a message was received or an error occurred. If a message is being received before it actually connected that would lead me to believe that multiple instances get confused over some shared resource.

Anyway, I'm going to clean up some things like refactor for as little instances as possible since they can take so long to connect.

@aoberoi
Copy link
Contributor

aoberoi commented Jun 4, 2018

@dperez3 okay great, thanks for clarifying! now that i understand a little more about what you're trying to accomplish, i have some further suggestions.

Here I'm trying to receive a response to the message from a user on a channel. Or is that what the code you provided would actually do? If so, that'd be great! If not, is there a way to just listen for messages on a specific channel? (I'm gonna try it anyway though).

No, that's not how the code I provided works; the promise resolves when the message being sent is acknowledged by Slack. Your problem is different.

How did you send the message you are awaiting on for a response? If you sent the first message with an RTMClient, you shouldn't need to instantiate a whole new RTMClient for listening for the user's response message. You can just listen on that same instance used to send the message. However, if instead you sent the message with a WebClient or IncomingWebhook, then there's two more options I suggest: 1) use the Events API to receive incoming messages (see @slack/events-api module) or 2) instantiate the RTMClient object and connect before you send the message (then there's no chance that you'd miss the message). In all of these cases, you want match messages generally (using .on() instead of .once()) and use a condition to check if its a message you need (e.g. check if its from the right channel).


i don't understand the experiment above. if you start an RTM connection and then disconnect before it finishes, then its a useless RTMClient (but I do think we should do better and not throw an error - i'll create a separate issue for that). why would you want many connections with the same token? if its because you are waiting for separate responses concurrently, then i think it would be better to store information about the separate responses you are waiting for, and then use the message handler to check against that set. without knowing too much more about your app, here's an example that might be useful:

const rtm = new RTMClient(BOT_TOKEN);
rtm.start(null);

// This object maps channel ID keys to values that are shaped as { promise, resolveFn, rejectFn }
const channelsAwaitingUserResponse = new Map();

rtm.on('message', (event) => {
  if (channelsAwaitingUserResponse.has(event.channel)) {
    const val = channelsAwaitingUserResponse.get(event.channel);
    val.resolveFn(event);
  }
});

async function sendMessageAndAwaitResponse(message, channelId) {
  await rtm.sendMessage(message, channelId);
  const userResponded = new Promise((resolve, reject) => {
    channelsAwaitingUserResponse[channelId] = {
      promise: userResponded,
      resolveFn: resolve,
      rejectFn: reject,
    };
  });
  return await userResponded;
};


// Now you can send a message and await for the user's response
const userResponse = await sendMessageAndAwaitResponse('hello world', 'C12345');

You'll notice that in this example, the RTMClient is only instantiated once, and stays connected for as long as the application is running. It also only has one message listener, which essentially "routes" the incoming message with an appropriate action. In more complex apps you might have many independent message handlers.

Note that the example above is limited to allowing only one "unresponded" message per channel. I'm not sure if this is a realistic constraint for your app, but it seems like you were already making that assumption in your code examples, so it carried over into my example. You could remove that constraint by creating a different condition to identify whether a specific message is a "response" to the message you send, besides just checking the channel ID.

@bmajz
Copy link
Author

bmajz commented Jun 5, 2018

Just to return to the original topic at hand. I just got the following KeepAlive errors on 4.2.2

Jun 04 22:14:01 d3ace470e8cc ecs-rtmworker-19-rtmworker-f482e18cdabd8ed69f01: 05:14:01 1|rtm      | [ERROR] @slack/client:RTMClient failed to send message on websocket: WebSocket is not open: readyState 2 (CLOSING)
Jun 04 22:14:01 d3ace470e8cc ecs-rtmworker-19-rtmworker-f482e18cdabd8ed69f01: 05:14:01 1|rtm      | [ERROR] @slack/client:KeepAlive Unhandled error: Failed to send message on websocket: WebSocket is not open: readyState 2 (CLOSING). Please report to @slack/client package maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants