Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node v0.11.15+ should use pauseOnConnect option with net.createServer #10

Open
zhonked opened this issue Apr 22, 2015 · 9 comments
Open

Comments

@zhonked
Copy link

zhonked commented Apr 22, 2015

Thanks for creating this helpful guide, I was running into some performance degradation after following the guide in node v0.12.2, but found out there was a pauseOnConnect option added to net.createServer that turned out to be the main issue. I didn't see this info anywhere after much googling so hopefully this will be of use to someone else.

Also added here: indutny/sticky-session#25

@elad
Copy link
Owner

elad commented Apr 23, 2015

Ironically, pauseOnConnect was introduced to the fix issues reported by @samcday and me! Sorry for slacking off and thanks for the prodding. :)

@elad elad closed this as completed Apr 23, 2015
@zhonked
Copy link
Author

zhonked commented Apr 23, 2015

Ah, I must have missed those issues on my initial search! It turns out from further testing that even with pauseOnConnect I'm seeing requests not making it through to the workers. Connections seem to be make it to workers and emitted, but requests sometimes do not (based on logging in event listeners in the workers. Not sure if you've gotten any further with this?

@elad
Copy link
Owner

elad commented Apr 24, 2015

It's hard to tell what the problem is without looking at code or a reproducible test case. I'd like to know more about this though:

Connections seem to be make it to workers and emitted, but requests sometimes do not

"Connections" are actually sockets resulting from accepting an incoming connection. Sockets are file descriptors, which are the only thing you can pass. Requests however are higher level objects and I doubt you can pass them to workers at all. If you could elaborate, or post a simple test can I could reproduce, I can help you figuring out what's wrong. But otherwise it would be shooting in the dark. :)

@zhonked
Copy link
Author

zhonked commented Apr 24, 2015

Thanks elad, to simplify the issue I took your code (with the new pauseOnConnect flag!) and added a simple hello world end point:

// Here you might use middleware, attach routes, etc.
app.get('/', function (req, res) {
  res.send('Hello World!');
});

I also restricted to just one worker to remove some variables:

num_processes = 1;
...
// We received a connection and need to pass it to the appropriate
// worker. Get the worker for this connection's source IP and pass
// it the connection.
var index = 0;
var worker = workers[index];

With these changes I ran siege to do some basic stress testing and saw a high number of failed transactions:

siege -v -d1 -c100 -r10 http://localhost:3000
Transactions:                914 hits
Availability:              91.40 %
Elapsed time:               9.08 secs
Data transferred:           0.01 MB
Response time:              0.01 secs
Transaction rate:         100.66 trans/sec
Throughput:             0.00 MB/sec
Concurrency:                0.67
Successful transactions:         914
Failed transactions:              86
Longest transaction:            0.04
Shortest transaction:           0.00

If you make a minor tweak to allow direct communication with the worker:

var server = app.listen((4000+cluster.worker.id), 'localhost', function() {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Example app listening at http://%s:%s', host, port);
});

and run siege directly against a worker, you get much better results in terms of failed transactions:

siege -v -d1 -c100 -r10 http://localhost:4001
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              10.05 secs
Data transferred:           0.01 MB
Response time:              0.01 secs
Transaction rate:          99.50 trans/sec
Throughput:             0.00 MB/sec
Concurrency:                0.90
Successful transactions:        1000
Failed transactions:               0
Longest transaction:            0.07
Shortest transaction:           0.00

I set up some logging off of the worker event listeners to see what was going on:

var connectCount = 0;
server.on('connection', function() {
  connectCount++;
  console.log('connectCount: ', connectCount);
});
var requestCount = 0;
server.on('request', function() {
  requestCount++;
  console.log('requestCount: ', requestCount);
});

And I see that the final counts when testing against the master show that while all connection events are fired, not all connections events result in request events. This makes me think the sockets/fds are closed before they are able to be read or something else is being clobbered. This is mere speculation on my part at this point:

//run against master
connectCount:  1000
requestCount:  914

//run directly against worker
connectCount:  1000
requestCount: 1000

In the end I think I'll most likely externalize the load balancing/sticky sessions as I may need to do some horizontal scaling in the future as well, but it'd be nice to figure out what's going on here.

Thanks again for all your help!

@elad
Copy link
Owner

elad commented Apr 25, 2015

That sounds like a bug in node.js. Have you tried running tcpdump to see where connections get dropped and how?

@elad
Copy link
Owner

elad commented Apr 27, 2015

Reopening

@elad elad reopened this Apr 27, 2015
@elad
Copy link
Owner

elad commented May 7, 2015

@zhonked do you want to submit an issue and test code to node.js/io.js? I think this is an important issue; if you don't have the time I'll make some and take care of it. :)

@zhonked
Copy link
Author

zhonked commented May 8, 2015

@elad, sorry about being so late on this... I got tied up with a few things, I'll put together something in the next day or two and submit it and link it here in case you have some extra insight to add. Thanks for the nudge :)

@zhonked
Copy link
Author

zhonked commented Jun 29, 2015

Time got a way from me again... but I posted what I could to the node repo:

nodejs/node-v0.x-archive#25594

Hopefully I'll actually find some time to dig into this a bit more in the near future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants