Server crashes after 8k concurrent users, even though there's already a 13.45s timeout #88

Closed
19h opened this issue Apr 12, 2013 · 21 comments

Comments

@19h

19h commented Apr 12, 2013

The SPDY installation crashes once roughly 8k (7565, to be exact) concurrent users are accessing the server.

See http://data.sly.mn/OEtT.

I don't know what to do in this situation: having the system crash, rather than merely suffer long timeouts, opens an attack vector for DDoS.

To check for yourself, see https://sly.mn/, a freshly checked-out (1 hour ago) SPDY installation.

Thanks for any help!

Edit: The server doesn't actually crash; it just stops responding. It's now a zombie process on the server.

@19h
Author

19h commented Apr 12, 2013

Note: that's a test server. We're planning to deploy SPDY on a platform with several thousand concurrent users, but I'd like to see it working here first :) (Server specs: 3.2 GHz 16-core CPU / 1 Gbit backbone / 64 GB RAM)

@indutny
Collaborator

indutny commented Apr 14, 2013

That's pretty odd, I'll soon look into it.

@indutny
Collaborator

indutny commented Apr 16, 2013

Just for my information, which spdy/node.js versions are you using?

@indutny
Collaborator

indutny commented Apr 16, 2013

Also, I've just run an experiment with spdycat and the example server, and it easily survived more than 10000 connections and requests. Are you sure you're not running out of ephemeral ports or something similar (i.e. that your server configuration is correct)?
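On Linux, the ephemeral port range is exposed through /proc/sys/net/ipv4/ip_local_port_range. The size of that range roughly caps how many simultaneous connections a single client IP can open to one server address and port, so it matters mostly on the machine generating the load. A minimal sketch, assuming a Linux host with procfs; the helper names are hypothetical:

```python
from pathlib import Path

# Linux sysctl exposing the local (ephemeral) port range used for outgoing connections.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the 'low<whitespace>high' contents of ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def ephemeral_port_count(low: int, high: int) -> int:
    """Number of ports in the inclusive range [low, high]."""
    return high - low + 1


if __name__ == "__main__":
    if PORT_RANGE_FILE.exists():
        low, high = parse_port_range(PORT_RANGE_FILE.read_text())
        print(f"ephemeral ports {low}-{high}: {ephemeral_port_count(low, high)} available")
    else:
        print("ip_local_port_range not found (not a Linux host?)")
```

If the count is close to the number of concurrent connections the flood opens from one machine, the client may be running out of ports before the server is.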

@19h
Author

19h commented Apr 16, 2013

Linux * 2.6.32-279.1.1.el6.x86_64 SMP Tue Jul 10 13:47:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux.
CentOS 5.5 with the latest kernel patch.
Node 0.10.4
Latest SPDY checkout from this repo.

You have my permission to flood sly.mn as much as you want to reproduce the issue. However, I'm really not sure how local and remote flooding compare.

@indutny
Collaborator

indutny commented Apr 16, 2013

Another question: if you replace spdy with the https module, does the same test work for you?

@indutny
Collaborator

indutny commented Apr 16, 2013

And what's node 10.8.4? :) Do you mean 0.8.4 or 0.10.4?

@19h
Author

19h commented Apr 16, 2013

@indutny Sorry, I meant 0.10.4. I just updated to OS X 10.8.4, so I had that in mind while typing ;) Let me test!

@indutny
Collaborator

indutny commented Apr 16, 2013

Ok, let me know if you find anything.

@19h
Author

19h commented Apr 16, 2013

Wow, now it died almost instantly with spdy: http://data.sly.mn/OL2F. Let me check with https.

@19h
Author

19h commented Apr 16, 2013

Update with https: http://data.sly.mn/OL4m. It seems a bit more stable.

@19h
Author

19h commented Apr 16, 2013

You said you tested spdycat with 10000 connections. Were these concurrent connections?

@indutny
Collaborator

indutny commented Apr 16, 2013

Nope, it was more like 2000 concurrent connections, each doing 8 requests. Anyway, it seems to be failing with https too, so I believe you're either leaking file descriptors or doing something else wrong. You can check for an fd leak with the lsof utility.
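Besides lsof, Linux lists a process's open descriptors under /proc/<pid>/fd. Counting those entries gives roughly what `lsof -p <pid>` reports, minus the memory-mapped, cwd and similar entries that lsof also prints. A minimal sketch, assuming Linux procfs and sufficient permissions to read the node process's directory; the helper names and the default tolerance of 50 are arbitrary, hypothetical choices:

```python
import os
import sys


def open_fd_count(pid: int) -> int:
    """Count open file descriptors of a process via /proc/<pid>/fd (Linux only).

    Reading another user's process usually requires root, as with lsof.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def looks_like_leak(baseline: int, after_load: int, tolerance: int = 50) -> bool:
    """Heuristic: once all clients have disconnected, the fd count should fall
    back close to its pre-test baseline; a large remainder suggests a leak."""
    return after_load - baseline > tolerance


if __name__ == "__main__":
    # Hypothetical usage: python fdcount.py <pid-of-node>
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid}: {open_fd_count(pid)} open fds")
```

The idea is to take one count before the flood and another after all clients have gone away: with no leak, the two should be close.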

@19h
Author

19h commented Apr 16, 2013

Okay. Could you try with more than 5k concurrent users? Also, this might be interesting: http://storage.sly.mn/data/dump.txt

@19h
Author

19h commented Apr 16, 2013

Okay, let's retry. I've raised the file descriptor limit to 65535.
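A raised limit only helps if the node process actually inherited it: a limit set with `ulimit -n` applies to that shell and the processes started from it afterwards, not to a node process that is already running. A minimal sketch, assuming Linux procfs, that reads the effective "Max open files" limit of a running process from /proc/<pid>/limits; the helper name is hypothetical:

```python
import os
import resource
import sys


def max_open_files(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits.

    Values are kept as strings because the kernel may print 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


if __name__ == "__main__":
    # Hypothetical usage: python nofile.py <pid-of-node>
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = max_open_files(f.read())
    print(f"pid {pid}: soft={soft} hard={hard}")
    # For comparison, this interpreter's own limit via getrlimit(2).
    print("this process:", resource.getrlimit(resource.RLIMIT_NOFILE))
```

If the node process still reports the old soft limit (often 1024), it was started before the change and needs to be restarted from a shell or service that has the new limit.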

@19h
Author

19h commented Apr 16, 2013

How it looks from top: http://data.sly.mn/OLBO
Here's what happens: http://data.sly.mn/OKww

Node stays at 95% CPU after the test has successfully killed the server.

@indutny
Collaborator

indutny commented Apr 16, 2013

Can you please try running sudo struss -p <pid> (where <pid> is the pid of your node.js process) and post its output here?

@19h
Author

19h commented Apr 16, 2013

Command not found. Where would I get that from? (I've already searched the repos and found nothing.)

@indutny
Collaborator

indutny commented Apr 17, 2013

Try strace instead.

@19h
Author

19h commented Apr 17, 2013

Wait... http://data.sly.mn/OKqf: this magically works now?! I'll look into it later today to see whether I can reproduce the problem.

@19h
Author

19h commented Apr 18, 2013

Update: I just realized I've been running with HTTPS the whole time. So it's SPDY that's failing :/
