stop responding in some condition #744
Comments
Maybe it is just because of cluster mode in Node 0.10? |
Does this happen in fork mode? |
I think I'm having the same issue with PM2, but I'm running everything as the same non-root user. Here's my process.json:
Note: I don't expect to have any open connections with this app so I have this within my app:
So, there are two problems:
The only way to fix the issue is
I get this same behavior if I try
|
I had the same problem. I downgraded pm2 to version 0.10.4 and it works. My problem happens when I start a second process: when I do, the first process stops working. I ran debug mode and saw nothing unusual. |
Node 0.10.32 here; fork mode has the same issue. |
I don't have any resolution yet. Looks like cluster_mode reload (and gracefulReload) are non-functional. I've only been able to |
Please give this branch a try:
It has fixes for the reload function. Tell us if it fixes these problems. |
checked site to make sure it's running (working fine)
now any requests to the API are hanging/timing out. The only resolution is node -v pm2 -v |
I just realized something... in order to gracefulReload, do I need to configure instances to (number of CPUs) - 1 to allow spin-up of a new instance, or will PM2 allow more instances than CPUs for the purpose of gracefulReload? |
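For readers following along, the process.json shape being asked about can be sketched roughly like this (hypothetical names and paths; pm2 does not cap cluster instances at the CPU count, and "instances": 0 conventionally means "one per CPU", so nothing needs to be reserved manually):

```json
{
  "apps": [{
    "name": "api",
    "script": "./index.js",
    "exec_mode": "cluster_mode",
    "instances": 0,
    "watch": true
  }]
}
```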
OK, I fixed my custer_mode -> cluster_mode config snafu, but now, even though reload succeeds, I still get the app hang. |
Can you use 0.11.14?
|
I think I discovered the problem... I tailed my child-out log and noticed this:
in my hapi.js app, I had this:
but it appears the shutdown message was being handled after the graceful reload, so the hapi server was stopping the newly reloaded instance. I removed my process.on event listener and now gracefulReload works :)
Note: this does not fix it for or for |
I tried to replicate the issue with this sample code:
var Hapi = require('hapi');
var server = new Hapi.Server('localhost', 8000);
server.route({
method: 'GET',
path: '/',
handler: function (request, reply) {
reply('hello world');
}
});
server.start();
process.on('message', function(msg) {
if (msg == 'shutdown') {
console.log('Hapi Server received shutdown event, waiting for close');
var timer = setTimeout(function () {
console.log('Hapi Server killed anyway after timeout');
process.exit(1);
}, 1000);
server.stop({}, function () {
clearTimeout(timer);
console.log('Hapi Server successfully stopped');
process.exit(0);
});
}
});
But it didn't happen. I tried with Node 0.10.30/0.11.13/0.11.14. Can you please try one last time with this pm2 version: |
Now it's hanging.... I'm running this repo: with:
then
then when I try gracefulReload, pm2 just hangs:
After aborting the gracefulReload attempt:
"starting pm2 daemon" sounds like it wasn't running, but that's all the output that came from the command. Run it again, and now I get the version:
if I try to run
|
What does it print in ~/.pm2/pm2.log ? |
OK, odd. I just switched back to pm2 stable (0.10.8) with node 0.10.32 and now gracefulReload is working with that app. In another app, I'm tailing the logs and I see this with gracefulReload (multiple instances):
and then what follows is a constant neverending stream of this:
it appears it's caught in a reload loop. |
I just switched my hapi_playground repo to use max instances and I'm not seeing a problem with gracefulReload. Currently investigating the delta between that app and the one I'm having problems with. I'll post an update when I have a repro on hapi_playground. |
Ok thanks, we're investigating this issue as well. |
argh. Heisenbug. Once it even gave me that infinite loop of online + exited logs when I just did a |
So, it appears to be working fine on OSX with port 46100, but not on CentOS 6.5, running on port 80. Note, I've enabled port 80 for node via:
I'm tailing child-out.log, api-error.log and ~/.pm2/pm2.log and this is the flow:
The logs aren't reporting anything odd looking but the app isn't listening on port 80.
hangs here for about 45 seconds, then:
Note though that the following is missing from future reload attempts:
|
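The exact command used to enable port 80 was not preserved above; a common way to let a non-root node process bind a privileged port on Linux (an assumption, not necessarily what was done here) is granting the binary the bind capability:

```shell
# Assumption: one common approach, not necessarily the author's command.
# Grants the node binary permission to bind ports below 1024 without root.
sudo setcap 'cap_net_bind_service=+ep' "$(readlink -f "$(which node)")"
```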
It's not port 80 that's the problem. Switched my remote CentOS node to use :46100 and same problem. |
OK. I think I found the problem... I tried deploying my hapi_playground app to my CentOS node and changed the log file paths to /tmp/... (as they are in my other app)
If I launch it with the log files in /tmp, and edit a file, I get the broken app, not listening on the port.
And now it works! By the way, I'm not sure what "[/\]./" is matching. I just saw that as an example in the doc. What does that translate to? |
dang. I just realized I had copied a process.json config lacking "cluster_mode" again. But at least it works in "custer_mode" :( still an issue.... funny thing is that even with the misspelling, it shows up in |
In trying to repro it again today, watch:true + editing index.js worked in safely reloading it once, but then follow-up file changes cause this:
process just vanished.... This is with current stable pm2 version. I'm going to try development version again (though I was having trouble with that one last night too). |
Well, with pm2 -v 0.11.0, it doesn't even work once:
now running, and curl is good:
edit the app:
~/.pm2/pm2.log seems healthy:
but the app is dead. |
I got this to work briefly with these versions and a minimal process.json file:
But then I brought my process.json file back to full and pm2 0.11.0 started going nuts:
and there it hangs :( ~/.pm2/pm2.log
This is my full process.json
if I kill the pm2 daemon proc and then manually
Going back to stable pm2... |
Thanks for all the feedback man, we will do our best to resolve this :) |
Thanks @jskurti. For me, pm2 is the best deployment process manager, and I use it in production. I hope the contributors can fix this issue. Thanks all! |
We just published pm2 v0.11.0 which should resolve this issue.
|
I'm working from this project: https://github.com/atomantic/hapi_playground#testing-pm2-issue-744 on CentOS 6.5. The first time I tried it, it worked. Then I did
Here are my full repro steps:
So far, all good... But now, let's wipe it and try again...
This time the curl fails:
If it doesn't, do |
I just pushed an update to hapi_playground with it running on :46100 instead of 80 to make it a simpler repro. I still get the same behavior on 46100. |
We managed to reproduce the bug and can confirm there is a regression between 0.10.8 and 0.11.0. It looks like a port is not being closed on stop in cluster mode. We are going to develop a test suite that verifies connectivity, to avoid this kind of regression. Thanks one more time for your feedback; we'll come back to you once it's fixed |
OK, so after deep investigation with PM2 0.10.8 / 0.11.0 and Node 0.10.32 / 0.11.14 on CentOS 7 / Ubuntu, making curl requests and running lsof to see how connections behave, we conclude that this doesn't come from PM2 but from Node.js 0.10.x. With Node.js 0.11.14 everything runs fine. As stated in the Node.js documentation, the cluster module is marked as experimental in 0.10.x and unstable in 0.11.x (but it works very well). I made this test file from your bash commands: https://gist.github.com/Unitech/32ebb78f4ce079bcab97 Here are some results of our analysis: https://gist.github.com/Unitech/35c5e7a21e363099f3c6
But with Node 0.11.14 everything is fine:
So yes, this is fixed! Use 0.11.14 :) or use fork mode if you are stuck on 0.10.32. PS: you should ignore the node_modules folder while watching; it takes too much CPU. Have a good weekend! |
Thanks for the follow-up. So, in fork mode, I can't do zero-downtime deployments, right? How will watch behave in fork_mode? Will that still bring a new proc up before killing the rest? And thanks for the tip on ignoreWatch node_modules. Can you translate this ignoreWatch pattern from the docs for me: "[/\]./"? Is that ignoring all subfolders of the current directory? Thanks! |
It's skipping dotfiles only. Watch should behave the same way with or without the cluster mode. With |
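A hedged reading of the pattern quoted above: in regex form it is likely something like [\/\\]\. , i.e. "a slash or backslash followed by a dot", which matches dotfiles and dot-folders but not ordinary paths:

```javascript
// Likely intent of the ignoreWatch pattern discussed above: match any
// path segment that starts with a dot (dotfiles and dot-folders).
const dotfilePattern = /[\/\\]\./;

console.log(dotfilePattern.test('app/.git/HEAD'));      // true  (dot-folder)
console.log(dotfilePattern.test('app/.env'));           // true  (dotfile)
console.log(dotfilePattern.test('app/index.js'));       // false (dot not after a slash)
console.log(dotfilePattern.test('node_modules/hapi'));  // false (no dot at all)
```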
Is Node v0.11 stable enough to be used in production? And why was pm2 0.8.x fine with Node v0.10? I found that before pm2 0.10.0 everything was fine, and in pm2 0.10.x the reload was broken. Also in 0.10.x, the App name was wrong:
┌───────────────────────────────────┬────┬─────────┬───────┬────────┬───────────┬────────┬─────────────┬─────────────┐
│ App name │ id │ mode │ PID │ status │ restarted │ uptime │ memory │ watching │
├───────────────────────────────────┼────┼─────────┼───────┼────────┼───────────┼────────┼─────────────┼─────────────┤
│ function bold() { [native code] } │ 0 │ cluster │ 14444 │ online │ 0 │ 2m │ 31.449 MB │ unactivated │
│ function bold() { [native code] } │ 1 │ cluster │ 14446 │ online │ 0 │ 2m │ 31.453 MB │ unactivated │
│ function bold() { [native code] } │ 2 │ cluster │ 14452 │ online │ 0 │ 2m │ 31.473 MB │ unactivated │
│ function bold() { [native code] } │ 3 │ cluster │ 14458 │ online │ 0 │ 2m │ 31.449 MB │ unactivated │
└───────────────────────────────────┴────┴─────────┴───────┴────────┴───────────┴────────┴─────────────┴─────────────┘ |
I use pm2 to manage my node server; pm2 is running as root but the node server runs as nobody. I found that sometimes it stops responding for no apparent reason, yet pm2 list tells me it's online and everything is fine. Even if I delete or restart the process, requests stay pending indefinitely (all requests besides the statics).
The only way to solve this problem is to kill pm2 and restart it.
It doesn't happen often, but it does happen, and I am trying to replicate the problem. Does anyone have the same situation and can give me some suggestions?
pm2 v0.10.7
node v0.10.31