Health check question #2197

Closed
mhaamann opened this issue May 30, 2016 · 17 comments

Comments

@mhaamann

Hi,

Is it possible to restart a process managed by pm2 if it fails a health check?

Can I connect directly to a process managed by pm2? At the moment I can only access the process through pm2, which means it is random which process I will actually hit. This is mainly for debugging purposes.

The reason for asking is that today we found two processes (out of 5) that were failing the health check used by haproxy. This led to the server running pm2 being taken down every 5 minutes.

Thanks,

@nfantone

@mhaamann Throw an exception?

@mhaamann
Author

Hmm, makes sense.
Will it then do a graceful restart so other requests have time to finish, or will it do a hard restart?

@nfantone

I'd say that is really up to you. An uncaught exception will terminate your app, so by the time pm2 gets a chance to restart it, it has already been killed.

If you're using express (or similar), I'd suggest including a "graceful shutdown" middleware in your server setup that'll wait for sockets to be cleaned up before terminating. There are many implementations out there.
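A minimal sketch of the graceful-shutdown idea, using only the plain Node `http.Server` API rather than any particular middleware. pm2 sends SIGINT when it stops or restarts a process and follows up with SIGKILL after its kill timeout, so the handler stops accepting new connections, lets in-flight requests finish, and then exits. `installGracefulShutdown`, `timeoutMs` and `exit` are illustrative names (assumptions), not part of pm2 or express, and the 1500 ms default is an assumed value.

```javascript
'use strict';

// Hypothetical helper: closes `server` on SIGINT and exits once in-flight
// requests are done, or after `timeoutMs` at the latest.
function installGracefulShutdown(server, options) {
  var opts = options || {};
  var timeoutMs = opts.timeoutMs || 1500; // assumed value; keep it below pm2's kill timeout
  var exit = opts.exit || function (code) { process.exit(code); }; // injectable for testing
  var shuttingDown = false;

  function shutdown() {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;

    // Safety net: force a non-zero exit if connections do not drain in time.
    var timer = setTimeout(function () { exit(1); }, timeoutMs);
    timer.unref(); // do not keep the event loop alive just for this timer

    // Stop accepting new connections; the callback runs once open ones have ended.
    server.close(function () {
      clearTimeout(timer);
      exit(0);
    });
  }

  process.on('SIGINT', shutdown);
  return shutdown;
}
```

With an existing server this would be wired up as `installGracefulShutdown(server, { timeoutMs: 1000 })`. Long-lived keep-alive connections can delay `server.close`, which is why the forced-exit timer is there.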

@vincentheet

I think there is a use case for having a real health check instead of throwing an exception. For example, we have an application with an http endpoint on /health which returns a 200 status code if everything is ok (db connection works, nosql store is reachable, etc). Throwing an exception is in my opinion not an acceptable solution if that results in an application kill by PM2. In that case we could just as well call process.exit() if something goes wrong. And who says the application will not recover on its own, for example thanks to database connection pooling?
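For illustration, the kind of `/health` endpoint described above could be sketched like this. The function names (`runChecks`, `summarise`, `healthHandler`) and the shape of the `checks` object are assumptions made for the example, and answering 503 on a failed check is a common convention rather than something stated in this thread.

```javascript
'use strict';

// Pure helper: maps individual check results to an HTTP status and body.
function summarise(results) {
  var healthy = results.every(function (r) { return r.ok; });
  return { status: healthy ? 200 : 503, body: { healthy: healthy, checks: results } };
}

// Runs every check (each is a function returning a value or a Promise).
// A check that throws or rejects is recorded as failed instead of crashing the app.
function runChecks(checks) {
  var names = Object.keys(checks);
  return Promise.all(names.map(function (name) {
    return Promise.resolve()
      .then(function () { return checks[name](); })
      .then(
        function () { return { name: name, ok: true }; },
        function (err) {
          return { name: name, ok: false, error: String((err && err.message) || err) };
        }
      );
  })).then(summarise);
}

// Request handler factory that exposes the checks on /health.
function healthHandler(checks) {
  return function (req, res) {
    if (req.url !== '/health') {
      res.statusCode = 404;
      return res.end();
    }
    runChecks(checks).then(function (summary) {
      res.statusCode = summary.status;
      res.setHeader('Content-Type', 'application/json');
      res.end(JSON.stringify(summary.body));
    });
  };
}
```

It could be mounted with `require('http').createServer(healthHandler({ db: pingDb, nosql: pingStore })).listen(3000)`, where `pingDb` and `pingStore` are hypothetical functions that reject when the dependency is unreachable.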

@JamesBewley

JamesBewley commented Sep 12, 2016

+1 for health checking processes and better granularity of control at a process level.

I have a pool of servers behind a load balancer which uses a health check (/healthcheck.js returns 200 OK) to take nodes out of the pool of active servers. Each server is running PM2 which clusters node instances to scale to available CPU resource.

I am finding that it is possible for one node process to lock up (no logs or errors reported) while the server stays in the pool on the load balancer because the other processes are working. When this happens, requests routed to that process get no response and there is no automatic recovery.

@vmarchaud
Contributor

I'm not sure I understand what you want to implement, because I think you can already do that. Imagine an app like this:

var http = require('http');

http.createServer(function(req, res) {
  res.end('Done');
}).listen(3000);

function healthcheck(cb) {
  // do your verification
  return cb(null, true);
}

// packets sent by pm2 arrive over the Node IPC channel,
// i.e. as 'message' events on the process
process.on('message', function(packet) {
  if (!packet || packet.type !== 'healthcheck') return;
  healthcheck(function (err, data) {
    var state = err ? false : true;
    process.send({
      type : 'process:msg:healthcheck',
      data : {
        err: err,
        data: data,
        state : state
      }
    });
  });
});

And a worker (I advise you to do that with a pm2 module, btw):

var pmx     = require('pmx'); // unused in this snippet; pm2 modules are built on pmx
var pm2     = require('pm2');

pm2.connect(function(err) {
  if (err) throw err;

  setInterval(function () {
    // every 10 seconds, list process handled by pm2
    pm2.list(function (err, list) {
      if (err) return console.error(err);
      list.forEach(function (proc) {
        // and for each send a healthcheck request
        pm2.sendDataToProcessId({
          type  : 'healthcheck',
          topic : 'healthcheck', // assumption: newer pm2 versions reject packets without a topic
          data  : {},
          id    : proc.pm2_env.pm_id
        }, function(err, res) {
          // response will be actually called using the EventBus of pm2
          // but err can be filled with eventual error while communicating with pm2 daemon
        });
      });
    });
  }, 10000);

  pm2.launchBus(function(err, pm2_bus) {
    // listen for healthcheck response here
    pm2_bus.on('process:msg:healthcheck', function(packet) {
      // analyse your data here and do what you want like restart or whatever
      console.log(packet);
    });
  });
});

I haven't tested this code, but it should work since it relies on the pm2 API, especially this part.
If you want this to be implemented inside pm2 as a feature, I don't think that will happen soon, since it's already possible using the API.

@JamesBewley

JamesBewley commented Sep 12, 2016

And what can be done if the process fails the health check or is frozen and does not return a response?
Can a single worker process be recycled using the API?

@vmarchaud
Contributor

As you can read in the API here, you can restart/stop processes, etc.
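As a sketch of what recycling a single worker through the API might look like: `pm2.restart(process, errback)` accepts a process id or name, so a thin Promise wrapper is enough. `restartProcess` is a hypothetical helper name, and `pm2` is passed in as an argument (normally an already connected `require('pm2')`) so the wrapper can be tried with a stub.

```javascript
'use strict';

// Hypothetical helper: restarts one pm2-managed process by its pm_id.
// `pm2` is expected to be the already connected pm2 API object.
function restartProcess(pm2, pmId) {
  return new Promise(function (resolve, reject) {
    pm2.restart(pmId, function (err, proc) {
      if (err) return reject(err);
      resolve(proc);
    });
  });
}

// Possible usage (assumes the pm2 package is installed and pm_id 3 exists):
//
//   var pm2 = require('pm2');
//   pm2.connect(function (err) {
//     if (err) throw err;
//     restartProcess(pm2, 3)
//       .then(function () { console.log('restarted'); })
//       .catch(console.error)
//       .then(function () { pm2.disconnect(); });
//   });
```

Only the targeted process is restarted, so the other instances in the cluster keep serving requests.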

@JamesBewley

Ok, thanks for the information. I can see the module and API being very useful.

In terms of health checking frozen processes, I would suggest this should probably be an official module if it isn't already. I imagine lots of people would use this.

@soyuka
Collaborator

soyuka commented Sep 12, 2016

@JamesBewley not sure about this, because it depends a lot on your particular needs.

@JamesBewley

JamesBewley commented Sep 13, 2016

@soyuka There will be a whole load of common logic specific to pm2 that has nothing to do with the application. Things like how to detect and handle a timeout of the event across the axon bus.

There might be some configuration, such as how long to wait before recycling the process, but the rest will likely be common.

@JamesBewley

JamesBewley commented Sep 13, 2016

I've started looking at this and put together a PM2 module from the information here.
I can't see how to recover from a frozen process, since the axon 'error' event gives no context about which PM2 thread failed.

I think this needs to be provided by PM2.

https://github.com/Telemisis/pm2-health-check

@vmarchaud
Contributor

PM2 isn't supposed to crash. What will happen if an app doesn't respond is that no event will be emitted to the pm2 bus. And as I said, it's not a priority since it's possible to do this using the API.
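Since a frozen app simply emits nothing, one way to attribute the silence to a specific process is to record, per `pm_id`, when a health check request was sent and to treat a missing reply after some timeout as a failure. The sketch below is only that bookkeeping; `createHealthTracker` and its method names are hypothetical, and timestamps are passed in explicitly so the logic stays free of timers.

```javascript
'use strict';

// Hypothetical helper: tracks outstanding health check requests per pm_id
// so that a missing reply can be attributed to one process.
function createHealthTracker(timeoutMs) {
  var pending = {}; // pm_id -> time (ms) of the oldest unanswered request

  return {
    // Call right after sending a healthcheck packet to `pmId`.
    sent: function (pmId, now) {
      if (!(pmId in pending)) pending[pmId] = now;
    },
    // Call when a reply from `pmId` arrives, or after restarting it.
    received: function (pmId) {
      delete pending[pmId];
    },
    // Returns the pm_ids whose oldest request has gone unanswered
    // for longer than `timeoutMs`.
    expired: function (now) {
      return Object.keys(pending)
        .filter(function (pmId) { return now - pending[pmId] > timeoutMs; })
        .map(Number);
    }
  };
}
```

In a worker like the one earlier in the thread, `tracker.sent(id, Date.now())` would go next to `sendDataToProcessId`, `tracker.received(...)` into the bus listener (assuming the bus packet identifies its sender, for example through a `process.pm_id` field), and each interval tick could restart whatever `tracker.expired(Date.now())` returns.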

@epozsh

epozsh commented Aug 1, 2017

@vmarchaud Hello, is there any health check option for pm2? Or do I have to use middleware in my app?

@soyuka
Collaborator

soyuka commented Aug 1, 2017

I use things like this in plain bash with the help of jq:

Status by name:

pm2 jlist | jq '.[] | select(. | contains({name: "yourname"})).pm2_env.status'

All names:

pm2 jlist | jq '.[].name'

Memory and CPU usage of process #1:

pm2 jlist | jq '.[] | select(. | contains({pm_id: 1})).monit'
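The same `pm2 jlist` output can also be inspected from a small Node script, which makes it easier to run on a schedule. This is a sketch under the assumption that the command prints a JSON array whose entries have `name` and `pm2_env.status` fields, as the jq filters above imply; `notOnline` is a hypothetical helper name.

```javascript
'use strict';

// Hypothetical helper: given the JSON text printed by `pm2 jlist`,
// returns the names of processes whose status is not "online".
function notOnline(jlistOutput) {
  return JSON.parse(jlistOutput)
    .filter(function (proc) { return proc.pm2_env.status !== 'online'; })
    .map(function (proc) { return proc.name; });
}

// Possible usage (assumes the pm2 CLI is on the PATH):
//
//   var execFileSync = require('child_process').execFileSync;
//   var output = execFileSync('pm2', ['jlist'], { encoding: 'utf8' });
//   console.log(notOnline(output));
```

A status of "online" only says that pm2 considers the process to be running, so a check like this would not catch the frozen-but-alive case discussed earlier in the thread.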

@epozsh

epozsh commented Aug 1, 2017

Yes, but this is not automatic, right?

@soyuka
Collaborator

soyuka commented Aug 1, 2017

Nope. You may consider using Keymetrics, which is great monitoring software that works seamlessly with PM2! If you already have nagios or similar, there should be some plugins to help you do so.
