cmd: Add health check commands #319

Merged
merged 7 commits into from
Dec 18, 2019

@tleef (Contributor) commented Dec 18, 2019

Related issue

Proposed changes

Add health check commands to the oathkeeper CLI.

motivation:
Support for Docker HEALTHCHECK

Docker's HEALTHCHECK instruction executes a command, e.g. CMD curl -f http://localhost/ || exit 1, inside the container to determine the container's health status. The command's exit code determines that status. The possible exit codes are:

  • 0: success - the container is healthy and ready for use
  • 1: unhealthy - the container is not working correctly
  • 2: reserved - do not use this exit code

Rather than using curl (which is not available in the official oathkeeper image) to query the oathkeeper API, we can query the API using oathkeeper's CLI. This has a few advantages:

  • No additional dependencies, e.g. curl
  • The health check is portable to other environments
  • The health check can be more rigorous than would otherwise be possible with a simple HTTP tool

See this blog post for more details.

example usage:
In keeping with the conventions already established by the rules command, the new health command has the following usage:

  • oathkeeper health -e <endpoint> alive for checking liveness
  • oathkeeper health -e <endpoint> ready for checking readiness
$ oathkeeper health -e http://localhost:4456 ready
{
    "status": "ok"
}

Checklist

  • I have read the contributing guidelines
  • I have read the security policy
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security vulnerability, I
    confirm that I got green light (please contact
    [email protected]) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation within the code base (if appropriate)
  • I have documented my changes in the
    developer guide (if appropriate)

Further comments

@claassistantio commented Dec 18, 2019

CLA assistant check
All committers have signed the CLA.

@aeneasr (Member) left a comment

Awesome! Just some documentation things. Also, I think these commands should exit with a status code of 1 if the check fails!

cmd/health.go (outdated)
cmd/health_alive.go (outdated)
var healthAliveCmd = &cobra.Command{
	Use:   "alive",
	Short: "Command for checking alive status",
	Run: func(cmd *cobra.Command, args []string) {
Member

Please add a long description here, with at least one example, showcasing how to use this command.

Member

Also, you should add a warning:

If the target endpoint runs behind a Load Balancer, this command will effectively test the health of the Load Balancer, not the actual ORY Oathkeeper deployment.

Contributor Author

I added the Load Balancer warning to the parent health command since that is where the endpoint flag is declared.

Member

Most devs do not read the "top level" command docs, only the specific docs. It makes sense to have the note in all three!

Contributor Author

Done


r, err := client.API.IsInstanceAlive(api.NewIsInstanceAliveParams())
cmdx.Must(err, "%s", err)
fmt.Println(cmdx.FormatResponse(r.Payload))
Member

I think this command should exit with an error code of 1 if the instance is not yet alive. Any response from the server with a status code other than 200 means that this instance is not alive yet.

Contributor Author

I added explicit exit codes in all cases now. cmdx.Must(err, "%s", err) will exit 1 if there was an error, e.g. a 404 response. If there is no error, it will exit 1 if r.Payload.Status != "ok". Otherwise, it will exit 0.

Member

Ah right, I forgot that the API will return an error if the code is not 200. We should still print out the received status code, which will make it easier to debug :)

Contributor Author

The error status code is printed in the error message. For example:

$ oathkeeper health -e http://localhost:4456/bad ready
> unknown error (status 404): {resp:0xc0028aee10}

I can't find any way to get a reference to the StatusCode based on what the internal API client returns.

Member

You can type-assert the error to *runtime.APIError, which gives you access to the Response, the code, and so on, e.g.:

if apiErr, ok := err.(*runtime.APIError); ok {
	// ...
} else {
	// ...
}

Member

However, I think it's ok as it is at the moment!

@@ -0,0 +1,26 @@
package cmd
Member

Please apply the comments from health_alive here as well!

Contributor Author

Done

@aeneasr (Member) left a comment

Thank you for the changes, a few last things and then we're good to go! :)

if r.Payload.Status != "ok" {
	os.Exit(1)
}
os.Exit(0)
Member

No need for os.Exit(0), this is always the case!

Contributor Author

I removed the explicit exit 0 for both

fmt.Println(cmdx.FormatResponse(r.Payload))
// When healthy, ORY Oathkeeper always returns a status of "ok"
if r.Payload.Status != "ok" {
	os.Exit(1)
Member

There should be an error message here. Something like:

fmt.Printf("The endpoint does not appear to be healthy, received HTTP Status Code %d and status %s\n", r.StatusCode, r.Payload.Status)

(I think the status code is r.StatusCode).

@tleef (Contributor Author) commented Dec 18, 2019

The HTTP StatusCode doesn't appear to be available. The internal API client returns the following struct:

type IsInstanceReadyOK struct {
	Payload *models.SwaggerHealthStatus
}

If the HTTP status code isn't 200, cmdx.Must(err, "%s", err) prints it as part of the error and the process exits with code 1. Otherwise, the Payload is always printed, so you can see the payload status.

Member

Yeah, you're right!

@aeneasr aeneasr merged commit 0dd3fe3 into ory:master Dec 18, 2019
@aeneasr (Member) commented Dec 18, 2019

Thank you!

@tleef (Contributor Author) commented Dec 18, 2019

Thanks for all your help reviewing! Does this mean I get to chat in the #contributors channel now? ;)

@aeneasr (Member) commented Dec 18, 2019

Credit where credit is due ;) You're now able to write in the contributors channel!

@tleef tleef deleted the cmd_healthcheck branch December 18, 2019 14:45
pike1212 pushed a commit to pike1212/oathkeeper that referenced this pull request Dec 18, 2019