Issue 357: Integrating health checks #569
Codecov Report
@@            Coverage Diff             @@
##           master     #569      +/-   ##
==========================================
+ Coverage   74.56%   74.68%   +0.12%
==========================================
  Files          15       15
  Lines        4144     4164      +20
==========================================
+ Hits         3090     3110      +20
  Misses        932      932
  Partials      122      122
Signed-off-by: SrishT <[email protected]>
0cd62bb to 747ddec
pkg/util/pravegacluster.go
Outdated
if IsVersionBelow(version, compareVersion) {
	command = fmt.Sprintf("netstat -ltn 2> /dev/null | grep %d || ss -ltn 2> /dev/null | grep %d", port, port)
} else {
	command = fmt.Sprintf("(netstat -ltn 2> /dev/null | grep %d || ss -ltn 2> /dev/null | grep %d) && (curl -s -X GET 'http://localhost:%d/v1/health/liveness' || curl -s -k -X GET 'https://localhost:%d/v1/health/liveness')", port, port, restport, restport)
Is it really required to check the port using netstat? Why not send a REST request right away and check the return code?
It might happen that netstat will not be present in pravega containers starting from 0.10...
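A minimal sketch of what this suggestion could look like, not the code merged in this PR. It assumes curl is available in the container and relies on curl's `-f` flag, which makes curl exit non-zero on HTTP error statuses, so the exec probe fails without any netstat or ss check. The function name `livenessCommand` and the port used in `main` are illustrative only; the endpoint path is the one from the diff above.

```go
package main

import "fmt"

// livenessCommand builds a shell command for an exec probe that relies only
// on the HTTP result of the health endpoint. With -f, curl exits non-zero
// when the server answers with an HTTP error status, and it also exits
// non-zero when nothing is listening, so no separate port check is needed.
// Hypothetical helper: the name is not taken from the operator code.
func livenessCommand(restPort int) string {
	return fmt.Sprintf(
		"curl -s -f -o /dev/null 'http://localhost:%d/v1/health/liveness' || "+
			"curl -s -f -k -o /dev/null 'https://localhost:%d/v1/health/liveness'",
		restPort, restPort)
}

func main() {
	// The port value is an assumption made for the example.
	fmt.Println(livenessCommand(6061))
}
```

The plain-HTTP attempt comes first and the TLS attempt (with `-k` to skip certificate verification) is the fallback, mirroring the order used in the diff.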
Signed-off-by: SrishT <[email protected]>
LGTM
@SrishT The e2e tests are failing. Please check.
Signed-off-by: SrishT <[email protected]>
LGTM
Signed-off-by: SrishT <[email protected]>
		command = fmt.Sprintf("echo $JAVA_OPTS | grep 'controller.auth.tlsEnabled=true' && curl -s -X GET 'https://localhost:%d/v1/scopes/' -H 'accept: application/json' | grep '_system'|| (echo $JAVA_OPTS | grep 'controller.auth.tlsEnabled=false' && curl -s -X GET 'http://localhost:%d/v1/scopes/' -H 'accept: application/json' | grep '_system' ) || (echo $JAVA_OPTS | grep 'controller.security.tls.enable=true' && echo $JAVA_OPTS | grep -v 'controller.auth.tlsEnabled' && curl -s -X GET 'https://localhost:%d/v1/scopes/' -H 'accept: application/json' | grep '_system' ) || (curl -s -X GET 'http://localhost:%d/v1/scopes/' -H 'accept: application/json' | grep '_system') ", port, port, port, port)
	}
} else {
	command = fmt.Sprintf("curl -s -X GET 'http://localhost:%d/v1/health/readiness' || curl -s -k -X GET 'https://localhost:%d/v1/health/readiness'", port, port)
Is there something in the readiness call payload that indicates whether the pravega controller is running with auth on? (I am not suggesting sending bad credentials to test for a 401 in return :) Rather, just a field that literally says "auth: enabled" in the health readiness response.) If so, then if auth was requested in the java options, we should find confirmation in the readiness response that auth is on, and the opposite if auth was not set or was set to off in the java options.
If that is not in the readiness payload today, I think it should be added, but then there is nothing for this PR to leverage, and line 109 is the best it can be at this time.
@sarlaccpit the /liveness and /readiness endpoints are configured to return only a boolean response. There is another exposed endpoint, /getDetails, which can be configured to return additional details like the one you mentioned, but it can only be accessed by authenticated users.
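To illustrate how a client might consume such a boolean-only endpoint, here is a rough sketch that runs against a local stub server rather than a real controller. It assumes the response body is the plain text `true` or `false`; the exact wire format is not stated in this thread, so treat that as a guess. `checkHealth` and `newStubServer` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// checkHealth issues a GET and interprets the body as a boolean.
// Assumption: the endpoint answers 200 with a body of "true" or "false".
func checkHealth(client *http.Client, url string) (bool, error) {
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return strconv.ParseBool(strings.TrimSpace(string(body)))
}

// newStubServer starts a local test server that stands in for the
// controller and serves only the readiness path with the given body.
func newStubServer(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/v1/health/readiness" {
			fmt.Fprint(w, body)
			return
		}
		http.NotFound(w, r)
	}))
}

func main() {
	srv := newStubServer("true")
	defer srv.Close()

	ready, err := checkHealth(srv.Client(), srv.URL+"/v1/health/readiness")
	fmt.Println(ready, err)
}
```

Because the endpoint carries no detail such as auth state, a caller like this can only learn ready or not ready; anything richer would have to come from the authenticated details endpoint.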
Got it. I think it's fine the way it is then. It makes sense the auth/no auth (and probably many other details) are protected behind the get details endpoint.
Signed-off-by: SrishT <[email protected]>
Signed-off-by: SrishT [email protected]
Change log description
Integrates the APIs exposed by the new healthcheck framework, which is compatible with pravega versions >= 0.10.0, without breaking backward compatibility.
Purpose of the change
Fixes #357
What the code does
Uses the latest healthcheck APIs to monitor the health of the controller and segmentstore pods when pravega version 0.10.0 or later is deployed. For pravega versions lower than 0.10.0, it continues to use the earlier liveness and readiness probes for the controller and segmentstore pods.
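The version gate described here can be sketched roughly as follows. This is a simplified illustration, not the operator's implementation: `isVersionBelow` is a stand-in for the `IsVersionBelow` helper seen in the diff (whose real behavior is not shown in this thread), `parseVersion`, `usesHealthAPI` and `controllerReadinessCommand` are hypothetical names, and the legacy branch is reduced to the final fallback of the original command.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "0.10.0-123.abc" into [0 10 0].
// Any build suffix after "-" is ignored; non-numeric parts count as 0.
func parseVersion(v string) [3]int {
	var out [3]int
	v = strings.SplitN(v, "-", 2)[0]
	for i, p := range strings.SplitN(v, ".", 3) {
		if n, err := strconv.Atoi(p); err == nil {
			out[i] = n
		}
	}
	return out
}

// isVersionBelow reports whether version is strictly lower than compare,
// comparing major, minor and patch numerically (so 0.9.0 < 0.10.0).
func isVersionBelow(version, compare string) bool {
	a, b := parseVersion(version), parseVersion(compare)
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// usesHealthAPI reports whether the new healthcheck endpoints apply.
func usesHealthAPI(version string) bool {
	return !isVersionBelow(version, "0.10.0")
}

// controllerReadinessCommand picks the probe command for a given version.
func controllerReadinessCommand(version string, port int) string {
	if usesHealthAPI(version) {
		return fmt.Sprintf("curl -s -X GET 'http://localhost:%d/v1/health/readiness' || curl -s -k -X GET 'https://localhost:%d/v1/health/readiness'", port, port)
	}
	// Simplified legacy probe: only the last fallback of the original command.
	return fmt.Sprintf("curl -s -X GET 'http://localhost:%d/v1/scopes/' -H 'accept: application/json' | grep '_system'", port)
}

func main() {
	// The port value is an assumption made for the example.
	for _, v := range []string{"0.9.0", "0.10.0"} {
		fmt.Println(v, "->", controllerReadinessCommand(v, 10080))
	}
}
```

Comparing the components numerically matters here: a plain string comparison would order "0.9.0" after "0.10.0" and select the wrong probe.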
How to verify it
When the pravega version is lower than 0.10.0, the following healthchecks are invoked
for SegmentStore
for Controller
However, when the deployed pravega version is equal to 0.10.0, the latest healthcheck APIs are invoked.
for SegmentStore
for Controller
The following scenarios have been tested: with Auth and TLS disabled, with only Auth enabled, and with both Auth and TLS enabled.