forked from pravega/pravega
Kevin Han edited this page Aug 6, 2020 · 4 revisions
This PDP (Pravega Design Proposal) proposes a design for the Pravega Health Check. It covers the requirements of the feature, the main considerations and concerns behind the design, the architecture of the subsystem, typical usage at various levels, and the implementation of some HealthAspects.
- Support Health(Readiness) Check
- Support Liveness Check
- (Optional) Support Wellness Check
- Easily consumable by both machine and human
- Extensible and maintainable
- Standards-based approach instead of an ad-hoc implementation
- Accessible at every level, such as aspect, process (localhost), pod, service (cluster), database, and metrics backend
- Lightweight with minimal overhead; operates in PULL mode
Access Level | Query Example | Response Example | Use Cases |
---|---|---|---|
Local | curl http://localhost:10080/health | {"health": 0} {"health": -1} | Fundamental Healthcheck for SegmentStore |
Local | curl http://localhost:10080/live | {"liveness": 0} | Fundamental Liveness Check for SegmentStore |
Local | curl http://localhost:10090/healthDetails | {"health": -1, "details": "No Active SegmentContainer"} | Fundamental Healthcheck for Controller |
K8S pod | curl -v http://10.100.200.125:10080/health; curl -v http://10.100.200.125:10090/health | | Troubleshooting inside K8S |
Operator | {LivenessProbe: exec: Command: curl -v /live ReadinessProbe: Exec: Command: curl -v /health | | Operator exposes Pravega liveness and readiness check to K8S |
Service (CLI) | health -[Segmentstore \| Controller \| All] | | |
Database (Influxdb) | SELECT healthevents from SegmentstoreHealthEvents ... | | History and distribution of health information available now |
Metrics (Grafana) | Metrics backend User Interface | | Integration with BK/ZK metrics possible now |
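
The local rows of the table can be illustrated with a small stand-in endpoint. The following is a minimal sketch, not Pravega's REST server: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer`, and the class name `HealthEndpointSketch`, its methods, and the two `BooleanSupplier` parameters are hypothetical. It assumes, reading from the table, that `0` means healthy and `-1` means unhealthy.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a local health endpoint; not Pravega's actual REST server. */
final class HealthEndpointSketch {

    /** Renders a body in the shape shown in the table (assumed: 0 = healthy, -1 = unhealthy). */
    static String render(String key, boolean ok) {
        return "{\"" + key + "\": " + (ok ? 0 : -1) + "}";
    }

    /** Starts a server exposing /health and /live; passing port 0 lets the OS pick a free port. */
    static HttpServer start(int port, BooleanSupplier ready, BooleanSupplier alive) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        register(server, "/health", "health", ready);
        register(server, "/live", "liveness", alive);
        server.start();
        return server;
    }

    /** Binds one path to one check; the check runs on every request (PULL mode). */
    private static void register(HttpServer server, String path, String key, BooleanSupplier check) {
        server.createContext(path, exchange -> {
            byte[] body = render(key, check.getAsBoolean()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }
}
```

With `HealthEndpointSketch.start(10080, ready, alive)` running, `curl http://localhost:10080/health` would return a body of the same shape as the first table row. Nothing is computed until a request arrives, which matches the PULL-mode requirement.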
- HealthAspect - An aspect of the system health. E.g. Cache is an aspect of SegmentStore service health. Note that an aspect may have multiple instances. A HealthAspect provides the functions to run the healthcheck and to perform aspect level aggregation
- HealthInfo - Object that stores the final healthcheck result, such as the status code and details
- HealthAspectProvider - A system component is a HealthAspectProvider if it registers a HealthAspect upon its initialization and closes the aspect when the component closes
- HealthRegistry - A container that holds references to all the active HealthAspects. Note that these are weak references, to prevent memory leaks
- Each system component with a health concern implements the HealthAspectProvider interface: it registers its own HealthAspect upon component initialization and closes the aspect when the component closes
- HealthAspect holds the functions that run the healthcheck and the aspect level aggregation
- HealthRegistry holds weak references to all the active HealthAspects to avoid memory leaks
- When a healthcheck is requested, HealthRegistry iterates over all the active HealthAspects to collect their HealthInfo. It also performs the aspect level aggregation to determine the health of each aspect as a whole
- The aspect level aggregation function is provided by HealthAspect. E.g. for SegmentContainerHealthAspect, if more than half of the Segment Containers are unhealthy, then SegmentContainer is considered unhealthy at the aspect level
- The existing REST endpoint inside Controller is used to expose the HealthCheck result
- A new REST endpoint will be created for the Segment Store service (SSS)
- Pravega Operator will expose those healthcheck endpoints to Kubernetes
- The Pravega command line tool will do the service level healthcheck. The CLI will query all the service pods and perform the service level aggregation
- Each HealthCheck call is also a metrics event, so users can view the Pravega healthcheck history and distribution in the metrics backend, such as Grafana
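
The registry flow described above (weak references, iteration over active aspects, per-aspect aggregation) can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: `HealthRegistrySketch`, `AspectInstance`, `aspectName`, `checkAspects`, and `checkOverall` are hypothetical names, instances are grouped by a plain string, and the rule that the process is healthy only when every aspect is healthy is an assumption the proposal does not state.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch of the registry flow; names and signatures are illustrative only. */
final class HealthRegistrySketch {

    enum Status { HEALTH, UNHEALTH }

    /** One registered instance of an aspect, e.g. one Segment Container. */
    static final class AspectInstance {
        final String aspectName;                          // groups instances of the same aspect
        final Supplier<Status> check;                     // instance level healthcheck
        final Function<List<Status>, Status> aggregator;  // aspect level aggregation rule

        AspectInstance(String aspectName, Supplier<Status> check,
                       Function<List<Status>, Status> aggregator) {
            this.aspectName = aspectName;
            this.check = check;
            this.aggregator = aggregator;
        }
    }

    // Weak references: the registry alone never keeps a component reachable.
    private final List<WeakReference<AspectInstance>> instances = new CopyOnWriteArrayList<>();

    void register(AspectInstance instance) {
        instances.add(new WeakReference<>(instance));
    }

    void close(AspectInstance instance) {
        // Drop the closed instance and any reference the garbage collector has already cleared.
        instances.removeIf(ref -> {
            AspectInstance target = ref.get();
            return target == null || target == instance;
        });
    }

    /** Runs the check of every live instance, then aggregates the results per aspect name. */
    Map<String, Status> checkAspects() {
        Map<String, List<Status>> results = new LinkedHashMap<>();
        Map<String, Function<List<Status>, Status>> aggregators = new LinkedHashMap<>();
        for (WeakReference<AspectInstance> ref : instances) {
            AspectInstance instance = ref.get();
            if (instance == null) {
                continue; // owner was collected without closing its aspect
            }
            results.computeIfAbsent(instance.aspectName, k -> new ArrayList<>())
                   .add(instance.check.get());
            aggregators.putIfAbsent(instance.aspectName, instance.aggregator);
        }
        Map<String, Status> perAspect = new LinkedHashMap<>();
        results.forEach((name, statuses) ->
                perAspect.put(name, aggregators.get(name).apply(statuses)));
        return perAspect;
    }

    /** Assumption: the process is healthy only if every aspect is healthy. */
    Status checkOverall() {
        return checkAspects().containsValue(Status.UNHEALTH) ? Status.UNHEALTH : Status.HEALTH;
    }
}
```

Because the registry holds only weak references, a component that is dropped without calling `close` simply disappears from later checks once it is garbage collected, instead of being pinned in memory by the registry.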
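```java
import lombok.Data;

/** Final result of a healthcheck: a status code plus human-readable details. */
@Data
public class HealthInfo {
    public enum Status {
        HEALTH,
        UNHEALTH
    }

    /**
     * The status of the HealthInfo
     */
    private final Status status;

    /**
     * The details of the HealthInfo
     */
    private final String details;
}
```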
```java
import java.util.function.Function;
import java.util.function.Supplier;

public interface HealthAspect {
    /**
     * Return the unique ID of the aspect instance
     * @return the aspect instance id
     */
    String getAspectInstanceId();

    /**
     * Each HealthAspect should provide a Supplier for HealthInfo
     *
     * @return HealthInfo of the health aspect
     */
    Supplier<HealthInfo> getHealthInfoSupplier();

    /**
     * There might be multiple instances of the same HealthAspect. An aspect level aggregator
     * should be provided as well to draw a conclusion at the aspect level.
     *
     * @return Function to make the aspect level conclusion based on all the available HealthInfo of the aspect
     */
    Function<Iterable<HealthInfo>, HealthInfo.Status> getHealthAspectAggregator();
}
```
```java
public interface HealthAspectProvider {
    /**
     * Register the HealthAspect upon its initialization
     */
    void registerHealthAspect();

    /**
     * Close the HealthAspect upon its shutdown
     */
    void closeHealthAspect();
}
```
```java
public class AggregatorUtil {
    /**
     * Given a group of HealthInfo, determine the overall health situation using majority rule.
     *
     * @param healthInfos the HealthInfo results to aggregate
     * @return the overall health status
     */
    public static HealthInfo.Status majority(Iterable<HealthInfo> healthInfos) {
        int healthCount = 0;
        int unhealthCount = 0;
        for (HealthInfo info : healthInfos) {
            if (info.getStatus() == HealthInfo.Status.HEALTH) {
                healthCount++;
            } else {
                unhealthCount++;
            }
        }
        return unhealthCount > healthCount ? HealthInfo.Status.UNHEALTH : HealthInfo.Status.HEALTH;
    }
}
```
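
Two edge cases of the majority rule are worth spelling out: a tie resolves to healthy, and so does an empty input, because `unhealthCount > healthCount` is false when both counts are zero. The sketch below reproduces the same comparison on bare statuses so it can run on its own. `MajorityRuleSketch` and `majorityOrUnhealthyIfEmpty` are hypothetical names, and the stricter variant is only one possible way to handle a case such as "No Active SegmentContainer"; it is not part of the proposal.

```java
/** Hypothetical, self-contained illustration of the majority rule's edge cases. */
final class MajorityRuleSketch {

    enum Status { HEALTH, UNHEALTH }

    /** Same comparison as AggregatorUtil.majority, applied to bare statuses. */
    static Status majority(Iterable<Status> statuses) {
        int healthCount = 0;
        int unhealthCount = 0;
        for (Status status : statuses) {
            if (status == Status.HEALTH) {
                healthCount++;
            } else {
                unhealthCount++;
            }
        }
        // Strict comparison: a tie, or no input at all, yields HEALTH.
        return unhealthCount > healthCount ? Status.UNHEALTH : Status.HEALTH;
    }

    /** Assumed stricter variant: having no instances at all counts as unhealthy. */
    static Status majorityOrUnhealthyIfEmpty(Iterable<Status> statuses) {
        return statuses.iterator().hasNext() ? majority(statuses) : Status.UNHEALTH;
    }
}
```

Whether an aspect with no live instances should report healthy is a design decision for each aggregator; the interface leaves that choice to the HealthAspect.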
```java
import java.util.function.Function;
import java.util.function.Supplier;

public class SegmentContainerHealthAspect implements HealthAspect {
    // Shared by all instances: the aspect is unhealthy when unhealthy containers outnumber healthy ones.
    private static final Function<Iterable<HealthInfo>, HealthInfo.Status> ASPECT_LEVEL_AGGREGATOR = AggregatorUtil::majority;

    private final String aspectInstanceId;
    private final StreamSegmentContainer container;
    private final Supplier<HealthInfo> healthInfoSupplier;

    public SegmentContainerHealthAspect(String aspectInstanceId, StreamSegmentContainer container, Supplier<HealthInfo> healthInfoSupplier) {
        this.aspectInstanceId = aspectInstanceId;
        this.container = container;
        this.healthInfoSupplier = healthInfoSupplier;
    }

    @Override
    public String getAspectInstanceId() {
        return this.aspectInstanceId;
    }

    @Override
    public Supplier<HealthInfo> getHealthInfoSupplier() {
        return this.healthInfoSupplier;
    }

    @Override
    public Function<Iterable<HealthInfo>, HealthInfo.Status> getHealthAspectAggregator() {
        return ASPECT_LEVEL_AGGREGATOR;
    }

    // Builds a supplier that inspects the container each time it is invoked.
    // isClosed() and getActiveSegments() are assumed to exist on StreamSegmentContainer.
    Supplier<HealthInfo> createHealthInfoSupplier() {
        return () -> {
            HealthInfo.Status status = container.isClosed() ? HealthInfo.Status.UNHEALTH : HealthInfo.Status.HEALTH;
            String details = container.getActiveSegments().toString();
            return new HealthInfo(status, details);
        };
    }
}
```
```java
// HealthRegistry is not defined in this document; registerHealthAspect and
// closeHealthAspect are assumed to be its methods.
public class StreamSegmentContainer implements HealthAspectProvider, AutoCloseable {
    final HealthRegistry healthRegistry;
    final HealthAspect healthAspect;

    public StreamSegmentContainer(int containerId, HealthRegistry healthRegistry) {
        this.healthRegistry = healthRegistry;
        // The aspect instance id is derived from the container id.
        this.healthAspect = new SegmentContainerHealthAspect(String.valueOf(containerId), this,
                () -> new HealthInfo(HealthInfo.Status.HEALTH, "OK"));
        registerHealthAspect();
    }

    @Override
    public void registerHealthAspect() {
        healthRegistry.registerHealthAspect(this.healthAspect);
    }

    @Override
    public void closeHealthAspect() {
        healthRegistry.closeHealthAspect(this.healthAspect);
    }

    @Override
    public void close() {
        // Closing the component also removes its aspect from the registry.
        closeHealthAspect();
    }
}
```