Getting started with Prometheus-based monitoring of KFServing models.
- Install Prometheus
- Access Prometheus Metrics
- Metrics-driven experiments and progressive delivery
- Removal
Prerequisites: Kubernetes cluster and Kustomize v3.
Install Prometheus using Prometheus Operator.
cd kfserving
kubectl apply -k docs/samples/metrics-and-monitoring/prometheus-operator
kubectl wait --for condition=established --timeout=120s crd/prometheuses.monitoring.coreos.com
kubectl wait --for condition=established --timeout=120s crd/servicemonitors.monitoring.coreos.com
kubectl apply -k docs/samples/metrics-and-monitoring/prometheus
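Once the apply commands complete, you can verify that the operator and Prometheus pods are up; a quick check, assuming the default kfserving-monitoring namespace:
kubectl get pods -n kfserving-monitoring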
Note: The above steps install Kubernetes resource objects in the kfserving-monitoring namespace. This is kustomizable. To install under a different namespace, say my-monitoring, change kfserving-monitoring to my-monitoring in the following three files: a) prometheus-operator/namespace.yaml, b) prometheus-operator/kustomization.yaml, and c) prometheus/kustomization.yaml.
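If you prefer to script the namespace substitution, here is a minimal sketch using sed (the my-monitoring name is just an example; run it from the kfserving directory):
sed -i 's/kfserving-monitoring/my-monitoring/g' \
  docs/samples/metrics-and-monitoring/prometheus-operator/namespace.yaml \
  docs/samples/metrics-and-monitoring/prometheus-operator/kustomization.yaml \
  docs/samples/metrics-and-monitoring/prometheus/kustomization.yaml
This uses GNU sed syntax; on macOS, use sed -i '' instead.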
- This article provides details of how a Prometheus setup using the above operator actually scrapes metrics.
- Basically, the config serviceMonitorNamespaceSelector: {} in the Prometheus object means that all namespaces will be watched for ServiceMonitor objects. You can confirm this by running:
$ kubectl get prometheus -n <name-of-the-namespace> -o yaml
...
serviceMonitorNamespaceSelector: {}
...
- The serviceMonitorSelector field in the above Prometheus object indicates the labels that should be put on ServiceMonitor objects. For example, one Prometheus config looked as follows:
$ kubectl get prometheus -n <name-of-the-namespace> -o yaml
...
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
matchLabels:
release: kube-prometheus-stack-1651295153
This meant that every ServiceMonitor object that Prometheus is expected to scrape should have the label release: kube-prometheus-stack-1651295153.
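If a ServiceMonitor is missing this label, one way to add it is with kubectl label; a sketch using the activator ServiceMonitor and the example label value above (yours will differ):
kubectl label servicemonitor activator -n knative-serving release=kube-prometheus-stack-1651295153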
- The ServiceMonitor objects created in the knative-serving, knative-eventing, or even in the application namespaces should have the above label. For example:
$ kubectl get servicemonitor activator -o yaml -n knative-serving
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
generation: 3
labels:
release: kube-prometheus-stack-1651295153 <<<<-------- Same as serviceMonitorSelector
name: activator
namespace: knative-serving
spec:
endpoints:
- interval: 30s
port: http-metrics
namespaceSelector:
matchNames:
- knative-serving
selector:
matchLabels:
serving.knative.dev/release: v0.22.1 <<<-- This label should be on K8s service objects.
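- To list all ServiceMonitor objects across namespaces that carry the expected label (again using the example label value), you can run:
kubectl get servicemonitors -A -l release=kube-prometheus-stack-1651295153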
- Now, the ServiceMonitor objects indicate which Kubernetes Service objects they select.
- The selector.matchLabels field in the above ServiceMonitor object does that. All the Kubernetes Service objects created by KServe had the label serving.knative.dev/release=v0.22.1, hence the above example used that label in matchLabels; but it could be replaced with any other label.
$ kubectl get svc -n knative-serving --show-labels
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE LABELS
activator-service ClusterIP 10.124.9.157 <none> 9090/TCP,8008/TCP,80/TCP,81/TCP 25d app=activator,serving.knative.dev/release=v0.22.1
autoscaler ClusterIP 10.124.8.155 <none> 9090/TCP,8008/TCP,8080/TCP 25d app=autoscaler,serving.knative.dev/release=v0.22.1
autoscaler-bucket-00-of-01 ClusterIP 10.124.14.250 <none> 8080/TCP 25d <none>
controller ClusterIP 10.124.4.127 <none> 9090/TCP,8008/TCP 25d app=controller,serving.knative.dev/release=v0.22.1
istio-webhook ClusterIP 10.124.0.96 <none> 9090/TCP,8008/TCP,443/TCP 25d networking.knative.dev/ingress-provider=istio,role=istio-webhook,serving.knative.dev/release=v0.22.1
webhook ClusterIP 10.124.15.22 <none> 9090/TCP,8008/TCP,443/TCP 25d role=webhook,serving.knative.dev/release=v0.22.1
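- To see which Service objects the above ServiceMonitor would select, filter services by the same label used in matchLabels:
kubectl get svc -n knative-serving -l serving.knative.dev/release=v0.22.1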
In this section, we will use a v1beta1 InferenceService sample to demonstrate how to access Prometheus metrics that are automatically generated by Knative's queue-proxy container for your KFServing models.
kubectl create ns kfserving-test
cd docs/samples/v1beta1/sklearn
kubectl apply -f sklearn.yaml -n kfserving-test
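Before sending traffic, you may want to wait until the InferenceService reports Ready; a quick check (the sklearn-iris name comes from the sample YAML above):
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kfserving-test --timeout=300s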
- If you are using a Minikube-based cluster, then in a separate terminal, run minikube tunnel --cleanup and supply your password if prompted.
- In a separate terminal, follow these instructions to find and set your ingress host, ingress port, and service hostname; one way to set these variables is sketched after the loop below. Then, send prediction requests to the sklearn-iris model you created above as follows.
while clear; do \
curl -v \
-H "Host: ${SERVICE_HOSTNAME}" \
-d @./iris-input.json \
http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/sklearn-iris/infer
sleep 0.3
done
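- For reference, here is a minimal sketch for setting the variables used above, assuming an Istio ingress gateway in the istio-system namespace with a LoadBalancer IP; adjust for your cluster and ingress setup:
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kfserving-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)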
- In a separate terminal, port forward the Prometheus service.
kubectl port-forward service/prometheus-operated -n kfserving-monitoring 9090:9090
- Access Prometheus UI in your browser at http://localhost:9090
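- Optionally, confirm that Prometheus has discovered the queue-proxy targets before running queries; a sketch using the Prometheus HTTP API through the port-forward above (the grep is just a rough filter):
curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep sklearn-iris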
- Access the number of prediction requests to the sklearn model, over the last 60 seconds. You can use the following query in the Prometheus UI:
sum(increase(revision_app_request_latencies_count{service_name=~"sklearn-iris-predictor-default"}[60s]))
You should see the total request count as the query result.
- Access the mean latency for serving prediction requests for the same model as above, over the last 60 seconds. You can use the following query in the Prometheus UI:
sum(increase(revision_app_request_latencies_sum{service_name=~"sklearn-iris-predictor-default"}[60s]))/sum(increase(revision_app_request_latencies_count{service_name=~"sklearn-iris-predictor-default"}[60s]))
You should see the mean latency as the query result.
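- If you prefer the command line, the same queries can be issued against the Prometheus HTTP API through the port-forward; a sketch for the request-count query (single quotes keep the PromQL intact):
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(increase(revision_app_request_latencies_count{service_name=~"sklearn-iris-predictor-default"}[60s]))'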
See the Iter8 extensions for KFServing.
Remove Prometheus and Prometheus Operator as follows.
cd kfserving
kubectl delete -k docs/samples/metrics-and-monitoring/prometheus
kubectl delete -k docs/samples/metrics-and-monitoring/prometheus-operator