In this lab you will explore various aspects of the built-in OpenShift Monitoring. This includes an overview of the OpenShift Alertmanager UI, accessing the Prometheus web console, running PromQL (Prometheus Query Language) queries to inspect the cluster, and finally looking at Grafana dashboards.
OpenShift Container Platform includes a pre-configured, pre-installed, and self-updating monitoring stack that is based on the Prometheus open source project and its wider ecosystem. It provides monitoring of cluster components, a set of alerts that immediately notify the cluster administrator about problems as they occur, and a set of Grafana dashboards. The cluster monitoring stack is only supported for monitoring OpenShift Container Platform clusters.
Log in to the web console as the admin user. If you do not know the console URL, the following command prints it:
oc whoami --show-console
-
On the left-hand side, click the "Observe" drop-down.
-
Click on "Alerting". This is the OpenShift console’s view of the alert configuration.
The Alerting tab shows you information about currently configured and/or active alerts. You can see and do several things:
-
Filter alerts by their names.
-
Filter the alerts by their states. Some alerts fire only after their condition has been true for the duration of a timeout. If an alert's condition is currently true but the timeout has not yet been reached, the alert is in the Pending state.
-
Alert name.
-
Description of an alert.
-
Current state of the alert and when the alert went into this state.
-
Value of the Severity label of the alert.
-
Actions you can do with the alert.
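The Pending state described in the list above can be sketched in a few lines of Python. This is a simplified illustration, not the monitoring stack's actual implementation: the function name `alert_state`, the `hold_for` parameter (standing in for the timeout), and the "inactive" label for an alert whose condition is false are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Optional


def alert_state(condition_true_since: Optional[datetime],
                now: datetime,
                hold_for: timedelta) -> str:
    """Return a simplified alert state for a single evaluation.

    condition_true_since: when the alert condition first became true,
        or None if the condition is currently false.
    hold_for: hypothetical timeout the condition must stay true
        before the alert fires.
    """
    if condition_true_since is None:
        return "inactive"
    if now - condition_true_since < hold_for:
        # Condition is true, but the timeout has not been reached yet.
        return "pending"
    return "firing"


now = datetime(2024, 1, 1, 12, 0, 0)
timeout = timedelta(minutes=10)

print(alert_state(None, now, timeout))                         # inactive
print(alert_state(now - timedelta(minutes=3), now, timeout))   # pending
print(alert_state(now - timedelta(minutes=15), now, timeout))  # firing
```

An alert whose condition has been true for 3 minutes against a 10-minute timeout is reported as pending; once the condition has held for the whole timeout, it is reported as firing.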
Note that a link at the top of this page takes you directly to the Alertmanager UI.
In addition to the Alerting screen, OpenShift’s built-in monitoring provides an interface to access metrics collected by Prometheus using the Prometheus Query Language (PromQL).
-
On the left-hand side of the OpenShift Console, under the "Observe" section, click the link for "Metrics".
Let’s run a query to see the number of UDP datagrams each node has sent over IPv6.
-
Copy and paste the following query into the expression text box:
node_netstat_Udp6_OutDatagrams
-
Click the "Run Queries" button.
-
You should now see a time series with a value in the list. You will also see a graph showing the value over the most recent time period (the default is 30 minutes).
Now let’s run a query to see the CPU usage for the entire cluster.
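To make the 30-minute graph window concrete, here is a minimal Python sketch that builds the parameters for a range query against the Prometheus HTTP API (`/api/v1/query_range`, which takes `query`, `start`, `end`, and `step`). The helper name `range_query_params` and the 60-second step are assumptions for illustration; the step the console actually uses may differ.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def range_query_params(query: str,
                       end: datetime,
                       window: timedelta = timedelta(minutes=30),
                       step_seconds: int = 60) -> str:
    """Build a URL query string covering `window` up to `end`.

    step_seconds is an assumed resolution, not a console default.
    """
    start = end - window
    return urlencode({
        "query": query,
        "start": int(start.timestamp()),  # Unix timestamps in seconds
        "end": int(end.timestamp()),
        "step": step_seconds,
    })


end = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(range_query_params("node_netstat_Udp6_OutDatagrams", end))
```

The resulting string would be appended to the `/api/v1/query_range` path of a Prometheus endpoint. Sending the request also requires authentication against the cluster, which this sketch leaves out.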
-
Click "Add Query".
-
Copy and paste the following query into the query text box:
cluster:cpu_usage_cores:sum
-
Click the "Run Queries" button.
-
You should now see a time series with a value in the list. This value is the latest gathered value for the time series as defined by this query. You’ll also see that your new query is plotted on the same graph.
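The "latest gathered value" corresponds to what the Prometheus HTTP API returns for an instant query: a vector in which each sample carries its labels and a `[timestamp, "value"]` pair. The sketch below parses that JSON shape. The helper name `latest_values` is hypothetical, and the sample payload is made up for illustration; it is not output from a real cluster.

```python
import json
from typing import Dict, List, Tuple


def latest_values(response_text: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-query response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample looks like:
    # {"metric": {<labels>}, "value": [<unix_ts>, "<value as string>"]}
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Made-up sample payload; the value 3.25 is illustrative only.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "cluster:cpu_usage_cores:sum"},
                "value": [1704110400.0, "3.25"],
            }
        ],
    },
})

for labels, value in latest_values(sample):
    print(labels.get("__name__"), value)
```

Values arrive as strings in the API response, so the sketch converts them to floats before returning them.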
The metrics interface lets you run powerful queries on the information collected about your cluster.
Note that a link at the top of this page takes you directly to the Prometheus UI.
In addition to the Metrics UI, OpenShift monitoring provides a preconfigured "Dashboards" UI (aka "Grafana"). The purpose of these dashboards is to show multiple metrics in an easy-to-consume form, graphed over time.
Click the "Dashboards" link under the "Observe" section on the left-hand side.
Here you can click on the "Dashboard" arrow and select different metrics to visualize. Go ahead and play around with this view. Note that this is a visualization of the information that is provided by Prometheus.