From b15bae0515db11b29448be967d04476b6201b9a0 Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Tue, 9 Apr 2024 13:03:10 -0500
Subject: [PATCH 1/6] Add reference to prom operator install guide

Signed-off-by: davidmirror-ops
---
 docs/deployment/configuration/monitoring.rst | 24 ++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/deployment/configuration/monitoring.rst b/docs/deployment/configuration/monitoring.rst
index 75bc89adc4..5960adc646 100644
--- a/docs/deployment/configuration/monitoring.rst
+++ b/docs/deployment/configuration/monitoring.rst
@@ -85,6 +85,30 @@ Use Published Dashboards to Monitor Flyte Deployment
 
 Flyte Backend is written in Golang and exposes stats using Prometheus. The stats are labeled with workflow, task, project & domain, wherever appropriate.
 
+
+To consume the dashboards, it's recommended to install and configure the Prometheus operator as described in `their docs `__.
+This is especially true if you plan to use the `Service Monitor` provided by the `flyte-core `__ Helm chart.
+
+.. note::
+
+    Configure the Prometheus instance to use `ServiceMonitor` in namespaces other than `default` by configuring the following keys for the `prometheus` resources:
+
+.. code-block:: yaml
+
+    spec:
+      serviceMonitorSelector: {}
+      serviceMonitorNamespaceSelector: {}
+
+.. note::
+
+    The above example configuration lets Prometheus use any `ServiceMonitor` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.
+
+Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your `values` file:
+
+a.
+
+
+
 The dashboards are divided into two types:
 
 - **User-facing dashboards**: Dashboards that can be used to triage/investigate/observe performance and characteristics of workflows and tasks.
From 4140a308a5101841e4d44468a067b2c99763e041 Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Tue, 9 Apr 2024 13:50:33 -0500
Subject: [PATCH 2/6] Adds info about the three base dashboards

Signed-off-by: davidmirror-ops
---
 docs/deployment/configuration/monitoring.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/deployment/configuration/monitoring.rst b/docs/deployment/configuration/monitoring.rst
index 5960adc646..37de991d03 100644
--- a/docs/deployment/configuration/monitoring.rst
+++ b/docs/deployment/configuration/monitoring.rst
@@ -85,6 +85,13 @@ Use Published Dashboards to Monitor Flyte Deployment
 
 Flyte Backend is written in Golang and exposes stats using Prometheus. The stats are labeled with workflow, task, project & domain, wherever appropriate.
 
+Both `flyteadmin` and `flytepropeller` are instrumented to expose metrics. To visualize these metrics, Flyte provides three Grafana dashboards, each with a different focus:
+
+- User: overview of workflow execution status
+- Flyte Propeller: execution engine performance and status
+- Flyte Admin: API-level Monitoring
+
+You can `generate the dashboards `__, with the resulting JSON files located at `deployment/stats/prometheus`; or download them from the `Grafana marketplace `__.
 
 To consume the dashboards, it's recommended to install and configure the Prometheus operator as described in `their docs `__.
 This is especially true if you plan to use the `Service Monitor` provided by the `flyte-core `__ Helm chart.
From 2fb76fed83d1fed81396a58fa672462d2849fc56 Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Tue, 9 Apr 2024 14:22:11 -0500
Subject: [PATCH 3/6] Adds instructions to enable SMs

Signed-off-by: davidmirror-ops
---
 docs/deployment/configuration/monitoring.rst | 44 ++++++++++++--------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/docs/deployment/configuration/monitoring.rst b/docs/deployment/configuration/monitoring.rst
index 37de991d03..9f642e1630 100644
--- a/docs/deployment/configuration/monitoring.rst
+++ b/docs/deployment/configuration/monitoring.rst
@@ -87,18 +87,26 @@ Flyte Backend is written in Golang and exposes stats using Prometheus. The stats
 
 Both `flyteadmin` and `flytepropeller` are instrumented to expose metrics. To visualize these metrics, Flyte provides three Grafana dashboards, each with a different focus:
 
-- User: overview of workflow execution status
-- Flyte Propeller: execution engine performance and status
-- Flyte Admin: API-level Monitoring
+- **User-facing dashboards**: Dashboards that can be used to triage/investigate/observe performance and characteristics of workflows and tasks.
+  The user-facing dashboard is published under ID `13980 `__ in the Grafana marketplace.
+
+- **System Dashboards**: Dashboards that are useful for the system maintainer to investigate the status and performance of their Flyte deployments. These are further divided into:
+  - `DataPlane/FlytePropeller `__: execution engine status and performance.
+  - `ControlPlane/Flyteadmin`__: API-level monitoring.
+
+The corresponding JSON files for each dashboard are also located at ``deployment/stats/prometheus``.
+
+.. note::
 
-You can `generate the dashboards `__, with the resulting JSON files located at `deployment/stats/prometheus`; or download them from the `Grafana marketplace `__.
+    The above mentioned are basic dashboards and do no include all the metrics exposed by Flyte.
+    Feel free to use the scripts provided `here `__ to improve and -hopefully- contribute the improved dashboards.
 
 To consume the dashboards, it's recommended to install and configure the Prometheus operator as described in `their docs `__.
-This is especially true if you plan to use the `Service Monitor` provided by the `flyte-core `__ Helm chart.
+This is especially true if you plan to use the Service Monitors provided by the `flyte-core `__ Helm chart.
 
 .. note::
 
-    Configure the Prometheus instance to use `ServiceMonitor` in namespaces other than `default` by configuring the following keys for the `prometheus` resources:
+    Configure the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:
 
 .. code-block:: yaml
 
@@ -108,23 +116,23 @@ This is especially true if you plan to use the `Service Monitor` provided by the
 
 .. note::
 
-    The above example configuration lets Prometheus use any `ServiceMonitor` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.
+    The above example configuration lets Prometheus use any ``ServiceMonitor`` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.
 
 Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your `values` file:
 
-a.
-
+.. code-block:: yaml
 
+    flyteadmin:
+      serviceMonitor:
+        enabled: true
+
+    flytepropeller:
+      serviceMonitor:
+        enabled: true
 
-The dashboards are divided into two types:
+.. note::
 
-- **User-facing dashboards**: Dashboards that can be used to triage/investigate/observe performance and characteristics of workflows and tasks.
-  The user-facing dashboard is published under Grafana marketplace ID `13980 `__.
+    By default, the ``ServiceMonitor`` is configured with a ``scrapeTimeout`` of 30s and and ``interval`` of 60s. You can customize these values if needed.
 
-- **System Dashboards**: Dashboards that are useful for the system maintainer to maintain their Flyte deployments. These are further divided into:
-  - DataPlane/FlytePropeller dashboards published @ `13979 `__
-  - ControlPlane/Flyteadmin dashboards published @ `13981 `__
+With the above configuration in place you should be able to import the dashboards in your Grafana instance.
 
-The above mentioned are basic dashboards and do no include all the metrics exposed by Flyte.
-Please help us improve the dashboards by contributing to them 🙏.
-Refer to the build scripts `here `__.
\ No newline at end of file
From 3f519693f0952aeeddf904b59beb7766723e1165 Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Tue, 9 Apr 2024 16:45:13 -0500
Subject: [PATCH 4/6] Incorporate reviews

Signed-off-by: davidmirror-ops
---
 docs/deployment/configuration/monitoring.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/deployment/configuration/monitoring.rst b/docs/deployment/configuration/monitoring.rst
index 9f642e1630..6496edeba1 100644
--- a/docs/deployment/configuration/monitoring.rst
+++ b/docs/deployment/configuration/monitoring.rst
@@ -98,10 +98,10 @@ The corresponding JSON files for each dashboard are also located at ``deployment
 
 .. note::
 
-    The above mentioned are basic dashboards and do no include all the metrics exposed by Flyte.
+    The dashboards are basic dashboards and do not include all the metrics exposed by Flyte.
     Feel free to use the scripts provided `here `__ to improve and -hopefully- contribute the improved dashboards.
 
-To consume the dashboards, it's recommended to install and configure the Prometheus operator as described in `their docs `__.
+To consume the dashboards, we recommend installing and configuring the Prometheus operator as described in `their docs `__.
 This is especially true if you plan to use the Service Monitors provided by the `flyte-core `__ Helm chart.
 
 .. note::
From 5e29f2a66642dc8e25cd4c2d015268d6f7479cd8 Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Tue, 9 Apr 2024 16:48:05 -0500
Subject: [PATCH 5/6] Minor fixes

Signed-off-by: davidmirror-ops
---
 docs/deployment/configuration/monitoring.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/deployment/configuration/monitoring.rst b/docs/deployment/configuration/monitoring.rst
index 6496edeba1..0653f15cf6 100644
--- a/docs/deployment/configuration/monitoring.rst
+++ b/docs/deployment/configuration/monitoring.rst
@@ -85,14 +85,14 @@ Use Published Dashboards to Monitor Flyte Deployment
 
 Flyte Backend is written in Golang and exposes stats using Prometheus. The stats are labeled with workflow, task, project & domain, wherever appropriate.
 
-Both `flyteadmin` and `flytepropeller` are instrumented to expose metrics. To visualize these metrics, Flyte provides three Grafana dashboards, each with a different focus:
+Both ``flyteadmin`` and ``flytepropeller`` are instrumented to expose metrics. To visualize these metrics, Flyte provides three Grafana dashboards, each with a different focus:
 
 - **User-facing dashboards**: Dashboards that can be used to triage/investigate/observe performance and characteristics of workflows and tasks.
   The user-facing dashboard is published under ID `13980 `__ in the Grafana marketplace.
 
 - **System Dashboards**: Dashboards that are useful for the system maintainer to investigate the status and performance of their Flyte deployments. These are further divided into:
   - `DataPlane/FlytePropeller `__: execution engine status and performance.
-  - `ControlPlane/Flyteadmin`__: API-level monitoring.
+  - `ControlPlane/Flyteadmin `__: API-level monitoring.
 
 The corresponding JSON files for each dashboard are also located at ``deployment/stats/prometheus``.
 
@@ -106,7 +106,7 @@ This is especially true if you plan to use the Service Monitors provided by the
 
 .. note::
 
-    Configure the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:
+    Enable the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:
 
 .. code-block:: yaml
 
@@ -118,7 +118,7 @@ This is especially true if you plan to use the Service Monitors provided by the
 
     The above example configuration lets Prometheus use any ``ServiceMonitor`` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.
 
-Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your `values` file:
+Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your ``values`` file:
 
 .. code-block:: yaml
 
From 5d0b566eb5f5a385353a8c1141eaa4e400615a33 Mon Sep 17 00:00:00 2001
From: davidmirror-ops
Date: Wed, 10 Apr 2024 11:25:26 -0500
Subject: [PATCH 6/6] Improve format for steps

Signed-off-by: davidmirror-ops
---
 docs/deployment/configuration/monitoring.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/docs/deployment/configuration/monitoring.rst b/docs/deployment/configuration/monitoring.rst
index 0653f15cf6..7b0d9ddc0b 100644
--- a/docs/deployment/configuration/monitoring.rst
+++ b/docs/deployment/configuration/monitoring.rst
@@ -101,12 +101,13 @@ The corresponding JSON files for each dashboard are also located at ``deployment
     The dashboards are basic dashboards and do not include all the metrics exposed by Flyte.
     Feel free to use the scripts provided `here `__ to improve and -hopefully- contribute the improved dashboards.
 
-To consume the dashboards, we recommend installing and configuring the Prometheus operator as described in `their docs `__.
-This is especially true if you plan to use the Service Monitors provided by the `flyte-core `__ Helm chart.
+How to use the dashboards
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. note::
+1. We recommend installing and configuring the Prometheus operator as described in `their docs `__.
+This is especially true if you plan to use the Service Monitors provided by the `flyte-core `__ Helm chart.
 
-    Enable the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:
+2. Enable the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:
 
 .. code-block:: yaml
 
@@ -118,7 +119,7 @@ This is especially true if you plan to use the Service Monitors provided by the
 
     The above example configuration lets Prometheus use any ``ServiceMonitor`` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.
 
-Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your ``values`` file:
+3. Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your ``values`` file:
 
 .. code-block:: yaml
 