
Observe FastAPI app with three pillars of observability: Traces (Tempo), Metrics (Prometheus), Logs (Loki) on Grafana through OpenTelemetry and OpenMetrics.


webscit/opentelemetry-demo-python

 
 


FastAPI with OpenTelemetry Observability

Observe the FastAPI application with the three pillars of observability on Grafana:

  1. Traces with Tempo
  2. Metrics with Prometheus and Prometheus Python Client
  3. Logs with Loki

using only the OpenTelemetry Python SDK. This is based on the awesome demo: https://github.com/blueswen/fastapi-observability

The FastAPI application can be configured to use one of three OpenTelemetry instrumentation approaches through the environment variable OTEL_AUTO_INSTRUMENTATION_LEVEL:

  • 0: Manual instrumentation - the code is heavily modified to handle traces and metrics.
  • 1: Programmatic instrumentation - the code is lightly modified to load the instrumentation provided by the OpenTelemetry Python contrib project, with fine-tuning of the configuration.
  • 2: Zero-code instrumentation - the code is not modified and the instrumentation is injected when the Python process starts.
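
As a rough illustration of how such a switch can be read at startup, here is a minimal sketch. Only the variable name OTEL_AUTO_INSTRUMENTATION_LEVEL and its three values come from this README; the enum, the function name, and the fallback to level 0 are assumptions made for the example and may not match the application code.

```python
import os
from enum import IntEnum


class InstrumentationLevel(IntEnum):
    """Hypothetical names for the three levels listed above."""

    MANUAL = 0
    PROGRAMMATIC = 1
    ZERO_CODE = 2


def read_instrumentation_level(environ=os.environ) -> InstrumentationLevel:
    """Parse OTEL_AUTO_INSTRUMENTATION_LEVEL from a mapping of env variables."""
    # Assumption: fall back to level 0 (manual) when the variable is unset.
    raw = environ.get("OTEL_AUTO_INSTRUMENTATION_LEVEL", "0")
    try:
        return InstrumentationLevel(int(raw))
    except ValueError:
        # Raised both for non-numeric values and for numbers outside 0..2.
        raise ValueError(
            f"OTEL_AUTO_INSTRUMENTATION_LEVEL must be 0, 1 or 2, got {raw!r}"
        ) from None
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.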

This was presented at the Python Rennes Meet-up on October 10th 2024. Here is the presentation and the video.

Presentation video at Python Meet up Rennes

The video is in French


Quick Start

  1. Install Loki Docker Driver

    docker plugin install grafana/loki-docker-driver:2.9.2 --alias loki --grant-all-permissions
  2. Start all services with docker-compose

    docker-compose up -d

    If you get the error message Error response from daemon: error looking up logging plugin loki: plugin loki found but disabled, run the following command to enable the plugin:

    docker plugin enable loki
  3. Send requests with siege and curl to the FastAPI app

    bash request-script.sh
    bash trace.sh

    Or you can use Locust to send requests:

    # install locust first with `pip install locust` if you don't have it
    locust -f locustfile.py --headless --users 10 --spawn-rate 1 -H http://localhost:8000

    Or you can send requests with k6:

    k6 run --vus 1 --duration 300s k6-script.js
  4. Check the predefined dashboard FastAPI Observability on Grafana at http://localhost:3000/ (log in with admin:admin)

    Dashboard screenshot:

    FastAPI Monitoring Dashboard

    The dashboard is also available on Grafana Dashboards.

Note

This quick start presents the fully manual instrumentation (level 0). For levels 1 and 2, use the FastAPI Otel Observability dashboard instead, because the meters have different names.

Explore with Grafana

Grafana makes it possible to follow a specific action in a service across traces, metrics, and logs by linking them through the trace ID and exemplars.

Observability Correlations

Image Source: Grafana

Metrics to Traces

Get Trace ID from an exemplar in metrics, then query in Tempo.

Query: histogram_quantile(.99,sum(rate(fastapi_requests_duration_seconds_bucket{app_name="app-a", path!="/metrics"}[1m])) by(path, le))
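
To make the last step of this query more concrete, here is a minimal sketch of the linear interpolation that histogram_quantile applies to classic cumulative buckets. It leaves out the rate() and sum by (path, le) steps and does not cover every edge case Prometheus handles; the function signature and the sample bucket values in the usage note are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket, like a *_bucket series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan
```

For example, with the hypothetical buckets [(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)], the 0.75 quantile lands in the second bucket and is interpolated between 0.1 and 0.5. The result is therefore an estimate whose precision depends on the bucket boundaries.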


Traces to Logs

Get the Trace ID and the tags defined in the Tempo data source (here compose_service) from the span, then query with Loki.


Logs to Traces

Get Trace ID from log (regex defined in Loki data source), then query in Tempo.


Detail

FastAPI Application

For a more complex scenario, this demo runs three FastAPI applications with the same code. The /chain endpoint performs a cross-service action, which provides a good example of how to use the OpenTelemetry SDK and how Grafana presents trace information.

Traces and Logs

We use the OpenTelemetry Python SDK to send trace data to Tempo over HTTP. With the OpenTelemetry instrumentation, each request span contains additional child spans, because the instrumentation captures every internal ASGI interaction (opentelemetry-python-contrib issue #831). If you want to get rid of these internal spans, the same issue #831 describes a workaround: a new OpenTelemetry middleware with two overridden methods for span processing.

We use the OpenTelemetry Logging Instrumentation to replace the logger format with one that includes the trace ID and span ID.

The following image shows the span info sent to Tempo and queried on Grafana. The span info provided by FastAPIInstrumentor includes the trace ID (17785b4c3d530b832fb28ede767c672c), span ID (d410eb45cc61f442), service name (app-a), custom attributes (service.name=app-a, compose_service=app-a), and so on.

Span Information

Log format with trace ID and span ID, as overridden by `LoggingInstrumentor`:

%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] [trace_id=%(otelTraceID)s span_id=%(otelSpanID)s resource.service.name=%(otelServiceName)s] - %(message)s

The following image is what the logs look like.

Log With Trace ID And Span ID
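
To make the extra fields in that format string concrete, here is a minimal sketch that uses only the standard logging module. The filter below is a hypothetical stand-in that sets otelTraceID, otelSpanID, and otelServiceName to fixed values; it is not how LoggingInstrumentor is implemented, which reads them from the active span. The class name, function name, and logger name are assumptions.

```python
import logging

LOG_FORMAT = (
    "%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] "
    "[trace_id=%(otelTraceID)s span_id=%(otelSpanID)s "
    "resource.service.name=%(otelServiceName)s] - %(message)s"
)


class StaticTraceContextFilter(logging.Filter):
    """Hypothetical stand-in: attaches fixed otel* fields to every record."""

    def __init__(self, trace_id, span_id, service_name):
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id
        self.service_name = service_name

    def filter(self, record):
        record.otelTraceID = self.trace_id
        record.otelSpanID = self.span_id
        record.otelServiceName = self.service_name
        return True  # never drop the record, only enrich it


def build_logger(stream, trace_id, span_id, service_name):
    """Return a logger writing to `stream` with the format shown above."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler.addFilter(StaticTraceContextFilter(trace_id, span_id, service_name))
    logger = logging.getLogger("demo.format")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling build_logger(sys.stderr, "17785b4c3d530b832fb28ede767c672c", "d410eb45cc61f442", "app-a").info("hello") prints a line whose bracketed section carries the three values, which is the part the Loki derived field later matches on.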

Span Inject

If you want other services to use the same Trace ID, you have to call the inject function to add the current span information to the request headers. OpenTelemetry FastAPI instrumentation only takes care of the ASGI app's request and response; it does not affect other modules or actions such as sending HTTP requests to other servers or calling functions.
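
With the default propagators, inject (from opentelemetry.propagate) writes a W3C Trace Context traceparent header into the carrier. The sketch below does not call the SDK; it only shows the shape of that header using the standard library, so the receiving side of the propagation is easier to picture. The function names are hypothetical, and the IDs in the usage note are the ones quoted in the span description above.

```python
import re

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Format a W3C `traceparent` header value (version 00)."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value: str):
    """Return (trace_id, span_id, sampled), or None if the value is malformed."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        return None
    trace_hex, span_hex, flags_hex = match.groups()
    sampled = bool(int(flags_hex, 16) & 0x01)
    return int(trace_hex, 16), int(span_hex, 16), sampled
```

For instance, build_traceparent(0x17785b4c3d530b832fb28ede767c672c, 0xd410eb45cc61f442) gives the value a downstream service would read to continue the same trace.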

Alternatively, we can use the OpenTelemetry HTTPX Instrumentation library, which automatically injects trace info into the headers of outgoing requests. The following example shows how:

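import os

import httpx
from fastapi import FastAPI, Response
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

# Assumption: the target hosts come from the environment; the defaults are placeholders.
TARGET_ONE_HOST = os.environ.get("TARGET_ONE_HOST", "app-b")
TARGET_TWO_HOST = os.environ.get("TARGET_TWO_HOST", "app-c")

app = FastAPI()

# Patch httpx so that every outgoing request carries the current trace context.
HTTPXClientInstrumentor().instrument()

@app.get("/chain")
async def chain(response: Response):
    # A single client is reused for the three downstream calls.
    async with httpx.AsyncClient() as client:
        await client.get("http://localhost:8000/")
        await client.get(f"http://{TARGET_ONE_HOST}:8000/io_task")
        await client.get(f"http://{TARGET_TWO_HOST}:8000/cpu_task")

    return {"path": "/chain"}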

This is also what happens in the zero-code instrumentation case.

Metrics

We use the OpenTelemetry Python SDK to generate metrics with exemplars.

Exemplars are not yet available in the stable version of the Python SDK (as of 1.27.0), but they are expected in the next release.

The metrics are pushed directly to Prometheus, which is configured through a feature flag to accept OpenTelemetry metrics (--enable-feature=otlp-write-receiver). This imposes some constraints on Prometheus capabilities.

To keep using Prometheus in pull mode with OpenTelemetry, a good solution is to add an OpenTelemetry Collector to which your services push metrics and from which Prometheus pulls them.

OpenTelemetry Instrumentation

There are two methods to add trace information to spans and logs using the OpenTelemetry Python SDK:

  1. Code-based Instrumentation: This involves adding trace information to spans, logs, and metrics using the OpenTelemetry Python SDK. It requires more coding effort but allows for the addition of exemplars to metrics. We employ this approach in this project.
  2. Zero-code Instrumentation: This method automatically instruments a Python application using instrumentation libraries, but only when the used frameworks and libraries are supported. It simplifies the process by eliminating the need for manual code changes. For more insights into zero-code instrumentation, refer to my other project, OpenTelemetry APM.

Prometheus - Metrics

Collects metrics from applications.

Grafana Data Source

Add an exemplar configuration that uses the value of the trace_id label to create a Tempo link.

Grafana data source setting example:

Data Source of Prometheus: Exemplars

Grafana data sources config example:

name: Prometheus
type: prometheus
typeName: Prometheus
access: proxy
url: http://prometheus:9090
password: ''
user: ''
database: ''
basicAuth: false
isDefault: true
jsonData:
  exemplarTraceIdDestinations:
    - datasourceUid: tempo
      name: trace_id
  httpMethod: POST
readOnly: false
editable: true

Tempo - Traces

Receives spans from applications.

Grafana Data Source

Trace to logs setting:

  1. Data source: target log source
  2. Tags: key of tags or process level attributes from the trace, which will be log query criteria if the key exists in the trace
  3. Map tag names: Convert existing key of tags or process level attributes from trace to another key, then used as log query criteria. Use this feature when the values of the trace tag and log label are identical but the keys are different.
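
The three settings above can be pictured with a small sketch that turns span attributes into a Loki query. This is only an approximation of what Grafana generates (the exact query differs between Grafana versions); the function name and its parameters are hypothetical.

```python
def build_loki_query(span_attributes, tags, trace_id, tag_name_map=None):
    """Build a LogQL query from the configured tags that exist on the span.

    `tags` plays the role of the "Tags" setting and `tag_name_map` the role
    of "Map tag names" (span attribute key -> Loki label name).
    """
    tag_name_map = tag_name_map or {}
    matchers = []
    for tag in tags:
        if tag not in span_attributes:
            continue  # a tag is only used as a criterion if the span has it
        label = tag_name_map.get(tag, tag)
        # Escape backslashes and double quotes for a LogQL string literal.
        value = str(span_attributes[tag]).replace("\\", "\\\\").replace('"', '\\"')
        matchers.append(f'{label}="{value}"')
    selector = "{" + ", ".join(matchers) + "}"
    # Equivalent of "filter by trace ID": keep only lines containing the ID.
    return f'{selector} |= "{trace_id}"'
```

With the attributes {"compose_service": "app-a"} and the tag list ["compose_service"], this yields a stream selector on compose_service followed by a line filter on the trace ID.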

Grafana data source setting example:

Data Source of Tempo: Trace to logs

Grafana data sources config example:

name: Tempo
type: tempo
typeName: Tempo
access: proxy
url: http://tempo
password: ''
user: ''
database: ''
basicAuth: false
isDefault: false
jsonData:
  nodeGraph:
    enabled: true
  tracesToLogs:
    datasourceUid: loki
    filterBySpanID: false
    filterByTraceID: true
    mapTagNamesEnabled: false
    tags:
      - compose_service
readOnly: false
editable: true

Loki - Logs

Collects logs from all services with the Loki Docker Driver.

Loki Docker Driver

  1. Use YAML anchor and alias feature to set logging options for each service.
  2. Set Loki Docker Driver options
    1. loki-url: loki service endpoint
    2. loki-pipeline-stages: processes multiline logs from the FastAPI application with multiline and regex stages (reference)

x-logging: &default-logging # anchor(&): defines a chunk of configuration named 'default-logging'
  driver: loki
  options:
    loki-url: 'http://localhost:3100/api/prom/push'
    loki-pipeline-stages: |
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}'
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3}) (?P<message>(?s:.*))$$'
# Use $$ (double-dollar sign) when your configuration needs a literal dollar sign.

version: "3.4"

services:
  foo:
    image: foo
    logging: *default-logging # alias(*): refers to the 'default-logging' chunk
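
To see what the two pipeline stages do to a traceback, here is a minimal sketch of the same logic in plain Python. It uses the firstline and expression patterns from the configuration, written with a single $ and with \d{3} for the milliseconds, and it ignores max_wait_time. Loki evaluates these patterns with Go's regexp engine, so this is only an approximation; the function names are hypothetical.

```python
import re

FIRSTLINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}")
ENTRY = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3}) (?P<message>(?s:.*))$"
)


def group_multiline(lines):
    """Multiline stage: attach continuation lines to the preceding first line."""
    entries, current = [], []
    for line in lines:
        if FIRSTLINE.match(line) and current:
            # A new timestamped line closes the entry collected so far.
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


def parse_entry(entry):
    """Regex stage: split an entry into its `time` and `message` groups."""
    match = ENTRY.match(entry)
    return match.groupdict() if match else None
```

Feeding it a log line followed by traceback lines returns one entry whose message keeps the embedded newlines, which is why the (?s:.*) group is needed.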

Grafana Data Source

Add a TraceID derived field to extract the trace id and create a Tempo link from the trace id.

Grafana data source setting example:

Data Source of Loki: Derived fields

Grafana data source config example:

name: Loki
type: loki
typeName: Loki
access: proxy
url: http://loki:3100
password: ''
user: ''
database: ''
basicAuth: false
isDefault: false
jsonData:
  derivedFields:
    - datasourceUid: tempo
      matcherRegex: (?:trace_id)=(\w+)
      name: TraceID
      url: $${__value.raw}
      # Use $$ (double-dollar sign) when your configuration needs a literal dollar sign.
readOnly: false
editable: true
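
The matcherRegex can be checked quickly against a sample log line. The sketch below applies the same pattern with Python's re module; Grafana evaluates the regex itself, so this only illustrates which part of the line ends up in the TraceID field. The function name and the sample line are made up for the example.

```python
import re

# Same pattern as matcherRegex in the Loki data source configuration.
TRACE_ID_MATCHER = re.compile(r"(?:trace_id)=(\w+)")


def extract_trace_id(log_line):
    """Return the first trace ID found in the line, or None."""
    match = TRACE_ID_MATCHER.search(log_line)
    return match.group(1) if match else None


sample = (
    "2024-10-10 09:15:02,123 INFO [main] [main.py:42] "
    "[trace_id=17785b4c3d530b832fb28ede767c672c span_id=d410eb45cc61f442 "
    "resource.service.name=app-a] - Hello World"
)
print(extract_trace_id(sample))
```

Because \w+ stops at the space before span_id, only the 32-character trace ID is captured.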

Grafana

  1. Add Prometheus, Tempo, and Loki as data sources with the config file etc/grafana/datasource.yml.
  2. Load the predefined dashboard with etc/dashboards.yaml and etc/dashboards/fastapi-observability.json.

# grafana in docker-compose.yaml
grafana:
   image: grafana/grafana:10.4.2
   volumes:
      - ./etc/grafana/:/etc/grafana/provisioning/datasources # data sources
      - ./etc/dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml # dashboard setting
      - ./etc/dashboards:/etc/grafana/dashboards # dashboard json files directory
