prometheus: Unable to export data after restarting the exporter #4032

Closed
mauriciovasquezbernal opened this issue Apr 25, 2023 · 1 comment · Fixed by #4648

I'm implementing an application that uses OpenTelemetry to export metrics to Prometheus. The user should be able to "stop" and "start" the metrics collection. Unfortunately, I'm running into an issue with the Prometheus exporter: when metrics collection is started a second time it stops working and a collected metric xxxx was collected before with the same name and label values error is reported. I checked and it seems I shut down all components in the right way. The following is a minimal reproducer:

package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"runtime"
	"time"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

const (
	metricsPath   = "/metrics"
	listenAddress = "127.0.0.1:2223"
)

func run(ctx context.Context) error {
	//exporter, err := stdoutmetric.New()
	exporter, err := prometheus.New()
	if err != nil {
		return fmt.Errorf("initialize prometheus exporter: %s", err)
	}
	defer func() {
		if err := exporter.Shutdown(context.TODO()); err != nil {
			fmt.Printf("shutdown exporter: %s\n", err)
		}
	}()

	//reader := metric.NewPeriodicReader(exporter)
	reader := exporter
	meterProvider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader))
	defer func() {
		if err := meterProvider.Shutdown(context.TODO()); err != nil {
			fmt.Printf("shutdown meter provider: %s\n", err)
		}
	}()

	meter := meterProvider.Meter("inspektor-gadget")

	mux := http.NewServeMux()
	handler := promhttp.Handler()
	mux.Handle(metricsPath, handler)

	server := &http.Server{Addr: listenAddress, Handler: mux}
	defer server.Close()

	// fake counter
	otelCounter, err := meter.Int64Counter("fake")
	if err != nil {
		return err
	}

	go func() {
		err := server.ListenAndServe()
		if err != nil && err != http.ErrServerClosed {
			fmt.Printf("error serving http: %s", err)
			return
		}
	}()

	ticker := time.NewTicker(1 * time.Second)

	for {
		select {
		case <-ctx.Done():
			fmt.Printf("context cancelled, exiting\n")
			return nil
		case <-ticker.C:
			otelCounter.Add(ctx, 1)
		}
	}
}

func main() {
	for i := 0; i < 2; i++ {
		ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
		fmt.Printf("running\n")
		err := run(ctx)
		if err != nil {
			fmt.Printf("error: %s\n", err)
		}
		cancel()

		runtime.GC()

		fmt.Printf("waiting 2 seconds before restarting\n")
		time.Sleep(2 * time.Second)
	}
}

I run the program in one terminal:

$ go run foo.go 
running

In another terminal, I query the metrics endpoint:

# metrics work fine during the first execution
$ curl http://127.0.0.1:2223/metrics | grep fake
  fake_total{otel_scope_name="inspektor-gadget",otel_scope_version=""} 29

Now, I hit Ctrl+C in the first terminal to make it restart the metrics collection:

$ go run foo.go 
running
...

^Ccontext cancelled, exiting
shutdown exporter: reader is shutdown
waiting 2 seconds before restarting
running

Now the metrics don't work anymore:

$ curl http://127.0.0.1:2223/metrics 
An error has occurred while serving metrics:

collected metric "target_info" { label:<name:"service_name" value:"unknown_service:cosa" > label:<name:"telemetry_sdk_language" value:"go" > label:<name:"telemetry_sdk_name" value:"opentelemetry" > label:<name:"telemetry_sdk_version" value:"1.14.0" > gauge:<value:1 > } was collected before with the same name and label values

and the first terminal prints

2023/04/20 12:53:11 reader is shutdown

I tried the stdout exporter instead of Prometheus and it seems to work fine.
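For reference, this is roughly the variant I tested. It follows the commented-out lines in the reproducer; the import path for stdoutmetric is assumed to be go.opentelemetry.io/otel/exporters/stdout/stdoutmetric:

	// Sketch: stdout exporter wired through a periodic reader instead of the
	// Prometheus exporter. In my testing this keeps working across restarts.
	exporter, err := stdoutmetric.New()
	if err != nil {
		return fmt.Errorf("initialize stdout exporter: %s", err)
	}
	reader := sdkmetric.NewPeriodicReader(exporter)
	meterProvider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader))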

After asking on Slack, @Aneurysm9 suggested using an explicit registry:

package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"runtime"
	"time"

	prometheusapi "github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

const (
	metricsPath   = "/metrics"
	listenAddress = "127.0.0.1:2223"
)

func run(ctx context.Context) error {
	//exporter, err := stdoutmetric.New()
	var opts []prometheus.Option
	register := prometheusapi.NewRegistry()
	opts = append(opts, prometheus.WithRegisterer(register))
	exporter, err := prometheus.New(opts...)
	if err != nil {
		return fmt.Errorf("initialize prometheus exporter: %s", err)
	}
	defer func() {
		if err := exporter.Shutdown(context.TODO()); err != nil {
			fmt.Printf("shutdown exporter: %s\n", err)
		}
	}()

	//reader := metric.NewPeriodicReader(exporter)
	reader := exporter
	meterProvider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader))
	defer func() {
		if err := meterProvider.Shutdown(context.TODO()); err != nil {
			fmt.Printf("shutdown meter provider: %s\n", err)
		}
	}()

	meter := meterProvider.Meter("inspektor-gadget")

	mux := http.NewServeMux()
	handler := promhttp.HandlerFor(register, promhttp.HandlerOpts{})
	mux.Handle(metricsPath, handler)

	server := &http.Server{Addr: listenAddress, Handler: mux}
	defer server.Close()

	// fake counter
	otelCounter, err := meter.Int64Counter("fake")
	if err != nil {
		return err
	}

	go func() {
		err := server.ListenAndServe()
		if err != nil && err != http.ErrServerClosed {
			fmt.Printf("error serving http: %s", err)
			return
		}
	}()

	ticker := time.NewTicker(1 * time.Second)

	for {
		select {
		case <-ctx.Done():
			fmt.Printf("context cancelled, exiting\n")
			return nil
		case <-ticker.C:
			otelCounter.Add(ctx, 1)
		}
	}
}

func main() {
	for i := 0; i < 2; i++ {
		ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
		fmt.Printf("running\n")
		err := run(ctx)
		if err != nil {
			fmt.Printf("error: %s\n", err)
		}
		cancel()

		runtime.GC()

		fmt.Printf("waiting 2 seconds before restarting\n")
		time.Sleep(2 * time.Second)
	}
}

With this modification the program works as expected. However, it's not clear whether the behavior described above is the correct one.

ref https://cloud-native.slack.com/archives/C01NPAXACKT/p1682013380206509

@Aneurysm9 added the area:metrics and pkg:exporter:prometheus labels on Apr 25, 2023
@dashpole (Contributor) commented on Oct 2, 2023

To fix this I believe we would need to make sure Collect() returns if reader.Collect returns metric.ErrReaderShutdown here:

	err := c.reader.Collect(context.TODO(), &metrics)
	if err != nil {
		otel.Handle(err)
		if err == metric.ErrReaderNotRegistered {
			return
		}
	}
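A minimal sketch of what that change might look like, assuming the existing check is simply extended to also bail out on metric.ErrReaderShutdown:

	err := c.reader.Collect(context.TODO(), &metrics)
	if err != nil {
		otel.Handle(err)
		// Stop early when the reader can no longer produce data, instead of
		// handing stale state to the Prometheus registry on every scrape.
		if err == metric.ErrReaderNotRegistered || err == metric.ErrReaderShutdown {
			return
		}
	}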
