receiver/prometheus: add "up" metric for instances #2918
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main    #2918      +/-   ##
==========================================
- Coverage   92.06%   91.69%   -0.38%
==========================================
  Files         313      313
  Lines       15439    15362      -77
==========================================
- Hits        14214    14086     -128
- Misses        817      870      +53
+ Partials      408      406       -2
```
Continue to review full report at Codecov.
Kindly cc-ing @Aneurysm9 @rakyll @alolita @bogdandrutu @brian-brazil
Nice! Maybe you can update the description to say it fixes open-telemetry/prometheus-interoperability-spec#41.
```go
var tagInstance, _ = tag.NewKey("instance")

var statUpStatus = stats.Int64("up", "Whether the endpoint is alive or not", stats.UnitDimensionless)
```
Should this be part of the metrics returned by the prometheus receiver itself rather than a self-observability metric? I agree it could be implemented as either a self-obs metric or as an additional metric from the receiver, but there are a few advantages to making it a receiver metric:
- Users probably expect to see an 'up' metric when they make a pipeline with a prometheus receiver, and don't expect to have to do additional things.
- I probably want to apply the same set of transformations to the 'up' metric that I want to apply to the rest of my prometheus metrics, since they should have the same resource labels (e.g. instance, job).
- It will make it easier for us to pass prom compliance, since we won't need to route self-obs metrics to the PRW exporter.
If this is a self-obs metric, will it still satisfy the prometheus compliance tests?
Consider using the metrics package from OpenCensus instead of stats for this, since we only have one label, no other labels coming from tags, and no plan to change the encoding:
See https://github.com/census-instrumentation/opencensus-go/blob/master/metric/gauge.go
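For comparison, here is a rough sketch of the same "up" gauge built with the go.opencensus.io/metric package instead of stats + views. This is my reading of the linked API, not code from this PR, and the option names are assumptions:

```go
package main

import (
	"go.opencensus.io/metric"
	"go.opencensus.io/metric/metricdata"
	"go.opencensus.io/metric/metricproducer"
)

func main() {
	registry := metric.NewRegistry()
	// Make the registry visible to exporters that read from the global
	// producer manager (assumed wiring, not from this PR).
	metricproducer.GlobalManager().AddProducer(registry)

	upGauge, err := registry.AddInt64Gauge(
		"up",
		metric.WithDescription("Whether the endpoint is alive or not"),
		metric.WithUnit(metricdata.UnitDimensionless),
		metric.WithLabelKeys("instance"),
	)
	if err != nil {
		panic(err)
	}

	// One entry per scraped instance; Set(1) when the scrape succeeded,
	// Set(0) when it failed.
	entry, err := upGauge.GetEntry(metricdata.NewLabelValue("0.0.0.0:8888"))
	if err != nil {
		panic(err)
	}
	entry.Set(1)
}
```

The appeal of this form is that the gauge carries its label key directly, so no tag context or view registration is needed for a single "instance" label.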
Make a receiver-specific view that'll be registered and used to record the "up" status as "0.0" or "1.0" when an instance can't or can be scraped, respectively. This ensures that the collector can act as a passthrough for statuses, and it currently outputs:

```
# HELP up Whether the endpoint is alive or not
# TYPE up gauge
up{instance="0.0.0.0:8888"} 1
up{instance="localhost:9999"} 0
```

I did not take the approach of plainly sending "*_up"-suffixed metric names and recommending relabelling inside the exporter itself, like:

```yaml
- source_labels: [__name__]
  regex: "(.+)_up"
  target_label: "__name__"
  replacement: "up"
```

because:

* it'd apply ConstLabels on every *_up metric, and we only want "instance=$INSTANCE"
* other exporters wouldn't be able to use the "up" metric as-is if we inject rewrites

Regardless of whether we used a label rewrite, the end result would be the following:

```
up{instance="localhost:8888",job="otlc"}
up{exported_instance="0.0.0.0:9999",instance="localhost:8888",job="otlc"}
up{exported_instance="0.0.0.0:1234",instance="localhost:8888",job="otlc"}
```

which this change accomplishes without having to inject any label rewrites, just with the new imports and an upgrade of the prometheus exporter.

Fixes open-telemetry/prometheus-interoperability-spec#8
Requires census-ecosystem/opencensus-go-exporter-prometheus#24
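A minimal sketch of the view-based approach described above, reusing the tagInstance key and statUpStatus measure quoted earlier; the recordUp helper and the wiring in main are illustrative, not the PR's exact code:

```go
package main

import (
	"context"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// These mirror the declarations quoted earlier in the review.
var tagInstance, _ = tag.NewKey("instance")
var statUpStatus = stats.Int64("up", "Whether the endpoint is alive or not", stats.UnitDimensionless)

// upView is the receiver-specific view: the last recorded value per instance,
// exported as the "up" gauge.
var upView = &view.View{
	Name:        "up",
	Description: statUpStatus.Description(),
	Measure:     statUpStatus,
	TagKeys:     []tag.Key{tagInstance},
	Aggregation: view.LastValue(),
}

// recordUp records 1 when the instance could be scraped and 0 when it couldn't.
func recordUp(ctx context.Context, instance string, scraped bool) error {
	val := int64(0)
	if scraped {
		val = 1
	}
	ctx, err := tag.New(ctx, tag.Upsert(tagInstance, instance))
	if err != nil {
		return err
	}
	stats.Record(ctx, statUpStatus.M(val))
	return nil
}

func main() {
	if err := view.Register(upView); err != nil {
		panic(err)
	}
	_ = recordUp(context.Background(), "0.0.0.0:8888", true)   // up{instance="0.0.0.0:8888"} 1
	_ = recordUp(context.Background(), "localhost:9999", false) // up{instance="localhost:9999"} 0
}
```

With an OpenCensus-backed Prometheus exporter registered, a view like this produces the `# TYPE up gauge` output shown above.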
"up" should only be exported for Prometheus exporters, so essentially it'll need to be exported for the 3 different users:
And not be exported to all the general pipelines. I have an unmailed change that allows exporter/prometheus and exporter/prometheusremotewrite to hook into receiver/prometheus so that gauges for "up" can be sent directly to the target exporters.
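As a purely hypothetical illustration of what such a hook could look like (the interface and names below are invented for this sketch, not taken from the unmailed change):

```go
package passthrough

// UpConsumer is a hypothetical interface a Prometheus-style exporter could
// implement to receive "up" values directly from receiver/prometheus,
// bypassing the general metrics pipelines.
type UpConsumer interface {
	// ConsumeUp reports whether the given instance/job target could be scraped.
	ConsumeUp(instance, job string, up bool)
}

// upFanout forwards "up" statuses to every registered consumer, e.g.
// exporter/prometheus and exporter/prometheusremotewrite.
type upFanout struct {
	consumers []UpConsumer
}

// Register adds an exporter that wants to receive "up" statuses.
func (f *upFanout) Register(c UpConsumer) {
	f.consumers = append(f.consumers, c)
}

// ReportUp is called by the receiver after each scrape attempt.
func (f *upFanout) ReportUp(instance, job string, up bool) {
	for _, c := range f.consumers {
		c.ConsumeUp(instance, job, up)
	}
}
```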
What is "service/prometheus exporter"?
Why only for prometheus exporters? For GKE's use case, in which we scrape prometheus endpoints and send to Google Cloud Monitoring, we would still like to know when a prometheus endpoint is available. IMO it would even be useful to replicate the "up" metric for other scrape-based endpoints, such as when we scrape the kubelet's stats/summary endpoint.
If the prometheus receiver is a drop-in replacement for the prometheus server, shouldn't this be emitted by the receiver (maybe optionally), rather than in the exporter(s)?
@bogdandrutu when the service pipeline is started, it emits telemetry metrics as per opentelemetry-collector/service/telemetry.go, lines 91 to 106 at f4d33bc.
@dashpole, the task was to implement a pass-through for the Prometheus server, whereby the collector only exposes the "up" metric and delivers it to the Prometheus server. However, if you need all other exporters to receive the "up" metric, then that simplifies things even further and allows me to revert to prior versions.
@vishiy it is emitted by the Prometheus receiver, but consumed and relayed to the Prometheus exporters; we are implementing a pass-through. The Prometheus server doesn't typically expose this metric for scraping either, and uses it internally. However, you raise a great point that the Prometheus receiver should just generate the "up" metric for all of them.
```diff
@@ -93,29 +93,40 @@ func (b *metricBuilder) AddDataPoint(ls labels.Labels, t int64, v float64) error
 		b.numTimeseries++
 		b.droppedTimeseries++
 		return errMetricNameNotFound
```
What would happen if we had a separate case for metricName "up" here, in which we didn't return (returning is what filters out the "up" metric)? That seems simpler than filtering it out here and adding a new internal metric above.
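A simplified, hypothetical sketch of that suggestion; the metricBuilder fields and the isInternalMetric helper below are stand-ins, not the actual receiver code, and the label set is reduced to a plain metric name:

```go
package internal

import "errors"

var errMetricNameNotFound = errors.New("metricName not found from labels")

// metricBuilder is a stand-in for the receiver's real builder.
type metricBuilder struct {
	numTimeseries     int
	droppedTimeseries int
}

// isInternalMetric is a hypothetical helper for identifying other internal
// series such as scrape_*.
func isInternalMetric(name string) bool {
	return name == "scrape_duration_seconds" || name == "scrape_samples_scraped"
}

// AddDataPoint keeps "up" as a regular metric instead of dropping it with
// the other internal series.
func (b *metricBuilder) AddDataPoint(metricName string, t int64, v float64) error {
	switch {
	case metricName == "":
		b.numTimeseries++
		b.droppedTimeseries++
		return errMetricNameNotFound
	case metricName == "up":
		// Separate case: fall out of the switch and build the point like
		// any other metric.
	case isInternalMetric(metricName):
		b.numTimeseries++
		b.droppedTimeseries++
		return nil
	}
	// ... build and append the data point as usual ...
	return nil
}
```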
Hello, I started debugging this too, starting from my initial issue. I can contribute to your PR; my approach is a little more global and also fixes the internal scrape_ metrics. Don't merge it too quickly! ^^
Another approach here: #3116, but I need some review/help to finish it cleanly. That approach avoids recording metrics manually, avoids introducing new concepts, and works for all internal metrics, not only `up`.
@odeke-em what's the next step here? What do you suggest?
For information, I don't understand why tests are not breaking here. From what I experimented with, at least the Prometheus exporter tests should break, because if this works it should receive a new, unexpected metric in its test case.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Superseded.