Work needed to enable useOtelForInternalMetrics by default #7454
I've experimented with using metrics generated by an OTel collector running with the above-mentioned flag. I'm picking up work started by @paivagustavo. This is an experience report: I modified this example to configure the Prometheus receiver inside a collector for self observability like:
which is similar to the example configuration given in that receiver's README. To address the problem in #6403, I use The result, viewed by Prometheus has a mix of metrics produced from OC and OTel. This is an example from the OC path:
See that Here is an example from the OTel instrumentation with the associated target info.
The same metrics are also pushed via OTLP through an example pipeline, which I've printed. The resource comes through like:
Note that the original service name has been lost, overridden by the Prometheus scrape config's job name. The original service instance ID is also lost. The net.host.port has been inserted, but I'm not sure it should be in this case, because I'd prefer not to have lost the original instance ID. The I can't explain where the Next, I experimented with whether you can fix some of these issues using Prometheus relabeling rules, e.g., using
but this doesn't help much -- because anything you do at this step is before the logic that converts Prom data back to OTLP (which is where the problems arise). Here's what the OC metric looks like after OTLP conversion:
There appear to be several issues, not all of which need to be addressed.
@jmacd I believe this PR will help with the third point. The issue is that currently both OC and OTel are using Prometheus as a bridge to allow components to continue using OpenCensus instrumentation. The change I'm proposing disables the OC configuration when
We may need to address #7517 before closing this.
The |
This ensures backwards compatibility with the metrics generated via opencensus by default today. Part of open-telemetry#7454 Signed-off-by: Alex Boten <[email protected]>
With the PR for
With
I'm not sure that there's a good mechanism today for applying info from target_info as resource attributes. @dashpole pointed me to a spec change proposed to address this for Prometheus exporters: open-telemetry/opentelemetry-specification#3761 (thanks David!). I don't know what the right way to move forward with this change is, given the current state. I did test the OTLP exporter for internal metrics, and the resource attributes contained the service information I expected. The following is the output from the Go stdout exporter:
In other words, for end users who want OTLP for their collector metrics (like @jmacd's original scenario), this would be a good replacement for what is available today only with Prometheus metrics (with OC). For users who want to continue using the Prometheus export functionality, I'm not sure what to recommend.
Do you know why
It's prefixed here: https://github.com/open-telemetry/opentelemetry-collector/blob/445960b82fb4f4a979bfa5e6ce88aa754d22b92d/service/internal/proctelemetry/config.go#L206C1-L206C84
I think we can wrap Collect to add resource attributes as additional labels. That is what the prometheus exporter implements. In Collect, we would need to wrap each Metric, and override the Write function. There, we would need to add our additional labels. It would end up looking similar to https://github.com/prometheus/client_golang/blob/80d3f0b5b36c891016370423516e353b1a5f0a55/prometheus/metric.go#L172, which wraps another metric and adds exemplars to it.
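To make the wrapping idea above concrete, here is a minimal sketch of the pattern. It is not the collector's or the exporter's code: `Sample`, `Metric` and `Collector` are simplified stand-ins defined here for the client_golang types of similar names, and `labeledMetric`, `labeledCollector`, `gauge`, `staticCollector` and `gather` are hypothetical names. The choice to keep a metric's own label when a resource attribute collides with it is also an assumption.

```go
package main

import "fmt"

// Sample is a stand-in for the serialized metric that Write fills in.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// Metric and Collector are simplified stand-ins, not the real client_golang interfaces.
type Metric interface{ Write(out *Sample) error }
type Collector interface{ Collect(ch chan<- Metric) }

// labeledMetric wraps another Metric and appends extra labels in Write.
type labeledMetric struct {
	Metric
	extra map[string]string
}

func (m labeledMetric) Write(out *Sample) error {
	if err := m.Metric.Write(out); err != nil {
		return err
	}
	if out.Labels == nil {
		out.Labels = map[string]string{}
	}
	for k, v := range m.extra {
		if _, exists := out.Labels[k]; !exists { // assumption: the metric's own label wins
			out.Labels[k] = v
		}
	}
	return nil
}

// labeledCollector wraps Collect so every emitted Metric is a labeledMetric.
type labeledCollector struct {
	inner Collector
	extra map[string]string
}

func (c labeledCollector) Collect(ch chan<- Metric) {
	inner := make(chan Metric)
	go func() {
		c.inner.Collect(inner)
		close(inner)
	}()
	for m := range inner {
		ch <- labeledMetric{Metric: m, extra: c.extra}
	}
}

// gauge and staticCollector exist only to drive the example.
type gauge struct {
	labels map[string]string
	value  float64
}

func (g gauge) Write(out *Sample) error {
	out.Value = g.value
	out.Labels = map[string]string{}
	for k, v := range g.labels {
		out.Labels[k] = v
	}
	return nil
}

type staticCollector []Metric

func (s staticCollector) Collect(ch chan<- Metric) {
	for _, m := range s {
		ch <- m
	}
}

// gather drains a collector and returns the written samples and the first error.
func gather(c Collector) ([]Sample, error) {
	ch := make(chan Metric)
	go func() {
		c.Collect(ch)
		close(ch)
	}()
	var out []Sample
	var firstErr error
	for m := range ch {
		var s Sample
		if err := m.Write(&s); err != nil && firstErr == nil {
			firstErr = err
			continue
		}
		out = append(out, s)
	}
	return out, firstErr
}

func main() {
	src := staticCollector{gauge{labels: map[string]string{"receiver": "otlp"}, value: 3}}
	resource := map[string]string{"service_name": "otelcol", "service_instance_id": "abc"}
	samples, err := gather(labeledCollector{inner: src, extra: resource})
	fmt.Println(samples, err)
}
```

With the real library, the descriptors reported for each metric would probably also have to stay consistent with the added label names; this sketch ignores that.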
During the SIG call, I mentioned that Java implemented a double-emit so that an application could use both a new and an old semantic convention at the same time, to make it easier to migrate between versions without breaking other parts of the observability pipeline. That feature is described here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/README.md
@jpkrohling I guess this would raise the question of what the correct thing to emit is, since there are a couple of differences between the current metrics and what the correct metrics are:
Are you proposing that we have two endpoints that emit these metrics at the same time? I'm not sure what the deprecation story would be here, since it would be hard for users to know this change is coming. We saw changing metrics surprise users when we tried to change the default behaviour of counter metrics to include the Maybe the solution is to have a configuration option to use the legacy format?
@dashpole I followed up on your question about the prefix, and made the change: #8988 I wanted to answer another question from the SIG meeting which was whether the prometheus receiver was able to use this
One more question that ^^^ raises is whether or not this is correct. As you can see there is both a
Perhaps a feature flag can control the three possible scenarios? (1) No target info + all resource attributes on metrics, (2) target info and a minimal set of resource attributes on metrics, (3) target info and all resource attributes. The third scenario would help people migrate from the previous one to the new one, as they can just ignore the extra resource attributes when building the integration. Once they are ready, they can move to scenario 2, which is what will be supported for the long term.
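As a rough illustration of the three scenarios, the sketch below maps a mode to what would be emitted. All names here (`ResourceMode`, `Plan`, `planFor`, `minimalKeys`) are hypothetical, and the contents of the "minimal set" are an assumption, since the thread does not define it.

```go
package main

import "fmt"

// ResourceMode selects one of the three scenarios from the proposal.
type ResourceMode int

const (
	AllAttrsNoTargetInfo   ResourceMode = iota + 1 // scenario 1
	TargetInfoMinimalAttrs                         // scenario 2
	TargetInfoAllAttrs                             // scenario 3
)

// minimalKeys is an assumed "minimal set"; the thread does not specify it.
var minimalKeys = []string{"service.name", "service.instance.id"}

// Plan describes what gets emitted for a given mode.
type Plan struct {
	EmitTargetInfo bool
	MetricLabels   map[string]string // resource attributes copied onto every metric
}

// planFor returns the emission plan for mode, given the resource attributes.
func planFor(mode ResourceMode, resource map[string]string) (Plan, error) {
	switch mode {
	case AllAttrsNoTargetInfo:
		return Plan{EmitTargetInfo: false, MetricLabels: pick(resource, nil)}, nil
	case TargetInfoMinimalAttrs:
		return Plan{EmitTargetInfo: true, MetricLabels: pick(resource, minimalKeys)}, nil
	case TargetInfoAllAttrs:
		return Plan{EmitTargetInfo: true, MetricLabels: pick(resource, nil)}, nil
	default:
		return Plan{}, fmt.Errorf("unknown resource mode %d", mode)
	}
}

// pick copies the listed keys from src; a nil key list copies everything.
func pick(src map[string]string, keys []string) map[string]string {
	out := make(map[string]string)
	if keys == nil {
		for k, v := range src {
			out[k] = v
		}
		return out
	}
	for _, k := range keys {
		if v, ok := src[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	resource := map[string]string{
		"service.name":        "otelcol",
		"service.instance.id": "abc",
		"service.version":     "0.0.0-example",
	}
	for _, mode := range []ResourceMode{AllAttrsNoTargetInfo, TargetInfoMinimalAttrs, TargetInfoAllAttrs} {
		plan, err := planFor(mode, resource)
		fmt.Println(mode, plan.EmitTargetInfo, len(plan.MetricLabels), err)
	}
}
```

Scenario 3 is a superset of scenario 2 in this model, which is what would let users move from one to the other by ignoring the extra labels.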
This is correct. Unfortunately, until Prometheus supports UTF-8, we can't auto-translate service_name -> service.name. I would recommend using an additional processor to overwrite service.name with service_name, etc.
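The overwrite step suggested above amounts to a small key rename on the resource attributes. The sketch below shows only that logic on a plain map; it is not a collector processor, and `promToOTel` and `restoreResourceAttrs` are hypothetical names. The entries beyond service_name are assumed examples of the "etc."

```go
package main

import "fmt"

// promToOTel maps underscore-style label names to the dotted attribute
// names they should overwrite. Entries other than service_name are assumptions.
var promToOTel = map[string]string{
	"service_name":        "service.name",
	"service_instance_id": "service.instance.id",
	"service_version":     "service.version",
}

// restoreResourceAttrs returns a copy of attrs in which each key listed in
// promToOTel is moved to its dotted name, overwriting any value already there
// (for example a service.name derived from the scrape job name).
func restoreResourceAttrs(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if _, renamed := promToOTel[k]; !renamed {
			out[k] = v
		}
	}
	for from, to := range promToOTel {
		if v, ok := attrs[from]; ok {
			out[to] = v
		}
	}
	return out
}

func main() {
	scraped := map[string]string{
		"service.name":  "otel-collector", // set from the scrape job name
		"service_name":  "otelcol-contrib",
		"net.host.port": "8888",
	}
	fmt.Println(restoreResourceAttrs(scraped))
}
```

Attributes without an underscore-style counterpart pass through unchanged, so a service.name with no matching service_name is left as scraped.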
Are you suggesting a single gate that could be used for all 3 scenarios? If so then I would recommend we:
In other words, I would like to decouple people having to change the metrics they're using from enabling OTel. That being said, it's a bit weird to enable OTel with the old-style metrics, but at least it unblocks that work.
@dashpole with open-telemetry/opentelemetry-specification#3761 close to merging, do you have any thoughts on how soon this could be adopted in the otel-go prometheus exporter? I would like to avoid adding workarounds in the collector code if possible.
I would guess a week or two? I don't think it will be hard to implement.
The metrics are now consistent with the metrics produced by OpenCensus. We should move the featuregate forward. Note that the OpenTelemetry-generated metrics include grpc client/server metrics (for receivers/exporters that use grpc) and `target_info` metrics. Fixes #7454 Signed-off-by: Alex Boten <[email protected]>
This issue is to discuss the work needed to enable using the OTel SDK by default.
Related issues: