-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tail sampling processor "count_traces_sampled" metric is inaccurate #25882
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Sounds reasonable, would you be open to sending in a PR? |
Sure, I'll get to it this weekend! |
How is the For example with two probabilistic policies like so, assuming perfect distribution:
If 100 traces passed through, would you expect the metric counts to read like so with each decision only counted once?
Or multiple times?
|
With my suggested code change, it would be the latter - every decision would increment every policy, either the Also, thinking about it, a total metric (without a |
While we (me and @jmsnll) were trying to work out a solution internally (to see the various output of the metric), we realized a potential bug in that the tail sampler does not check for uniqueness in the policy name. More concretely, if we use a config as below, where someone accidentally use identical name for two different policies receivers:
otlp:
protocols:
grpc:
processors:
tail_sampling:
decision_wait: 5s
policies:
- name: status-code-policy
type: status_code
status_code:
status_codes: [ERROR]
- name: status-code-policy
type: probabilistic
probabilistic:
sampling_percentage: 50
exporters:
logging:
service:
pipelines:
traces:
receivers: [otlp]
processors: [tail_sampling]
exporters: [logging]
then firing off 100 traces with otelcol_processor_tail_sampling_count_traces_sampled{policy="status-code-policy",sampled="true",service_name="otelcontribcol",service_version="0.84.0-dev"} 49 which is true as per the config but deceiving to anyone without looking at the config. As the label is taken from the policy context in One would argue that no engineer should make the mistake of having non-unique policy name. However, if a new (silence) tag say |
😱 , I'll send a quick PR to get that fixed. Thanks for pointing this out.
That should not be an assumption. |
…ampled` metric (#26724) **Description:** Previously, the calculation of `count_traces_sampled` was incorrect, resulting in all per-policy metric counts being the same. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 50 count_traces_sampled{policy="policy-b", sampled="true"} 50 ``` This issue stemmed from the metrics being incremented based on the final decision of trace sampling, rather than being attributed to the policies causing the sampling. With the fix implemented, the total sampling count can no longer be inferred from the metrics alone. To address this, a new metric, `global_count_traces_sampled`, was introduced to accurately capture the global count of sampled traces. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 24 count_traces_sampled{policy="policy-b", sampled="true"} 76 global_count_traces_sampled{sampled="false"} 24 global_count_traces_sampled{sampled="true"} 76 ``` Reusing the `count_traces_sampled` policy metric for this purpose would have either meant: 1. Leaving the policy field empty, which didn't feel very explicit. 2. Setting a global placeholder policy name (such as `final-decision`), which could then potentially collide with existing user-configured policy names without validation. **Link to tracking Issue:** Fixes #25882 **Testing:** Tested with various combinations of `probabilistic`, `latency`, `status_code` and attribute policies (including combinations of `and` and `composite` policies). No unit tests were added, open to suggestions for adding tests for metrics but I couldn't find any examples of it being done elsewhere. **Documentation:** No documentation changes.
…ampled` metric (open-telemetry#26724) **Description:** Previously, the calculation of `count_traces_sampled` was incorrect, resulting in all per-policy metric counts being the same. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 50 count_traces_sampled{policy="policy-b", sampled="true"} 50 ``` This issue stemmed from the metrics being incremented based on the final decision of trace sampling, rather than being attributed to the policies causing the sampling. With the fix implemented, the total sampling count can no longer be inferred from the metrics alone. To address this, a new metric, `global_count_traces_sampled`, was introduced to accurately capture the global count of sampled traces. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 24 count_traces_sampled{policy="policy-b", sampled="true"} 76 global_count_traces_sampled{sampled="false"} 24 global_count_traces_sampled{sampled="true"} 76 ``` Reusing the `count_traces_sampled` policy metric for this purpose would have either meant: 1. Leaving the policy field empty, which didn't feel very explicit. 2. Setting a global placeholder policy name (such as `final-decision`), which could then potentially collide with existing user-configured policy names without validation. **Link to tracking Issue:** Fixes open-telemetry#25882 **Testing:** Tested with various combinations of `probabilistic`, `latency`, `status_code` and attribute policies (including combinations of `and` and `composite` policies). No unit tests were added, open to suggestions for adding tests for metrics but I couldn't find any examples of it being done elsewhere. **Documentation:** No documentation changes.
Component(s)
processor/tailsampling
Describe the issue you're reporting
It looks like the "count_traces_sampled" metric of the tail sampling processor is inaccurate, or at least it's not calculated how I would expect.
Specifically, given two policies named
"policy1"
and"policy2"
, the metric values for{policy="policy1",sampled="false"}
and{policy="policy2",sampled="false"}
, and those for{policy="policy1",sampled="true"}
and{policy="policy2",sampled="true"}
are always equal, although I would expect them to be different, for different policies.Indeed, looking at the code, they are all calculated based on
finalDecision
, and not each policy's decision:opentelemetry-collector-contrib/processor/tailsamplingprocessor/processor.go
Lines 264 to 288 in 06f1f57
Why I think that's a bug, and not intended behavior, is that per-policy decisions are actually stored in
trace.Decisions[i]
during evaluation, but then never used anywhere (that I could find).So I think
should actually be
The text was updated successfully, but these errors were encountered: