Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional default latency buckets for spanmetrics #1593

Merged
merged 4 commits into from
Jul 27, 2022

Conversation

fredr
Copy link
Contributor

@fredr fredr commented Jul 21, 2022

What this PR does:
Changes the default spanmetric latency bucket sizes to match the default sizes in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor

It will be hard to size these default buckets to a size that fits all systems, but I think wider is better than narrower for a default. The current default maxes out at 4 seconds which imo is a bit low.

Checklist

  • Tests updated Tests pass
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@CLAassistant
Copy link

CLAassistant commented Jul 21, 2022

CLA assistant check
All committers have signed the CLA.

@joe-elliott
Copy link
Member

I am fine with this change. I think most people running default Tempo are doing so at low volumes and will benefit from the additional buckets. Pinging @kvrhdn and @mapno for thoughts. I am good with merging this if you are.

Is there a specific reason to not use exponential buckets as generated by Prometheus? If we just change the line to:

  cfg.HistogramBuckets = prometheus.ExponentialBuckets(0.002, 2, 14) // 12 -> 14

then we will get buckets:

  [0.002 0.004 0.008 0.016 0.032 0.064 0.128 0.256 0.512 1.024 2.048 4.096 8.192 16.384]

@fredr
Copy link
Contributor Author

fredr commented Jul 21, 2022

Is there a specific reason to not use exponential buckets as generated by Prometheus?

I copied the default buckets from https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/spanmetricsprocessor/processor.go#L51 thinking that there might be a value in having the same. But using the exponential buckets is definitely cleaner. I'll update the pr

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I am fine with this change and will approve, but will leave to @kvrhdn or @mapno to merge. They might have reasons to leave this out.

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with both options. Merging!

@mapno
Copy link
Member

mapno commented Jul 22, 2022

Agh, there are tests in the registry (registry_test.go) failing due to this change. Those will need to be fixed before we can merge.

@fredr
Copy link
Contributor Author

fredr commented Jul 22, 2022

Agh, there are tests in the registry (registry_test.go) failing due to this change. Those will need to be fixed before we can merge.

I've ran the whole test suite (make test) and ran TestManagedRegistry_removeStaleSeries locally on this branch, and it all pass. Could this be a flake? would it possible to re-run the test ci?

@fredr
Copy link
Contributor Author

fredr commented Jul 22, 2022

It failed again, but this time it was github.com/grafana/tempo/modules/distributor that didn't go through. Think that is unrelated to this change, but let me know if I should dig into it. (it also passes locally)

Copy link
Member

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm also fine with this. Should we make the changelog more explicit about the additional buckets? This will increase active series of people using the defaults.
We aren't making the buckets itself wider, we are adding 2 more buckets.

CHANGELOG.md Outdated Show resolved Hide resolved
@fredr fredr changed the title Wider default latency buckets for spanmetrics Additional default latency buckets for spanmetrics Jul 26, 2022
@mapno
Copy link
Member

mapno commented Jul 27, 2022

Thanks for the patience. Merging!

@mapno mapno merged commit 150eeea into grafana:main Jul 27, 2022
@fredr fredr deleted the fredr/spanmetrics-default-buckets branch July 27, 2022 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants