Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for better visibility around hedged requests #760

Closed
annanay25 opened this issue Jun 14, 2021 · 3 comments · Fixed by #790
Closed

Add metrics for better visibility around hedged requests #760

annanay25 opened this issue Jun 14, 2021 · 3 comments · Fixed by #790
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@annanay25
Copy link
Contributor

#750 added Hedged requests to GCS/S3/Azure backends.

There was a discussion on the PR to add metrics for tracking the performance of hedged requests, and the corresponding stats have been introduced in the hedgedhttp library in the latest release - https://github.com/cristalhq/hedgedhttp/releases/tag/v0.6.0

Use these stats to generate prometheus metrics for each backend.

@annanay25 annanay25 added good first issue Good for newcomers help wanted Extra attention is needed labels Jun 14, 2021
@josephwoodward
Copy link
Contributor

Use these stats to generate prometheus metrics for each backend.

What's required for this one? Is the plan to use the metrics coming out of v0.6.0 of the hedge requests client in place of the timers we're manually capturing?

@joe-elliott
Copy link
Member

Thanks for the interest @josephwoodward. Based on the way hedgedhttp added metrics it seems like the most straightforward way to do this would be to start a goroutine that wakes up every 10 seconds or so calls Stats.Snapshot() and records the desired metrics as Prometheus metrics.

I would also see this as an opportunity to consolidate our client metrics and hedged http code. We basically have the same code copy/pasted 3 times for instrumentation.

Personally the only metric I'm interested in is the number of "extra" requests that were required due to hedging.

@josephwoodward
Copy link
Contributor

@joe-elliott I'm happy to take a stab at this. Before I invest too much time on the solution I'm interested to know if this kind of approach is the direction you were thinking? https://github.com/grafana/tempo/pull/790/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants