Adds Prometheus Remote Write #388
base: master
Conversation
This removes tests for valid metrics from the TestInvalidMetricsLexer test. It also codifies "foo:" and ":foo" as valid tags. This may not be the design we want, but it's the code we have, so now we enforce it with tests.
Just some questions on things. I would like to enforce compile-time interface checks where possible.
Prometheus Remote Writer Backend
--------------------------------
The `promremotewriter` backend supports sending metrics to a Prometheus Remote Writer compatible backend. It currently
drops events. At present there is no authentication supported.
What should be expected with dropped events?
Prometheus doesn't do events; they need to be managed by something external (i.e. Grafana). So that will probably end up as a grafana-events backend which drops metrics, or something. Haven't really thought about it too hard.
29.0.2
------
- New Relic backend handles infinities and NaN better
Is this change part of this change set?
No, it was an "unreleased" change in 29.0.2, this just documents when the behavior went in.
No wokkas
@@ -0,0 +1,229 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
Is it okay to check in the auto-generated code?
Yep, we do it already with the other PB stuff. It avoids needing to deal with protoc.
@@ -0,0 +1,29 @@
syntax = "proto3";
Is there a reference that this proto file was built from?
Yeah, came from the prom remote write spec, I can add a ref to it.
Yes please, just in case we need to chase it up.
)

// flush represents a send operation.
type flush struct {
Does this struct implement an interface?
Nope
}

func (f *flush) maybeFlush() {
	if uint(len(f.writeRequest.Timeseries))+20 >= f.metricsPerBatch { // flush before it reaches max size and grows the slice
Why the magic +20?
It's inherited from datadog, but it's meant to be the maximum cardinality of a single metric. That is, a timer is going to have something like 10 metrics it emits; 20 just gives it a buffer.
The intent is `len(...) < metricsPerBatch` rather than `len(...) > metricsPerBatch+epsilon`, because if there's a limit on the backend it might be a hard limit, so going 1 over is a hard fail.
Could I ask you to make the number a constant within the file, to make the code a bit easier to understand in future?
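A minimal sketch of the kind of change being asked for. The `flush`/`writeRequest` types below are stand-ins so the snippet compiles on its own, and the constant name is illustrative, not the final code:

```go
package promremotewriter

// maxSeriesPerMetric is a rough upper bound on how many time series a single
// aggregated metric can expand into (a timer emits on the order of 10 series);
// the headroom lets maybeFlush send the batch before it could exceed
// metricsPerBatch, which may be a hard limit on the backend.
const maxSeriesPerMetric = 20

// Stand-in types so the sketch is self-contained; the real ones live in the PR.
type writeRequest struct{ Timeseries []interface{} }

type flush struct {
	writeRequest    *writeRequest
	metricsPerBatch uint
	flushCb         func(*writeRequest)
}

// maybeFlush sends the current batch early, so appending the next metric's
// series can never push it over the limit.
func (f *flush) maybeFlush() {
	if uint(len(f.writeRequest.Timeseries))+maxSeriesPerMetric >= f.metricsPerBatch {
		f.flushCb(f.writeRequest)
		f.writeRequest = &writeRequest{}
	}
}
```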
const (
	// BackendName is the name of this backend.
	BackendName = "promremotewriter"
	defaultUserAgent = "gostatsd" // TODO: Add version
Worth raising an issue for?
Nah, pretty sure it's a theme.
)

// Client represents a Prometheus Remote Writer client.
type Client struct {
Is there an interface that this implements?
Yup, `gostatsd.Backend`. It's not formally checked, but the code won't compile if it doesn't implement it (`NewClientFromViper` will fail on its return type).
Not worth doing the compile-time interface check within the file?
We do; `NewClientFromViper` will fail if it doesn't match.
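For what it's worth, the standard Go idiom for making that check explicit, rather than relying on `NewClientFromViper`'s return type, is a blank-identifier assignment placed near the type; a one-line sketch, assuming `Client` uses pointer receivers:

```go
// Compile-time assertion that *Client satisfies gostatsd.Backend; if a method
// is missing or a signature drifts, the build fails here with a direct error.
var _ gostatsd.Backend = (*Client)(nil)
```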
now := clock.FromContext(ctx).Now().UnixNano() / 1_000_000
prw.processMetrics(now, metrics, func(writeReq *pb.PromWriteRequest) {
	atomic.AddUint64(&prw.batchesCreated, 1)
	go func() {
Any concern if this is called multiple times?
I don't love it, but SendMetricsAsync needs to complete fast; if it blocks, everything blocks. MetricFlusher.flushData ensures receiving of data continues, and enforces that we won't initiate a second flush before the first one completes.
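As an illustration of that shape (not the actual MetricFlusher code), here's a sketch of dispatching a send without blocking the caller while still allowing only one flush in flight, using a buffered channel as a semaphore; all names are hypothetical:

```go
package promremotewriter

import "context"

// inFlight is a one-slot semaphore: a second flush cannot start until the
// previous one releases the slot, but the caller never waits on it.
var inFlight = make(chan struct{}, 1)

// dispatchFlush returns immediately; the goroutine waits for the slot (or for
// cancellation), so the SendMetricsAsync-style caller stays fast.
func dispatchFlush(ctx context.Context, send func(context.Context)) {
	go func() {
		select {
		case inFlight <- struct{}{}: // acquire the only slot
			defer func() { <-inFlight }() // release once the send completes
			send(ctx)
		case <-ctx.Done():
			// Shutting down; drop this flush rather than block forever.
		}
	}()
}
```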
bufCompressed := snappy.Encode(bufBorrowed, raw)
if cap(bufBorrowed) < cap(bufCompressed) {
	// it overflowed the buffer and had to allocate, so we'll keep the larger buffer for next time
	*buffer = bytes.NewBuffer(bufCompressed)
Is it worth looking at https://pkg.go.dev/sync#Pool for this functionality, instead of doing the memory management ourselves?
`snappy` creates/grows the byte array if needed; the best we can do is detect whether it needed to grow and re-use it. The `bytes.Buffer` is used in the semaphore, so we'll reach a local maximum and then stop allocating more data. There aren't a large number of byte slices allocated.
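A standalone sketch of the reuse pattern being described (helper name and signature are hypothetical). Note that snappy.Encode only reuses the destination when its length, not just its capacity, covers the maximum encoded size, so the borrowed slice is extended to its full capacity first; if snappy still had to allocate, the larger slice is kept for next time:

```go
package promremotewriter

import (
	"bytes"

	"github.com/golang/snappy"
)

// compressWithReuse snappy-encodes raw into the buffer's backing array when it
// is big enough, and otherwise adopts the freshly allocated, larger slice so
// subsequent calls stop allocating once a local maximum is reached.
func compressWithReuse(buffer **bytes.Buffer, raw []byte) []byte {
	bufBorrowed := (*buffer).Bytes()
	bufBorrowed = bufBorrowed[:cap(bufBorrowed)] // expose full capacity to snappy
	bufCompressed := snappy.Encode(bufBorrowed, raw)
	if cap(bufBorrowed) < cap(bufCompressed) {
		// snappy overflowed the borrowed slice and allocated; keep the larger
		// buffer for the next call.
		*buffer = bytes.NewBuffer(bufCompressed)
	}
	return bufCompressed
}
```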
Sounds good.
Once again we need to ignore coveralls; I think it's just sad because there's new code without sufficient coverage to actively increase the %.
Oh hey, I never got this merged. I'll clean it up later.
Soz
See #388 (comment)
Probably easiest to just look at 6ca4ea3, it's based off the Datadog backend, with a lot of reused stuff.