Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Prometheus Remote Write #388

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Adds Prometheus Remote Write #388

wants to merge 5 commits into from

Conversation

tiedotguy
Copy link
Collaborator

Probably easiest to just look at 6ca4ea3, it's based off the Datadog backend, with a lot of reused stuff.

This removes tests for valid metrics from the TestInvalidMetricsLexer test.
It also codifies handling of "foo:" and ":foo" for tags as valid tags.

This may not be the design we want, but it's the code we have, so now we
enforce it with tests.
Copy link
Contributor

@MovieStoreGuy MovieStoreGuy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions on things, I would like to enforce compile time interface checks where possible.

Prometheus Remote Writer Backend
--------------------------------
The `promremotewriter` backend supports sending metrics to a Prometheus Remote Writer compatible backend. It currently
drops events. At present there is no authentication supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be expected with dropped events?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prometheus doesn't do events, they need to be managed by something external (ie, grafana). So that will probably end up as a grafana-events backend which drops metrics or something. Haven't really thought about it too hard.

29.0.2
------
- New Relic backend handles infinities and NaN better
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change part of this change set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was an "unreleased" change in 29.0.2, this just documents when the behavior went in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No wokkas

@@ -0,0 +1,229 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to check in the auto generated code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we do it already with the other PB stuff. It simplifies the need to deal with protoc.

@@ -0,0 +1,29 @@
syntax = "proto3";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reference that this proto file was built from?

Copy link
Collaborator Author

@tiedotguy tiedotguy Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, came from the prom remote write spec, I can add a ref to it.

  • this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, just in case we need to chase it up.

)

// flush represents a send operation.
type flush struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this struct implement an interface?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope

}

func (f *flush) maybeFlush() {
if uint(len(f.writeRequest.Timeseries))+20 >= f.metricsPerBatch { // flush before it reaches max size and grows the slice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the magic +20?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's inherited from datadog, but it's meant to be the maximum cardinality of a single metric. That is, a timer is going to have like 10 metrics it emits. 20 just gives it a buffer.

The intent is len(...) < metricsPerBatch, rather than len(...) > metricsPerBatch+epsilon, because if there's a limit on the backend, then it might be a hard limit, so going 1 over is a hard fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I ask you to make the number a constant with the file to make it easier in future to understand the code a bit more?

const (
// BackendName is the name of this backend.
BackendName = "promremotewriter"
defaultUserAgent = "gostatsd" // TODO: Add version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth raising an issue for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, pretty sure it's a theme.

)

// Client represents a Prometheus Remote Writer client.
type Client struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an interface that this implements?

Copy link
Collaborator Author

@tiedotguy tiedotguy Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, gostatsd.Backend. It's not formally checked, but the code won't compile if it doesn't implement it (NewClientFromViper will fail its return type)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not worth doing the compiler time interface check within the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, NewClientFromViper will fail if it doesn't match.

now := clock.FromContext(ctx).Now().UnixNano() / 1_000_000
prw.processMetrics(now, metrics, func(writeReq *pb.PromWriteRequest) {
atomic.AddUint64(&prw.batchesCreated, 1)
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any concern if this is called mutliple times?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love it, but SendMetricsAsync needs to complete fast - If it blocks, everything blocks. MetricFlusher.flushData ensures receiving of data to continue, and enforces that we won't initiate a second flush before the first one completes.

bufCompressed := snappy.Encode(bufBorrowed, raw)
if cap(bufBorrowed) < cap(bufCompressed) {
// it overflowed the buffer and had to allocate, so we'll keep the larger buffer for next time
*buffer = bytes.NewBuffer(bufCompressed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth looking at https://pkg.go.dev/sync#Pool for this functionality instead of doing the memory management for go?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snappy is creating/growing the byte array if needed, the best we can do is detect if it needed to grow, and re-use it. The bytes.Buffer is used in the semaphore, so we'll reach a local maxima and then stop allocating more data. There's not a large amount of byte slices allocated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

MovieStoreGuy
MovieStoreGuy previously approved these changes Sep 6, 2021
@tiedotguy
Copy link
Collaborator Author

Once again we need to ignore coveralls, I think it's just sad because there's new code without sufficient coverage to actively increase the %.

@tiedotguy
Copy link
Collaborator Author

Oh hey I never got this merged. I'll clean it up later.

@MovieStoreGuy
Copy link
Contributor

Soz

@tiedotguy
Copy link
Collaborator Author

See #388 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants