Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src/registry: Use dynamic dispatch and remove type parameter M #100

Closed
wants to merge 2 commits into from

Conversation

mxinden
Copy link
Member

@mxinden mxinden commented Sep 29, 2022

Instead of being generic over the metric type on Registry, use dynamic dispatch.

On the upside this significantly simplifies type signatures exposed to the user. On the downside this requires users to always use dynamic dispatch and requires all metrics to implement auxiliary trait Debug and auto traits Send and Sync.

Initially suggested in #82 (comment)

Would replace #99 //CC @ackintosh @adamchalmers

Happy for alternative suggestions. //CC @thomaseizinger in case you are interested and have ideas.

Instead of being generic over the metric type on `Registry`, use dynamic
dispatch.

On the upside this significantly simplifies type signatures exposed to the user.
On the downside this requires users to always use dynamic dispatch and requires
all metrics to implement auxiliary trait `Debug` and auto traits `Send` and
`Sync`.
@adamchalmers
Copy link
Contributor

Personally, I box all my metrics anyway, because I need to use many different types with one registry. So I'm okay with that tradeoff.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for exploring this!

The problem domain we have here is basically the same as serde's, thus I think the ideal solution might be a visitor pattern with an abstract data model.

That would make the actual Encode trait oblivious about the underlying format, same as serde's Serialize. A registry could then pick a particular Encoder implementation and encode all metrics via double-dispatch.

This being said, serde's design is obviously designed around having a plethora of actual serialization formats whereas here we only have two (I think?) so it might not worth it.

Thoughts?

&mut self,
name: N,
help: H,
metric: impl Metric,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to take a reference here? I think the caller will always have to clone it, otherwise they are not gonna have a reference to it to actually use the metric, meaning we can do the cloning for them, therefore slightly simplifying the API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention taking an owned value instead of a reference was to be explicit about the otherwise internal clone.

I think the caller will always have to clone it, otherwise they are not gonna have a reference to it to actually use the metric

Not in the case of a constant metric, i.e. a metric that never changes. For such a metric the user does not need a handle to it, and can thus pass the only instance to register.

I would stick with the owned signature, though I don't feel strongly about it.

Comment on lines +190 to +191
/// let subsystem_a_counter_1: Counter = Counter::default();
/// let subsystem_a_counter_2: Counter = Counter::default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These type ascriptions are unfortunate. Can we get rid of them somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this is due to the default value of the Counter A trait parameter.

One thought I had was reducing the number of indirections through a Counter::new method, though unfortunately, the compiler can not infer A in such case either.

Comment on lines +370 to +407
// TODO: Instead of depending on dynamic dispatch for all metrics, we could use
// static dispatch for the most common ones and only fall back to dynamic
// dispatch. That said, this needs to be proven performant in a benchmark first.
/// Super trait representing an abstract Prometheus metric.
#[cfg(not(feature = "protobuf"))]
pub trait Metric:
crate::encoding::text::EncodeMetric + Send + Sync + std::fmt::Debug + 'static
{
}

/// Super trait representing an abstract Prometheus metric.
#[cfg(feature = "protobuf")]
pub trait Metric:
crate::encoding::text::EncodeMetric
+ crate::encoding::proto::EncodeMetric
+ Send
+ Sync
+ std::fmt::Debug
+ 'static
{
}

#[cfg(not(feature = "protobuf"))]
impl<T> Metric for T where
T: crate::encoding::text::EncodeMetric + Send + Sync + std::fmt::Debug + 'static
{
}

#[cfg(feature = "protobuf")]
impl<T> Metric for T where
T: crate::encoding::text::EncodeMetric
+ crate::encoding::proto::EncodeMetric
+ Send
+ Sync
+ std::fmt::Debug
+ 'static
{
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hacky but I guess it works? :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Though I am not sure how to do it differently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Features are supposed to be additive, this is not additive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality-wise for the end user, it is additive. There is more discussion further down in how else we could solve this!

Copy link
Contributor

@nox nox Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I care about only text encoding and my metric type implements only crate::encoding::text::EncodeMetric, and some dependency enables the protobuf feature, then my type won't implement Metric anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will. The one defined on line 382.

There will always be a Metric trait, just with different supertraits basd on the activated feature.

I can't think of a situation in which this breaks something but I agree that it is not clean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't, that's not how it works. If I implement all the supertraits but crate::encoding::proto::EncodeMetric and I rely on my type implementing Metric, my code will break if a third-party enables the "protobuf" feature, thus making the "protobuf" feature non-additive. Adding a supertrait when a feature is enabled makes that feature non-additive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, yeah. It is technically breaking. But the set of prometheus metrics is fixed right? Why would you implement the trait yourself? What am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already implemented my own Family and my own Counter.

Copy link
Member Author

@mxinden mxinden Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this @nox.

I think the way to go is a single EncodeMetric trait which then allows encoding:text and encoding::protobuf to traverse a metric via the visitor pattern as suggested below by @thomaseizinger.

@mxinden
Copy link
Member Author

mxinden commented Oct 2, 2022

The problem domain we have here is basically the same as serde's, thus I think the ideal solution might be a visitor pattern with an abstract data model.

That would make the actual Encode trait oblivious about the underlying format, same as serde's Serialize. A registry could then pick a particular Encoder implementation and encode all metrics via double-dispatch.

This being said, serde's design is obviously designed around having a plethora of actual serialization formats whereas here we only have two (I think?) so it might not worth it.

Thoughts?

The usage of serde is currently explored in #97. If I am not mistaken, this can happen in parallel, right? Serializing (serde) and storing (Registry) are orthogonal.

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Oct 4, 2022

The problem domain we have here is basically the same as serde's, thus I think the ideal solution might be a visitor pattern with an abstract data model.
That would make the actual Encode trait oblivious about the underlying format, same as serde's Serialize. A registry could then pick a particular Encoder implementation and encode all metrics via double-dispatch.
This being said, serde's design is obviously designed around having a plethora of actual serialization formats whereas here we only have two (I think?) so it might not worth it.
Thoughts?

The usage of serde is currently explored in #97. If I am not mistaken, this can happen in parallel, right? Serializing (serde) and storing (Registry) are orthogonal.

You might have misunderstood me :)
I was pointing out the similarity between:

  • Encoding formats in prometheus ("text" and "protobuf") and serde's serializers ("json", "toml", ..)
  • The Encode trait and the Serialize trait

With this PR, we are facing the same problem as serde does: We want to have types (Gauge, Counter, etc) that should all know how to serialize encode themselves but we want them to be oblivious about the format (i.e. text vs protobuf).

We could solve this by copying serde's design of an abstract Serializer Encoder trait. text::Encoder is almost already the right abstraction, we just need to make it a trait and perhaps generalize it so it can do all the things we need to do for the protobuf format.

In other words, instead of having multiple EncodeMetric traits, we should have only one trait and multiple Encoder implementations that the user can choose between. The different metric types are essential our data model so the Encoder trait could look something like:

trait Encoder {
	fn encode_int_counter(&mut self, int_value: u64) -> Result<()>;
	fn encode_double_counter(&mut self, int_value: f64) -> Result<()>;
	fn encode_int_gauge(&mut self, int_value: i64) -> Result<()>;
	fn encode_labels(&mut self, labels: &[String]) -> Result<()>;
	// etc
}

Another design might be something like:

trait Encoder {
	fn start_encode_counter(&mut self) -> Result<CounterEncoder>;
	// etc
}

impl CounterEncoder<'encoder> {
	fn add_labels(&mut self, labels: &[String]);
	fn set_int_value(&mut self, int_value: u64);
	fn set_double_value(&mut self, double_value: f64);
	
	fn finish(self) -> Result<()>
}

impl Encode for Counter {
	fn encode<E>(&mut, encoder: &mut E) where E: Encoder -> Result<()> {
		let mut counter_encoder = encoder.start_encoder_counter();

		counter_encoder.add_labels(&["foo"]).set_int_value(self.value).finish()
	}
}

In regards to #97, I don't think an approach that is fully generic over serde will work because not every type that can be Serialized is a valid prometheus metric. The data models do not match.

@08d2
Copy link

08d2 commented Oct 4, 2022

I agree with @thomaseizinger that using serde directly is probably not the right approach.

Prometheus defines a set of types (counter, gauge, histogram, summary) which can be encoded. Core Prometheus defines one encoding format (text) and OpenMetrics another (binary) over those core types. No other valid encodings exist; there is no JSON encoding of a Prometheus counter, for example.

The transformation between a Prometheus type and its encoded form is code defined by the format, and "custom" for the type. For example, a Rust type implementing a Prometheus gauge may have an OpenMetrics encoding which is different than its general Protobuf encoding, and the presence of a Protobuf encoding on a type doesn't mean that encoding is valid for OpenMetrics. Likewise, a type implementing a Prometheus counter may have a Prometheus text encoding which is different than its general text encoding, and the presence of a text encoding on a type doesn't mean that encoding is valid for Prometheus.

@mxinden
Copy link
Member Author

mxinden commented Oct 9, 2022

👍 for not adopting serde. Thanks for walking me through this. Appreciate the elaborate post above.

I like the idea of a single EncodeMetric trait which is then used by both encoding::text and encoding::protobuf.

I think this is orthogonal to this pull request. I.e. this pull request is about fixing Registry to use dynamic dispatching and thus dropping the trait parameter M.

I suggest we keep this pull request as is and in a follow-up pull request consolidate the two EncodeMetric traits into one.

Objections?

@thomaseizinger
Copy link
Contributor

👍 for not adopting serde. Thanks for walking me through this. Appreciate the elaborate post above.

I like the idea of a single EncodeMetric trait which is then used by both encoding::text and encoding::protobuf.

I think this is orthogonal to this pull request. I.e. this pull request is about fixing Registry to use dynamic dispatching and thus dropping the trait parameter M.

I suggest we keep this pull request as is and in a follow-up pull request consolidate the two EncodeMetric traits into one.

Objections?

Fine with me.

I just thought a bit more about how we can adopt this design and we will have to take inspiration from https://github.com/dtolnay/miniserde as the static dispatch approach from serde does not work with trait objects as we want it where the registry doesn't know about the concrete metrics.

With saying that, could we get away with having an enum of all metrics and store that without boxing? :)

@mxinden
Copy link
Member Author

mxinden commented Oct 9, 2022

With saying that, could we get away with having an enum of all metrics and store that without boxing? :)

Unfortunately not. Or I don't know how to do that. The reason being:

  • Counter, Gauge, ... are generic over the underlying AtomicXXX, e.g. Counter<AtomicU64>.
  • Family<M>, where M can be any of Counter, Gauge, ...
  • A registry may store one or more of any permutation of the above, thus this results in a very high cardinality.

We could drop the generic parameters here as well, and only provide a limited set of AtomicXXX. Thus far I have not considered that worth the complexity.

@thomaseizinger
Copy link
Contributor

With saying that, could we get away with having an enum of all metrics and store that without boxing? :)

Unfortunately not. Or I don't know how to do that. The reason being:

* `Counter`, `Gauge`, ... are generic over the underlying `AtomicXXX`, e.g. `Counter<AtomicU64>`.

* `Family<M>`, where `M` can be any of `Counter`, `Gauge`, ...

* A registry may store one or more of any permutation of the above, thus this results in a very high cardinality.

So a Family could just contain the enum again, right? It is a recursive structure like serde_json::Value.

Do we need to be generic over all atomics?

@mxinden
Copy link
Member Author

mxinden commented Oct 13, 2022

So a Family could just contain the enum again, right? It is a recursive structure like serde_json::Value.

Correct.

Do we need to be generic over all atomics?

Maybe. I am undecided on it.

Also Family is not only generic over the metric M, but also over the label set S and the constructor C. We would need to replace these with trait objects. I am not sure that would be a good idea.

@thomaseizinger
Copy link
Contributor

So a Family could just contain the enum again, right? It is a recursive structure like serde_json::Value.

Correct.

Do we need to be generic over all atomics?

Maybe. I am undecided on it.

Also Family is not only generic over the metric M, but also over the label set S and the constructor C. We would need to replace these with trait objects. I am not sure that would be a good idea.

Fair enough. Yeah it would just be another optimization. We should be fine if we box the entire metric up.

@mxinden
Copy link
Member Author

mxinden commented Oct 15, 2022

impl Encode for Counter {
	fn encode<E>(&mut, encoder: &mut E) where E: Encoder -> Result<()> {
		let mut counter_encoder = encoder.start_encoder_counter();

		counter_encoder.add_labels(&["foo"]).set_int_value(self.value).finish()
	}
}

Following up on above's suggestion in #100 (comment).

The Encode trait as described above is not trait object safe. Thus we could not store a Box<dyn Encode> in a Registry.

@mxinden
Copy link
Member Author

mxinden commented Oct 15, 2022

While I do see value in users being able to bring their own metrics, I don't think users will bring their own output format. Thus we can likely get around the above trait object safety issue by multiplexing output formats via an enum. I am working on a proof-of-concept. More to come.

@thomaseizinger
Copy link
Contributor

While I do see value in users being able to bring their own metrics, I don't think users will bring their own output format. Thus we can likely get around the above trait object safety issue by multiplexing output formats via an enum. I am working on a proof-of-concept. More to come.

Instead of an enum, &mut dyn Encoder might work too? The issue I see with an enum is that it might again be hard to feature-flag the protobuf implementation. We would have to wrap the enum and make Encoder an opaque struct to make sure users don't depend on the variants.

@mxinden
Copy link
Member Author

mxinden commented Oct 16, 2022

Instead of an enum, &mut dyn Encoder might work too?

In that case none of the methods on Encoder are allowed to have generic type parameters. Though we want the Encoder to be able to encode some generic type that implements EncodeLabelSet. See error below.

I have not yet found a way around this issue.

error[E0038]: the trait `encoding::MetricEncoder` cannot be made into an object
  --> src/encoding.rs:29:36
   |
29 |     fn encode(&self, encoder: &mut dyn MetricEncoder) -> Result<(), std::fmt::Error> {
   |                                    ^^^^^^^^^^^^^^^^^ `encoding::MetricEncoder` cannot be made into an object
   |
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
  --> src/encoding.rs:45:8
   |
39 | pub trait MetricEncoder<'a> {
   |           ------------- this trait cannot be made into an object...
...
45 |     fn with_label_set<'b, S: EncodeLabelSet>(
   |        ^^^^^^^^^^^^^^ ...because method `with_label_set` has generic type parameters
   = help: consider moving `with_label_set` to another trait

The issue I see with an enum is that it might again be hard to feature-flag the protobuf implementation. We would have to wrap the enum and make Encoder an opaque struct to make sure users don't depend on the variants.

True. Though that might be worth it.

@mxinden
Copy link
Member Author

mxinden commented Oct 16, 2022

#105 implements the above suggested serde style encoding using two traits, EncodeMetric (see serde's Serialize) and MetricEncoder (see serde's Serializer).

This addresses the issue raised by @nox, namely the conflict with Rust's additive features.

Feedback is more than welcome.

@08d2
Copy link

08d2 commented Oct 19, 2022

While I do see value in users being able to bring their own metrics

I'm probably missing something here, but this doesn't (immediately) make sense to me. Prometheus metrics are types, not traits. A user type wouldn't implement a counter, it would update a counter: has-a, rather than is-a. Right?

@thomaseizinger
Copy link
Contributor

While I do see value in users being able to bring their own metrics

I'm probably missing something here, but this doesn't (immediately) make sense to me. Prometheus metrics are types, not traits. A user type wouldn't implement a counter, it would update a counter: has-a, rather than is-a. Right?

I don't quite understand what you are getting at. Every metric needs to implement EncodeMetric in the suggested solution. How is that related to has-a vs is-a?

@nox
Copy link
Contributor

nox commented Oct 19, 2022

I'm probably missing something here, but this doesn't (immediately) make sense to me. Prometheus metrics are types, not traits. A user type wouldn't implement a counter, it would update a counter: has-a, rather than is-a. Right?

As mentioned in this previous comment, I've already implemented my own counter and my own family. That they wrap types from this crate is an implementation detail, the important part is that my custom types implement EncodeMetric without delegating to the prometheus-client types they wrap.

@08d2
Copy link

08d2 commented Oct 21, 2022

@thomaseizinger

I'm probably missing something here, but this doesn't (immediately) make sense to me. Prometheus metrics are types, not traits. A user type wouldn't implement a counter, it would update a counter: has-a, rather than is-a. Right?

I don't quite understand what you are getting at. Every metric needs to implement EncodeMetric in the suggested solution. How is that related to has-a vs is-a?

Ah, apologies for being vague. Let me try again: if you're writing a Prometheus client library (i.e. this crate) then the idea is that you're providing "every metric" yourself, each as a specific concrete type. Metrics are precisely specified, and their behavior isn't (or, shouldn't be) up to interpretation by implementations. Users shouldn't be bringing their own types. They should be using yours.

@nox

As mentioned in #100 (comment), I've already implemented my own counter and my own family. That they wrap types from this crate is an implementation detail, the important part is that my custom types implement EncodeMetric without delegating to the prometheus-client types they wrap.

Your counter implementation violates the Prometheus spec, I think, right? So it isn't really a counter, even though it may satisfy the relevant interface in this client library. This is an example of the kind of outcome that I claim should be specifically prevented.

YMMV 🤷

@nox
Copy link
Contributor

nox commented Oct 21, 2022

Your counter implementation violates the Prometheus spec, I think, right?

The OpenMetrics spec, sure. Other prometheus clients so far don't care about suffixes.

If the ability to make my own unsuffixed counter goes away, I'll simply not use this crate anymore.

@thomaseizinger
Copy link
Contributor

@thomaseizinger

I'm probably missing something here, but this doesn't (immediately) make sense to me. Prometheus metrics are types, not traits. A user type wouldn't implement a counter, it would update a counter: has-a, rather than is-a. Right?

I don't quite understand what you are getting at. Every metric needs to implement EncodeMetric in the suggested solution. How is that related to has-a vs is-a?

Ah, apologies for being vague. Let me try again: if you're writing a Prometheus client library (i.e. this crate) then the idea is that you're providing "every metric" yourself, each as a specific concrete type. Metrics are precisely specified, and their behavior isn't (or, shouldn't) be up to interpretation by implementations. Users shouldn't be bringing their own types. They should be using yours.

Yes, I agree with you. The way I see it is that "bring your own metric" is likely to be possible but I don't think it needs to be a goal of the library.

@08d2
Copy link

08d2 commented Oct 22, 2022

Your counter implementation violates the Prometheus spec, I think, right?

The OpenMetrics spec, sure. Other prometheus clients so far don't care about suffixes.

The docs on naming state that

an accumulating count has total as a suffix, in addition to the unit if applicable

https://prometheus.io/docs/practices/naming/#metric-names

If the ability to make my own unsuffixed counter goes away, I'll simply not use this crate anymore.

This is at least superficially confusing. Why? A counter in the context of Prometheus is, by definition, suffixed as described above. There's no, like, law, or parser validator, that rejects counters that don't meet the specification, I guess. But it's still well-defined, and there's not really any room for interpretation. At least, as far as I understand the docs.

@nox
Copy link
Contributor

nox commented Oct 22, 2022

Your counter implementation violates the Prometheus spec, I think, right?

The OpenMetrics spec, sure. Other prometheus clients so far don't care about suffixes.

The docs on naming state that

an accumulating count has total as a suffix, in addition to the unit if applicable

https://prometheus.io/docs/practices/naming/#metric-names

Literally not a spec, this document starts with:

The metric and label conventions presented in this document are not required for using Prometheus, but can serve as both a style-guide and a collection of best practices. Individual organizations may want to approach some of these practices, e.g. naming conventions, differently.

(Emphasis mine)

@08d2
Copy link

08d2 commented Oct 23, 2022

Literally not a spec, this document starts with:

The metric and label conventions presented in this document are not required for using Prometheus, but can serve as both a style-guide and a collection of best practices. Individual organizations may want to approach some of these practices, e.g. naming conventions, differently.

(Emphasis mine)

As this client library exists in the Prometheus org, I assume it will abide and enforce Prometheus best practices. If that's not the case, then I suppose it should live somewhere else. But whether spec or suggestion, MUST or SHOULD, the things under discussion here, suffixing certainly included, represent no meaningful hardship to users, and carry no cost versus alternatives. Absent specific cause, there's no reason not to play by the rules.

@nox
Copy link
Contributor

nox commented Oct 23, 2022

But whether spec or suggestion, MUST or SHOULD, the things under discussion here, suffixing certainly included, represent no meaningful hardship to users, and carry no cost versus alternatives. Absent specific cause, there's no reason not to play by the rules.

I've already explained that we have dozens of dozens of dashboards using unsuffixed metrics and forking this crate will be cheaper than migrating those dashboards.

@mxinden
Copy link
Member Author

mxinden commented Dec 14, 2022

I am closing here. The goal of this pull request has been achieved through #105 and released as an alpha via #116.

Thanks for the input everyone. Would appreciate folks testing the new alpha and leaving feedback on #116 or as separate GitHub issues.

@mxinden mxinden closed this Dec 14, 2022
@nox
Copy link
Contributor

nox commented Sep 4, 2023

I've already explained that we have dozens of dozens of dashboards using unsuffixed metrics and forking this crate will be cheaper than migrating those dashboards.

You completely removed the ability to define custom metrics to avoid the _total suffix on counters, that's really unfortunate.

@mxinden
Copy link
Member Author

mxinden commented Sep 4, 2023

I am sorry for the trouble @nox. I decided to follow the OpenMetrics specification. In the long run I hope for the ecosystem (e.g. your dashboards) to converge on that specification. In the meantime, I don't see a way around forking.

@nox
Copy link
Contributor

nox commented Sep 4, 2023

I am sorry for the trouble @nox. I decided to follow the OpenMetrics specification. In the long run I hope for the ecosystem (e.g. your dashboards) to converge on that specification. In the meantime, I don't see a way around forking.

Thanks, we will stay on prometheus-client 0.18 for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants