src/collector: Introduce Collector abstraction #82

mxinden · 2022-08-29T04:45:42Z

The Collector abstraction allows users to provide additional metrics
and their description on each scrape.

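See also:

- https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#hdr-Custom_Collectors_and_constant_Metrics
- #49
- #29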

I am opening this up as a "Draft" for early feedback.

mxinden

What do folks think of this design? Would this enable your use-cases?

//CC @gagbo @dovreshef @vladvasiliu @phyber

Alternative implementation to #82.

mxinden · 2022-08-29T04:46:13Z

src/collector.rs

+pub trait Collector<M: Clone> {
+    fn collect<'a>(&'a self) -> Box<dyn Iterator<Item = (Cow<Descriptor>, Cow<M>)> + 'a>;
+}


This is the trait one would need to implement to provide a custom collector, e.g. a process collector.

sd2k · Aug 30, 2022

I see this trait is generic over M, which I think is the type of the metric? How does the Clone bound work with collectors/registries of type Box<dyn EncodeMetric> which aren't Clone? 🤔

mxinden · Aug 31, 2022

How does the Clone bound work with collectors/registries of type Box<dyn EncodeMetric> which aren't Clone? 🤔

It doesn't. For now this is just a proposal. We could require Clone on EncodeMetric, though I think the much better idea is to no longer be generic over the metric type. See #82 (comment).

Let me know what you think @sd2k!

sd2k · Aug 31, 2022

Ah great, that makes sense. I agree that switching to an enum instead of generics/trait objects would simplify things a lot! (Especially since adding a Clone bound to EncodeMetric would mean it wasn't object safe, so I don't think we could use trait objects anyway).

mxinden · Aug 31, 2022

(Especially since adding a Clone bound to EncodeMetric would mean it wasn't object safe, so I don't think we could use trait objects anyway).

Good point. We would have to drop the Cow<M> in favor of M, but then again, let's try to get rid of M.
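
To make the object-safety point concrete, here is a minimal sketch. `Encode`, `ConstCounter` and `box_clone` are hypothetical stand-ins invented for illustration, not the crate's API. A `Clone` supertrait would require `Self: Sized`, which rules out `dyn Encode`; one common workaround is an explicit boxed-clone method:

```rust
// Hypothetical stand-in for `EncodeMetric`; not the crate's actual trait.
// Writing `trait Encode: Clone` instead would make `dyn Encode` illegal,
// because `Clone::clone` returns `Self` and therefore needs `Self: Sized`.
trait Encode {
    fn encode(&self) -> String;
    // Object-safe replacement for `Clone`: return a boxed copy.
    fn box_clone(&self) -> Box<dyn Encode>;
}

#[derive(Clone)]
struct ConstCounter(u64);

impl Encode for ConstCounter {
    fn encode(&self) -> String {
        format!("counter {}", self.0)
    }

    fn box_clone(&self) -> Box<dyn Encode> {
        Box::new(self.clone())
    }
}

// With `box_clone` in place, boxed trait objects can implement `Clone`.
impl Clone for Box<dyn Encode> {
    fn clone(&self) -> Self {
        self.box_clone()
    }
}
```

This keeps trait objects usable, at the cost of one allocation per clone, which is part of why dropping the type parameter altogether looks simpler.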

mxinden · 2022-08-29T04:46:34Z

src/registry.rs

+    /// struct MyCollector {}
+    ///
+    /// impl Collector<Counter> for MyCollector {
+    ///   fn collect<'a>(&'a self) -> Box<dyn Iterator<Item = (Cow<Descriptor>, Cow<Counter>)> + 'a> {
+    ///     let c = Counter::default();
+    ///     let descriptor = Descriptor::new(
+    ///       "my_counter",
+    ///       "This is my counter",
+    ///       None,
+    ///       None,
+    ///       vec![],
+    ///     );
+    ///     Box::new(std::iter::once((Cow::Owned(descriptor), Cow::Owned(c))))
+    ///   }
+    /// }
+    ///
+    /// let my_collector = Box::new(MyCollector{});
+    ///
+    /// let mut registry: Registry<Counter> = Registry::default();
+    ///
+    /// registry.register_collector(my_collector);


Here is an example of how to implement and register a custom collector.

mxinden · 2022-08-29T04:50:46Z

src/registry.rs

@@ -57,26 +59,37 @@ use std::borrow::Cow;
 /// #                "# EOF\n";
 /// # assert_eq!(expected, String::from_utf8(buffer).unwrap());
 /// ```
-#[derive(Debug)]
 pub struct Registry<M = Box<dyn crate::encoding::text::SendSyncEncodeMetric>> {


With the Collector mechanism I don't think there is a need to be generic over M any longer. Using a concrete type (e.g. an enum over all metrics in prometheus_client::metrics) will drastically simplify both the library and its usage.

See #100 proposing the above.
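
As a rough illustration of the enum idea, assuming a deliberately simplified model: the `Metric` and `Registry` types below are hypothetical and much smaller than the crate's real ones. The point is only that the registry no longer carries a type parameter:

```rust
// Hypothetical, simplified sketch; these are not the crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum Metric {
    Counter(u64),
    Gauge(i64),
}

// No type parameter `M`: one registry can hold every metric kind.
#[derive(Debug, Default)]
struct Registry {
    metrics: Vec<(String, Metric)>,
}

impl Registry {
    fn register(&mut self, name: &str, metric: Metric) {
        self.metrics.push((name.to_string(), metric));
    }

    // Render each metric as a `name value` line.
    fn encode(&self) -> String {
        let mut out = String::new();
        for (name, metric) in &self.metrics {
            let line = match metric {
                Metric::Counter(v) => format!("{name}_total {v}\n"),
                Metric::Gauge(v) => format!("{name} {v}\n"),
            };
            out.push_str(&line);
        }
        out
    }
}
```

The trade-off is that users cannot plug in metric kinds the enum does not know about.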

mxinden · 2022-08-29T04:51:51Z

Also //CC @sd2k

mxinden · 2022-08-29T04:52:24Z

And //CC @08d2

dovreshef · 2022-08-29T05:13:52Z

Looks like it will work for the process collector use case! Thanks.

vladvasiliu · 2022-08-29T08:09:13Z

Am I understanding correctly that the user of the lib is expected to reimplement the Collector trait such that calling collect() will do whatever is necessary to produce the metrics, possibly querying some external system?

If so, I think there should be a way to provide an async collector; this will probably require implementing some alternative AsyncRegistry with matching AsyncCollector trait.

08d2 · 2022-08-29T19:10:16Z

Am I understanding correctly that the user of the lib is expected to reimplement the Collector trait such that calling collect() will do whatever is necessary to produce the metrics, possibly querying some external system?

If so, I think there should be a way to provide an async collector; this will probably require implementing some alternative AsyncRegistry with matching AsyncCollector trait.

For context, collect() is called when Prometheus does an HTTP GET of the metrics endpoint for an instance of a job.

That call is subject to a few layers of (typically sub-second) timeouts and failsafes, and is expected to return more or less immediately. An implementation of collect() which blocked in any way that would benefit from async would almost certainly be a design error.

If the information returned by collect() needed to come from an external source, then it's I guess a de facto requirement that the retrieval is decoupled from the collecting.
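
A minimal sketch of what such a decoupled design could look like, assuming a background thread that refreshes a cached value. `CachedCollector`, `query_external_system` and `spawn_refresher` are hypothetical names, and the `collect` signature is simplified:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical collector: `collect` only reads a cached value.
struct CachedCollector {
    latest: Arc<AtomicU64>,
}

impl CachedCollector {
    // Cheap and non-blocking: returns whatever was retrieved last.
    fn collect(&self) -> Vec<(&'static str, u64)> {
        vec![("external_value", self.latest.load(Ordering::Relaxed))]
    }
}

// Stand-in for a slow query against an external system.
fn query_external_system() -> u64 {
    42
}

// Retrieval runs on its own thread and publishes into the shared cell.
// A real refresher would repeat this in a loop; one round keeps it short.
fn spawn_refresher(latest: Arc<AtomicU64>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        latest.store(query_external_system(), Ordering::Relaxed);
    })
}
```

With this shape a scrape may return data that is older than the scrape itself.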

vladvasiliu · 2022-08-29T19:26:51Z

That's what I was thinking, although this example seems to be reading stuff (memory info) from an external system (though, in this case, it's probably unlikely that it should take long).

I have to think a bit more about this; it seems to me that once the collector is added to the registry, there's no easy way to only update it such that it doesn't share state between possibly concurrent calls to the /metrics endpoint, all the while updating said state outside collect().

08d2 · 2022-08-29T19:46:57Z

although this example seems to be reading stuff (memory info) from an external system

callLongGetter? Yeah, it's not obvious to me what that does, exactly... but I wouldn't take too much inspiration from the Java client, which hasn't been updated in a long while, and was never particularly idiomatic. Stick with the Go client, or maybe look at the more active exporters hosted in the Prometheus org.

there's no easy way to only update it such that it doesn't share state between possibly concurrent calls to the /metrics endpoint, all the while updating said state outside collect().

The nature of the problem demands shared state, for sure. (Or a "clever" workaround to avoid it, which, well.) But does shared state imply async, either necessarily or even ideally? I don't think so, but I could be wrong.

vladvasiliu · 2022-08-30T08:36:08Z

callLongGetter? Yeah, it's not obvious to me what that does, exactly... but I wouldn't take too much inspiration from the Java client, which hasn't been updated in a long while, and was never particularly idiomatic. Stick with the Go client, or maybe look at the more active exporters hosted in the Prometheus org.

I was thinking more about collectMemoryMetricsLinux. The reason I gave this example was because it's the one mentioned on the docs.

The nature of the problem demands shared state, for sure. (Or a "clever" workaround to avoid it, which, well.) But does shared state imply async, either necessarily or even ideally? I don't think so, but I could be wrong.

I wouldn't say shared state implies async. I think the issue was the way I was thinking about somehow getting the external (non-shared) state as a collector of the shared state, which would have then required collect() to cover async collection.

This could probably be implemented the other way around, with the shared state added as a sub-collector to the non-shared one before returning. Which, if collect() is guaranteed to return quickly, could be called from an async context.

vladvasiliu · 2022-08-30T14:17:19Z

I've been looking a bit more over this, and I think that, basically, a Registry is a Collector: it holds a bunch of metrics, right? That's how the Python client is implemented.

I think that if the lib provided an implementation of Collector for Registry, this would solve the majority of use cases.
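
Roughly what I have in mind, as a sketch with hypothetical, heavily simplified types (the real `Collector` signature in this pull request is different):

```rust
// Hypothetical, simplified types; not the crate's real signatures.
trait Collector {
    fn collect(&self) -> Vec<(String, u64)>;
}

#[derive(Default)]
struct Registry {
    metrics: Vec<(String, u64)>,
    collectors: Vec<Box<dyn Collector>>,
}

// A registry returns its own metrics plus those of nested collectors,
// so a registry can itself be registered inside another registry.
impl Collector for Registry {
    fn collect(&self) -> Vec<(String, u64)> {
        let mut all = self.metrics.clone();
        for collector in &self.collectors {
            all.extend(collector.collect());
        }
        all
    }
}
```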

mxinden · 2022-08-31T06:56:05Z

Am I understanding correctly that the user of the lib is expected to reimplement the Collector trait such that calling collect() will do whatever is necessary to produce the metrics, possibly querying some external system?

Correct.

If so, I think there should be a way to provide an async collector; this will probably require implementing some alternative AsyncRegistry with matching AsyncCollector trait.

Can you be more concrete? Can you describe a use-case where one would need an async collector?

That call is subject to a few layers of (typically sub-second) timeouts and failsafes, and is expected to return more or less immediately.

Prometheus does not expect the scrape call to return immediately. E.g. from the Prometheus scheduling section:

The default scrape timeout for Prometheus is 10 seconds.

If the information returned by collect() needed to come from an external source, then it's I guess a de facto requirement that the retrieval is decoupled from the collecting.

By default, the retrieval should not be decoupled from the collection. Citing the Prometheus scheduling section:

Metrics should only be pulled from the application when Prometheus scrapes them, exporters should not perform scrapes based on their own timers. That is, all scrapes should be synchronous.

I think that if the lib provided an implementation of Collector for Registry, this would solve the majority of use cases.

Registry could implement Collector, though I have refrained from that thus far to reduce complexity, i.e. not to mix concepts. I expect a user to have a single Registry but many Collectors.

I don't follow how Registry implementing Collector would solve an issue here. Would you mind expanding on that?

mxinden · 2022-08-31T06:58:00Z

Also note that the encode function takes an immutable reference to a Registry, thus two Prometheus servers (e.g. for fault tolerance) may scrape the same endpoint concurrently.
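
A small sketch of what that implies for collector state, under simplified assumptions: the `Registry` below is a hypothetical stand-in whose `encode` takes `&self`, and `scrape_twice` simulates two servers scraping at the same time.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Hypothetical registry; `encode` takes `&self`, as described above.
#[derive(Default)]
struct Registry {
    scrapes: AtomicU64,
}

impl Registry {
    fn encode(&self) -> String {
        // State touched here must tolerate concurrent callers, hence
        // an atomic rather than a plain integer.
        let n = self.scrapes.fetch_add(1, Ordering::Relaxed) + 1;
        format!("scrapes_total {n}\n")
    }
}

// Two scrapers share one registry through `&Registry`.
fn scrape_twice(registry: &Registry) -> (String, String) {
    thread::scope(|s| {
        let a = s.spawn(|| registry.encode());
        let b = s.spawn(|| registry.encode());
        (a.join().unwrap(), b.join().unwrap())
    })
}
```

Any mutable state inside a collector therefore needs atomics, a lock, or some other form of synchronization.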

vladvasiliu · 2022-08-31T08:41:48Z

Can you be more concrete? Can you describe a use-case where one would need an async collector?

My use case is exporting metrics that are gathered via multiple HTTP calls. If my HTTP client is async, collect() should be able to handle that.

Prometheus does not expect the scrape call to return immediately. E.g. from the Prometheus scheduling section:

Well, the question is whether collect() itself should return immediately. The other operations could be done outside it, while staying inside the global scrape.

I don't follow how Registry implementing Collector would solve an issue here. Would you mind expanding on that?

See the other point: this is a confusion about whether collect() should be guaranteed to be quick or not and whether a collector should be different from a registry.

If it's not, and random long operations can be done inside collect(), then I can see how a Collector can be different from a Registry, and thus providing a default implementation for Registry doesn't help, since every collector is expected to be different and do its own thing.

However, in the second situation, when the actual gathering of the metrics should be done outside of the collect() method, then basically this becomes generating two separate registries, and combining them before calling collect(). The idea wouldn't be to do the scraping outside of the call from prometheus, but only outside the call to collect().

vladvasiliu · 2022-08-31T08:55:02Z

Looking at the Go implementation, specifically the example collector, it's clear that the collect() method is not expected to return "immediately".

This means that there's no particular reason to provide an implementation of Collector for Registry, but it can be useful to provide an alternative, async Collector, in case the actual job of collection is done asynchronously.

08d2 · 2022-09-02T05:13:46Z

That example describes how a collector might be "bolted on" to a legacy system. It's not indicative of best practice. But the point is largely moot, I think.

kwiesmueller · 2022-09-02T06:46:59Z

My current examples why a collector would need to support async are:

- collecting metrics from devices via e.g. Bluetooth
- collecting container metrics comparable to cadvisor, where the list of metrics is basically generated from a current state that is retrieved from the filesystem, containerd APIs etc.

While both approaches could also refresh the data in the background, having it directly connected to the scrape and being able to fetch fresh data on a scrape provides more reliable resolution than scraping previously scraped data.

Now, the data refresh could actually be handled in the http handler, but still it seems like an unnecessary detour.

mxinden · 2022-09-04T07:11:42Z

As a first iteration, I suggest we support synchronous Collector::collect only.

1. For those that need to call async methods in their Collector::collect call, I suggest you spawn the encode call in a new OS thread (or in a blocking-aware runtime task) and call any async methods via block_on in your Collector::collect implementation.
2. For those that need concurrency within your Collector::collect implementation (e.g. reaching out to two remote machines concurrently) I suggest you spawn two or more OS threads within your Collector::collect implementation and join the threads again.
3. For those that need the Collector::collect calls of two distinct Collector implementations to run concurrently, you would be out of luck for now.
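
The first suggestion (bridging into async code via block_on from a dedicated OS thread) could look roughly like the sketch below. To stay self-contained it uses a tiny std-only `block_on`; in practice one would more likely use the `block_on` of whichever runtime or executor crate drives the futures. `fetch_remote_value`, `RemoteCollector` and `scrape_on_new_thread` are hypothetical, and the `collect` signature is simplified.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Minimal executor: poll the future, park the thread while it is pending.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical async data source, e.g. an async HTTP client call.
async fn fetch_remote_value() -> u64 {
    7
}

// Hypothetical synchronous collector bridging into async code.
struct RemoteCollector;

impl RemoteCollector {
    fn collect(&self) -> Vec<(&'static str, u64)> {
        vec![("remote_value", block_on(fetch_remote_value()))]
    }
}

// Run the whole scrape on a dedicated OS thread so that blocking in
// `collect` does not stall an async runtime's worker threads.
fn scrape_on_new_thread() -> Vec<(&'static str, u64)> {
    thread::spawn(|| RemoteCollector.collect()).join().unwrap()
}
```

Futures that depend on a specific runtime (timers, sockets) generally need that runtime's own blocking entry point rather than a bare executor like this one.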

If folks see a large performance hit, e.g. due to thread spawns (even though a spawn should take < 10 microseconds), if (1) and (2) add too much complexity to your Collector::collect implementation, or if folks need (3), I suggest we design an async Collector::collect as a next step.
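
For the second suggestion (concurrency within a single Collector::collect), a minimal sketch using scoped threads from the standard library. `query_machine_a`, `query_machine_b` and `FanOutCollector` are hypothetical placeholders, and the `collect` signature is simplified:

```rust
use std::thread;

// Hypothetical stand-ins for blocking calls to two remote machines.
fn query_machine_a() -> u64 {
    1
}

fn query_machine_b() -> u64 {
    2
}

struct FanOutCollector;

impl FanOutCollector {
    fn collect(&self) -> Vec<(&'static str, u64)> {
        // Both queries run concurrently; the scoped threads are joined
        // before `scope` returns.
        let (a, b) = thread::scope(|s| {
            let a = s.spawn(query_machine_a);
            let b = s.spawn(query_machine_b);
            (a.join().unwrap(), b.join().unwrap())
        });
        vec![("machine_a_value", a), ("machine_b_value", b)]
    }
}
```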

mxinden · 2022-09-04T07:16:54Z

@kwiesmueller Thanks for sharing. Let me know whether the threading approach is good enough for you for now, i.e. whether it can collect from the many data sources you are targeting within the Prometheus scrape timeout.

If not, as mentioned above, let's design an async Collector::collect as a next step.

mxinden · 2022-10-02T11:25:26Z

I rebased this pull request onto #100. Everything compiles, tests succeed.

To me the above validates that we should move forward with #100. That said, I do think the signatures in this pull request need more work (e.g. the workaround with MaybeOwned).

Adopt encoding style similar to serde. `EncodeXXX` for a type that can be encoded (see serde's `Serialize`) and `XXXEncoder` for a supported encoding i.e. text and protobuf (see serde's `Serializer`).

- Compatible with Rust's additive features. Enabling the `protobuf` feature does not change type signatures.
- `EncodeMetric` is trait object safe, and thus `Registry` can use dynamic dispatch.
- Implement a single encoding trait `EncodeMetric` per metric type instead of one implementation per metric AND per encoding format.
- Leverage `std::fmt::Write` instead of `std::io::Write`. The OpenMetrics text format has to be unicode compliant. The OpenMetrics Protobuf format requires labels to be valid strings.

Signed-off-by: Max Inden <[email protected]>

hexfusion · 2022-12-03T21:45:36Z

Very interested in this work. Are you looking for help, or what would help move this forward?

mxinden · 2022-12-06T12:32:50Z

Very interested in this work

Thanks for the interest. Sorry for the silence.

I rebased this pull request on #105. I want to try out the new API with one of my projects (e.g. https://github.com/mxinden/kademlia-exporter/) before merging #105 and this pull request.

are you looking for help, or what would help move this forward?

I would appreciate feedback on #105 and this pull request. You would be of great help by testing this pull request (it is on top of #105) within your application in need of the Collector abstraction @hexfusion.

hexfusion · 2022-12-06T12:43:02Z

I have some cycles today will take a look, thanks!

The `Collector` abstraction allows users to provide additional metrics and their description on each scrape.

See also:

- https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#hdr-Custom_Collectors_and_constant_Metrics
- prometheus#49
- prometheus#29

Signed-off-by: Max Inden <[email protected]>

Iterators returned by a Collector don't have to cross thread boundaries, nor do their references.

mxinden · 2022-12-18T20:35:20Z

Tested this patch with rust-libp2p and kademlia-exporter. While the Collector trait is still quite verbose (with Cow and MaybeOwned), it does function as expected and allows ad-hoc cheap metric generation.

mxinden/kademlia-exporter#209

thomaseizinger · 2023-04-05T08:20:10Z

src/registry.rs

@@ -228,41 +258,58 @@ impl Registry {
            .expect("sub_registries not to be empty.")
    }

-    /// [`Iterator`] over all metrics registered with the [`Registry`].
-    pub fn iter(&self) -> RegistryIterator {


Renaming RegistryIterator and making this pub(crate) is the only breaking change as far as I can tell. Might be worth undoing.

thomaseizinger · 2023-04-05T08:36:36Z

src/metrics/family.rs

+impl<S: EncodeLabelSet, M: EncodeMetric + TypedMetric, T: Iterator<Item = (S, M)>> EncodeMetric
+    for RefCell<T>


This is very hard to discover and doesn't feel Rust-y to me. Why do we need to depend on RefCell here?

I think it would be more idiomatic to do:

impl<T> EncodeMetric for Vec<T> where T: EncodeMetric { }

impl<S, M> EncodeMetric for (S, M) where S: EncodeLabelSet, M: EncodeMetric { }

Then, have the user provide a Vec instead of being generic over something that can be iterated. That should get rid of the RefCell and iterator dependency and would compose better with other usecases.
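
To illustrate, a self-contained sketch of that shape. The two traits here are hypothetical, simplified stand-ins that write into a `String`; the crate's real `EncodeMetric` and `EncodeLabelSet` take encoder arguments instead:

```rust
use std::fmt::Write;

// Hypothetical, simplified stand-ins for the crate's traits.
trait EncodeLabelSet {
    fn encode_labels(&self, out: &mut String) -> std::fmt::Result;
}

trait EncodeMetric {
    fn encode(&self, out: &mut String) -> std::fmt::Result;
}

impl EncodeLabelSet for (&str, &str) {
    fn encode_labels(&self, out: &mut String) -> std::fmt::Result {
        write!(out, "{{{}=\"{}\"}}", self.0, self.1)
    }
}

impl EncodeMetric for u64 {
    fn encode(&self, out: &mut String) -> std::fmt::Result {
        writeln!(out, " {}", self)
    }
}

// A (label set, metric) pair encodes as one sample line.
impl<S: EncodeLabelSet, M: EncodeMetric> EncodeMetric for (S, M) {
    fn encode(&self, out: &mut String) -> std::fmt::Result {
        self.0.encode_labels(out)?;
        self.1.encode(out)
    }
}

// A Vec encodes each element in order; no interior mutability needed.
impl<T: EncodeMetric> EncodeMetric for Vec<T> {
    fn encode(&self, out: &mut String) -> std::fmt::Result {
        self.iter().try_for_each(|item| item.encode(out))
    }
}
```

The cost is that the user has to materialize all samples in a `Vec` up front instead of yielding them lazily.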

mxinden · Apr 9, 2023

An Iterator::next takes &mut self in order to be able to e.g. track its position within a Vec. Unfortunately EncodeMetric::encode takes &self.

In this pull request RefCell is used for interior mutability. It allows calling an Iterator::next taking &mut self from an EncodeMetric::encode providing &self only.

An alternative approach would be to change EncodeMetric::encode to take &mut self. I deemed the change and its implications not worth it, given that Collector is an abstraction for advanced users only.
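
A simplified sketch of the interior-mutability pattern discussed here. The `EncodeMetric` trait below is a hypothetical stand-in that writes into a `String` and fixes the item type to `(String, u64)`; the real impl in this pull request is generic over label sets and metrics:

```rust
use std::cell::RefCell;
use std::fmt::Write;

// Hypothetical, simplified stand-in for the crate's `EncodeMetric`.
trait EncodeMetric {
    fn encode(&self, out: &mut String) -> std::fmt::Result;
}

// `encode` only has `&self`, but `Iterator::next` needs `&mut self`.
// `RefCell` moves the borrow check to runtime to bridge the two.
impl<T: Iterator<Item = (String, u64)>> EncodeMetric for RefCell<T> {
    fn encode(&self, out: &mut String) -> std::fmt::Result {
        let mut iter = self.borrow_mut();
        for (label, value) in iter.by_ref() {
            writeln!(out, "{{peer=\"{label}\"}} {value}")?;
        }
        Ok(())
    }
}
```

One consequence worth noting: the iterator is consumed by the first `encode` call, so encoding the same `RefCell` a second time yields nothing.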

All that said, I agree that the usage of RefCell is neither simple nor intuitive. @thomaseizinger do you see other alternatives to the one above?

mxinden mentioned this pull request Aug 29, 2022

Collector pattern #74

Closed

This was referenced Aug 29, 2022

Custom collector for multiple metrics #49

Open

Const Metrics / Custom Collectors #36

Open

Implementing a process collector #29

Open

mxinden mentioned this pull request Sep 2, 2022

exporting metrics lists without registering them #86

Open

mxinden force-pushed the collector branch from 7978ab5 to 589f7b2 on September 4, 2022 07:25

This was referenced Sep 27, 2022

protocols/gossipsub: Enable the protobuf feature of prometheus-client libp2p/rust-libp2p#2911

Closed

src/registry: Use dynamic dispatch and remove type parameter M #100

Closed

mxinden force-pushed the collector branch from 589f7b2 to afe9a3f on October 2, 2022 11:22

mxinden force-pushed the collector branch from afe9a3f to 87e6d39 on December 6, 2022 12:30

mxinden mentioned this pull request Dec 7, 2022

chore(metrics): Upgrade to prometheus-client v0.19.0 libp2p/rust-libp2p#3207

Merged

mxinden added 4 commits December 10, 2022 09:43

src/encoding: Generate EncodeLabelValue for integers

4b2d00f

Signed-off-by: Max Inden <[email protected]>

src/encoding: Take reference for value

d11c977

Signed-off-by: Max Inden <[email protected]>

src/metrics: Introduce const types

c9f59eb

Signed-off-by: Max Inden <[email protected]>

mxinden force-pushed the collector branch from 87e6d39 to c9f59eb on December 10, 2022 10:57

mxinden mentioned this pull request Dec 10, 2022

encoding/: Adopt serde style encoding #105

Merged

mxinden added 2 commits December 17, 2022 18:17

src/collector: Don't require Send and Sync

723f86f

Iterators returned by a Collector don't have to cross thread boundaries, nor do their references.

src/metrics/family: Implement EncodeMetric for RefCell Iterator

83a2cb6

mxinden mentioned this pull request Dec 19, 2022

misc/metrics: Expose bytes send/received per protocol libp2p/rust-libp2p#3262

Open

mxinden marked this pull request as ready for review December 22, 2022 19:45

mxinden added 7 commits December 29, 2022 15:36

Merge remote-tracking branch 'prometheus/master' into collector

5f70f0c

*: Bump version and add changelog entry

d90c797

Signed-off-by: Max Inden <[email protected]>

Rename collector::Metric to LocalMetric similar to futures box_local

50ce9c9

Signed-off-by: Max Inden <[email protected]>

Merge branch 'master' of https://github.com/prometheus/client_rust into collector

3b3aef2

Fix intra doc links

5032236

Signed-off-by: Max Inden <[email protected]>

Fix clippy warnings

66f2982

Signed-off-by: Max Inden <[email protected]>

Fix collector doc test

ff36682

Signed-off-by: Max Inden <[email protected]>

mxinden merged commit c619ad5 into prometheus:master Dec 29, 2022

This was referenced Jan 15, 2023

feat(metrics)!: expose identify metrics for connected peers only libp2p/rust-libp2p#3325

Merged

Allow to optionally specify timestamps #126

Open

thomaseizinger reviewed Apr 5, 2023

mxinden mentioned this pull request Jun 4, 2023

refactor(collector): have Registry:encode do the encoding #149

Merged
