Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitswap: peer prom tacker #413

Closed
wants to merge 2 commits into from
Closed

bitswap: peer prom tacker #413

wants to merge 2 commits into from

Conversation

Jorropo
Copy link
Contributor

@Jorropo Jorropo commented Jul 20, 2023

Fixes #209

  • Peer Prom Tracker example
  • Tracer tests, I think the client's tracer is bugged and it does not record outbound messages
  • Changelog

Jorropo added 2 commits July 20, 2023 16:51
Need tests (I think the client's tracer is bugged and it does not record outbound messages).
@Jorropo Jorropo requested a review from a team as a code owner July 20, 2023 14:53
@codecov
Copy link

codecov bot commented Jul 20, 2023

Codecov Report

Merging #413 (b2cef5d) into main (cfad09d) will decrease coverage by 0.21%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #413      +/-   ##
==========================================
- Coverage   49.61%   49.41%   -0.21%     
==========================================
  Files         248      249       +1     
  Lines       29838    29945     +107     
==========================================
- Hits        14805    14798       -7     
- Misses      13615    13726     +111     
- Partials     1418     1421       +3     
Impacted Files Coverage Δ
bitswap/bitswap.go 37.20% <0.00%> (ø)
bitswap/client/client.go 82.21% <0.00%> (-2.38%) ⬇️
bitswap/options.go 0.00% <0.00%> (ø)
bitswap/peer-prom-tracker/peer-prom-tracker.go 0.00% <0.00%> (ø)
bitswap/server/server.go 52.19% <0.00%> (ø)

... and 9 files with indirect coverage changes

}
}()

peerIdLabel := []string{"peer-id"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This will create a separate time series per PeerID, which is ok for debugging, but should NOT be enabled by default.

High cardinality labels in prometheus are is considered antipattern. If there was 20K of peers, this will create 20K time series, and that may cause problems (performance, billing) when Grafana tries to visualize it.

To understand why high cardinality is a problem, see:

IMO this PR can't land in boxo in this form as it creates footgun for users of this library.

There needs to be either a hard-limit on the number of peers tracked, or an explicit opt-in via constructor option or ENV variable.

Copy link
Member

@lidel lidel Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I think if we wanted to have metrics similar to this, we could measure P95 globally without running into the cardinality problem.

To do so, one would define Objectives in SummaryOpts to be P50, P75, P95 etc, and calculate messages-received, messages-sent, bytes-received etc across all peers, not specific per peer. This way we get useful P95 metric with known error margin, without exploding the time series.

@Jorropo Jorropo changed the title Bitswap introspectability bitswap: peer prom tacker Jul 26, 2023
@Jorropo
Copy link
Contributor Author

Jorropo commented Jul 27, 2023

This was comunicated as not needed and we don't want to run it in production in Kubo due to the cardinality issue.
So I'll close this an leave the branch in case it comes back up later.

@Jorropo Jorropo closed this Jul 27, 2023
@BigLep BigLep mentioned this pull request Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Bitswap: better diagnostics and observability on client
2 participants