-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/muxing: Use total number of alive inbound streams for backpressure #2878
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this POC!
From what I can see here, I am in favor of this proposal.
I would like to only merge here once we have at least one user of this API. I would guess that the best user for this would be libp2p-metrics
.
While we could emit a new SwarmEvent
on each new stream, I wonder how we could inform libp2p-metrics
once a stream closed.
Next to the number of substreams, it would be wonderful to expose the protocol that is being negotiated on the substream as well. That would enable us to expose the following metric:
libp2p_swarm_stream { direction = "inbound", protocol = "/libp2p/kad/1.0.0" } 10
Yes, makes sense. I see the first user in
I did something like this in the past. If we make |
swarm/CHANGELOG.md
Outdated
- Use the total number of alive inbound streams for back-pressure. This can have a BIG impact on your application | ||
depending on how it uses `libp2p`. Previously, the limit for inbound streams per connection only applied to the | ||
_upgrade_ phase, i.e. for the time `InboundUpgrade` was running. Any stream being returned from `InboundUpgrade` and | ||
given to the `ConnectionHandler` did not count towards that limit, essentially mitigating the back-pressure mechanism. | ||
With this release, substreams count towards that limit until they are dropped and thus we actually enforce, how many | ||
inbound streams can be active at one time _per connection_. `libp2p` will not accept any more incoming streams once | ||
that limit is hit. If you experience stalls or unaccepted streams in your application, consider upping the limit via | ||
`SwarmBuilder::max_negotiating_inbound_streams`. See [PR 2878]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I am giving this too much attention but it feels like a really important change to me.
swarm/CHANGELOG.md
Outdated
- Use the total number of alive inbound streams for back-pressure. This can have a BIG impact on your application | ||
depending on how it uses `libp2p`. Previously, the limit for inbound streams per connection only applied to the | ||
_upgrade_ phase, i.e. for the time `InboundUpgrade` was running. Any stream being returned from `InboundUpgrade` and | ||
given to the `ConnectionHandler` did not count towards that limit, essentially mitigating the back-pressure mechanism. | ||
With this release, substreams count towards that limit until they are dropped and thus we actually enforce, how many | ||
inbound streams can be active at one time _per connection_. `libp2p` will not accept any more incoming streams once | ||
that limit is hit. If you experience stalls or unaccepted streams in your application, consider upping the limit via | ||
`SwarmBuilder::max_negotiating_inbound_streams`. See [PR 2878]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to give this more thought. Previously I was under the impression that this will be feature for reporting only, but not actually enforcing a limit.
As of today I am undecided whether the number of established inbound streams should have a global connection maximum, or whether it should only have a maximum per ConnectionHandler
implementation, potentially coordinated with the NetworkBehaviour
and thus a maximum per NetworkBehaviour
implementation. I am not sure the global application has enough knowledge to set a suitable limit for all protocols on a single connection, nor do I think a static limit per connection is a good idea in the first place.
Ideally I would like to enforce a limit for connections only. With #2828 a connection limit could e.g. be dynamic based on the current memory utilization. Protocols, e.g. like Kademlia, would enforce a streams-per-connection limit in their ConnectionHandler
implementation. That limit would be a value of maximum expected parallelization, e.g. 16 (as in "we don't expect an implementation to be handle more than 16 requests in parallel").
What do you think @thomaseizinger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want the limit to be dynamic, all we need to do is add a function to ConnectionHandler
that allows us to query the max number of allowed streams:
trait ConnectionHandler {
fn max_inbound_streams(&self) -> usize {
128 // This is today's default.
}
}
We can then use this limit on every iteration of Connection::poll
to check if we should poll for more inbound streams.
If we move forward with #2863 then the notion of upgrades will be gone and this limit does not mean anything anymore. Thus, I am inclined to say we should remove this from the Swarm
and Pool
altogether and only have the ConnectionHandler
decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move forward with #2863 then the notion of upgrades will be gone and this limit does not mean anything anymore. Thus, I am inclined to say we should remove this from the
Swarm
andPool
altogether and only have theConnectionHandler
decide.
That would be my preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LMK what you think of 40ab076
(#2878).
This effectively deprecates `SwarmBuilder::max_negotiating_inbound_streams`.
This will result in peers never being able to open a substream and we currently depend on this behaviour, despite the upgrade never being successful.
These will eventually go away so we don't bother replacing the links.
@mxinden Updated the PR description with an open question. |
@mxinden In #2957, a usecase for tracking the number of substreams in metrics came up. I can't think of a way of recording this though if we don't want to take in a dependency on |
Co-authored-by: Ryan Plauche <[email protected]>
f7b2221
to
a6bd2e4
Compare
@AgeManning I'd be curious what you think of this PR in regards to what it changes in gossipsub ( The gist is that we can now enforce, how many substreams to accept from the remote. Given that GossipSub should only ever have one inbound stream, I set this to one. |
Yeah the changes look fine to me. We only have one inbound substream as you've pointed out, so I dont see any issue here |
I can't reproduce this error on my machine. Anyone else got any luck? Also, can you have another look at this @mxinden? I implemented a per connection-configurable limit now as requested in #2878 (comment) :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think long term this is something we should be doing. I am not yet sure on whether the current implementation allows collaborative usage of these limits across ConnectionHandler
implementations.
In my eyes, before we do this, we should tackle #2863 first, thus simplifying the sets (pending-multistream, pending-upgrade, negotiated) of inbound streams.
Another thought would be to redesign max_inbound_streams
to be closer to Sink::poll_ready
:
fn poll_new_inbound_stream_ready(&self, cx) -> Poll<()> {
A ConnectionHandler
could thus signal in a async way whether it is willing to accept new streams.
Also adding this to the agenda for the next rust-libp2p community call.
/// The maximum number of inbound substreams allowed on the underlying connection. | ||
/// | ||
/// Once this limit is hit, we will stop accepting new inbound streams from the remote. | ||
fn max_inbound_streams(&self) -> usize { | ||
DEFAULT_MAX_INBOUND_STREAMS | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case an implementation of ConnectionHandler
does not implement this method, but accepts DEFAULT_MAX_INBOUND_STREAMS
streams, it would starve all other implementations of ConnectionHandler
from receiving more inbound streams, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the connection task PoV, there is only one ConnectionHandler
. I think if we get the combinators right, this shouldn't be an issue?
self.handlers | ||
.values() | ||
.map(|h| h.max_inbound_streams()) | ||
.max() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the sum? I.e. the sum of all limits equals the total limit.
.values() | ||
.map(|h| h.max_inbound_streams()) | ||
.max() | ||
.unwrap_or(0) // No handlers? No substreams. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:D
Do you think the proposed implementation is an improvement? I think it is but I agree with you that it can be even better.
Funny, I see this a step towards that solution. With this PR, a substream counts towards a limit as soon as it exists and not just during the upgrade phase. The next idea (but could be done in parallel) for #2863 is to make less use of upgrades in other protocols but build a
This is an interesting idea! Should a
👍 |
Setting to draft until we reach a decision. |
Closing because stale. |
Description
A PoC implementation for what is described in #2865.
Links to any relevant issues
ConnectionHandler
trait by removing as many associated types as possible #2863Open tasks
StreamMuxer
#2861SubstreamBox::new
from public API oflibp2p-core
libp2p-core
to 0.37.0libp2p-swarm
to 0.40.0Open Questions
Should
const MAX_NUM_INBOUND_SUBSTREAMS: usize = 32;
be refactored?libp2p-kad
make use of the new limit andrust-libp2p/protocols/kad/src/handler.rs
Line 42 in cce296e
Change checklist