-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(swarm)!: refactor ListenUpgradeError and DialUpgradeError #3307
refactor(swarm)!: refactor ListenUpgradeError and DialUpgradeError #3307
Conversation
from ConnectionHandlerUpgrErr
c64f931
to
70ddded
Compare
instead of ConnectionHandlerUpgrErr, on case of timeout expiration we just log the error as we don't know which ConnectionHandler should handle the error.
instead of ConnectionHandlerUpgrErr, and create ConnectionEvent::DialTimeout for the situations where that upgrading an outbound substream to the given protocol has expired its timeout.
70ddded
to
c56bcc0
Compare
and DialUpgradeError updates
and DialUpgradeError updates
and DialUpgradeError updates
and DialUpgradeError updates
c56bcc0
to
fd29912
Compare
and DialUpgradeError updates
and DialUpgradeError updates
and DialUpgradeError updates
fd29912
to
d3291d7
Compare
and DialUpgradeError updates. gossipsub/handler: Move dial upgrade error logic from poll to on_dial_upgrade_error and rename pending_errors to pending_events, so that handler flow becomes more similar to other protocols.
and DialUpgradeError updates.
ListenUpgradeError and DialUpgradeError updates.
d3291d7
to
021502c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the simplifications coming from this!
I don't quite agree with reporting a kind of error (timeout) next to other errors. I think that should be unified or not reported at all. I am tending towards the latter for a lot of protocols where we can make a decision on what that timeout means.
continue; | ||
} | ||
ConnectionHandlerUpgrErr::Timeout => { | ||
log::debug!("Timeout expired during an inbound substream negotiation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log::debug!("Timeout expired during an inbound substream negotiation") | |
log::debug!("Failed to negotiate protocol on inbound stream within {seconds}s") |
That would be more useful IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seconds
doesn't seem to exist in this scope, we probably can make ConnectionHandlerUpgrErr::Timeout
have it . I will implement after clarifying #3307 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seconds
doesn't seem to exist in this scope, we probably can makeConnectionHandlerUpgrErr::Timeout
have it . I will implement after clarifying #3307 (comment)
Yes, it was only meant to be illustrative on what I'd like to have printed!
/// Informs the handler that upgrading an outbound substream to the given protocol has failed. | ||
DialUpgradeError(DialUpgradeError<OOI, OP>), | ||
/// Informs the handler that upgrading an outbound substream to the given protocol has expired its timeout. | ||
DialTimeout(DialTimeout<OOI>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense for DialUpgradeError
to be an enum with a Timeout
and Upgrade
variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3268 is related to this. If we agree to remove the OpenInfo
from ConnectionHandler
a lot of this code would be simpler because we wouldn't need to track info
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first sight, making DialUpgradeError
an enum with a Timeout
and Upgrade
variant sounded similar to what we currently have, where ConnectionHandlerUpgrErr
then has the variants :
rust-libp2p/swarm/src/handler.rs
Lines 340 to 343 in 5b8c3d2
pub struct DialUpgradeError<OOI, OP: OutboundUpgradeSend> { | |
pub info: OOI, | |
pub error: ConnectionHandlerUpgrErr<OP::Error>, | |
} |
But what you are suggesting seems to be in fact renaming ConnectionHandlerUpgradeError
to DialUpgradeError
and using it in ConnectionEvent::DialUpgradeError
right?
rust-libp2p/swarm/src/handler.rs
Lines 442 to 448 in 021502c
#[derive(Debug)] | |
pub enum ConnectionHandlerUpgrErr<TUpgrErr> { | |
/// The opening attempt timed out before the negotiation was fully completed. | |
Timeout, | |
/// Error while upgrading the substream to the protocol we want. | |
Upgrade(UpgradeError<TUpgrErr>), | |
} |
In fact it probably makes sense (we are in fact reducing one indirection layer) even more sense than having Timeout
as root variants where I agree with you that they are errors by themselves. Do you also agree @mxinden?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the top of my head this sounds good to me. I will have to give this more thought with a clear mind to give a good answer.
} | ||
|
||
#[derive(Debug, Error)] | ||
pub enum Error { | ||
#[error("Failed to dial peer.")] | ||
Dial, | ||
#[error("Failed to establish substream: {0}.")] | ||
Handler(ConnectionHandlerUpgrErr<void::Void>), | ||
Handler(UpgradeError<void::Void>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we emitted an error (and thus closed the connection) when we ran into a timeout. Why are we changing this behaviour now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the behaviour the same? In the sense that the connection was closed because of a timeout and we are bubbling it up? But after reading and replying on #3307 (comment) I now see that Timeout makes sense being a variant on the Error
itself of DirectConnectionUpgradeFailed
rather than a variant of Event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the behaviour the same? In the sense that the connection was closed because of a timeout and we are bubbling it up?
Unless I am mistaken, the timeout refers to a stream upgrade. The connection can/should still be intact at this stage. Something like PendingUpgrade
should trigger this timeout on the remote but the connection underneath works perfectly fine.
It does raise the question whether we should close an entire connection at all just because one protocol failed to upgrade a stream. Just disabling that protocol (and handler) and letting the collaborative keep-alive eventually close the connection seems like a better idea at first sight.
We may want to do this in a separate PR though, either before or after we merge this.
protocols/identify/src/behaviour.rs
Outdated
/// Timeout expired while attempting to identify the remote. | ||
Timeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A timeout is an error so it seems odd that we have an Error
variant and then this. I'd suggest we either log this internally and don't report it or create a dedicated error enum ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are already reporting it up I'd go as we do in dcutr
and expose it as an Error
, updated to do that! Does it make sense to you to also do the same for relay
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to you to also do the same for
relay
?
Yes I think we should apply that thinking to all protocols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks done, ptal! Also removed what seems to be unrequired inbound_hop::UpgradeError
and shortcircuit to inbound_hop::FatalUpgradeError
see c2eda37 and see if you agree.
instead of as a variant of Event.
and Timeout as part of Error instead of as a variant of Event.
and Timeout as part of Error instead of as a variant of behaviour::Event. Introduce `client::InboundError` and Timeout as part of Error instead of as a variant of client::Event.
d4672d3
to
c2eda37
Compare
Sorry for only mentioning this only now but it just came to me that once we have #2863, all of these errors will go away because there won't be any "upgrade" mechanism anymore for protocols defined through Is it worth doing this refactoring here at all or should we just work towards #2863 instead? Personally, that would be my preference but it is also more work. Thoughts? |
No worries Thomas. Good question, no strong opinion though I think it probably doesn't makes sense to introduce this and release it, make libp2p users upgrade to this to then upgrade again with #2863 right? So I'd rather just jump to #2863 CC @mxinden wdyt? |
Yeah I thought so too which is why I brought it up. I updated the issue with a few more thoughts on how we could move forward there. |
Whilst hacking on #3201, I got very confused in how the error handling for connection and upgrade errors is implemented across our protocols. At the highest level, we have
Confusingly enough, we call those
rust-libp2p/swarm/src/handler.rs Lines 433 to 442 in e3c7023
As you @jxs discovered in here, the The What is left here is rust-libp2p/core/src/upgrade/error.rs Lines 24 to 31 in e3c7023
Once we ship #2863, the Meaning the only error that can actually happen after #2863 is
I am drawing three conclusions from this:
|
Agreed with everything in the latest comment from @thomaseizinger. Thanks for looking so deeply into this.
I don't have an opinion on how to proceed, e.g. whether to first finish here, do #2863, or clean-up fatal and non-fatal errors. Up to you @jxs and @thomaseizinger. |
Any work that pushes #2863 forward would be much appreciated from my end! |
Description
Addresses #2894
Notes
In the protocols, moved the
TimedOut
variants from nestedUpgradeError
's toEvent
variants themselves. Likeidentify
dcutr
andrelay
.Per commit review is suggested.
Change checklist