Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transport: Poll Transport directly, remove ListenersStream #2652

Merged
merged 56 commits into from
Jul 4, 2022

Conversation

elenaf9
Copy link
Contributor

@elenaf9 elenaf9 commented May 16, 2022

Description

Opened as a draft: I'd like to open the discussion on whether this direction is desired, and to discuss how this could be implement / how existing transport could be adapted to the change. Not ready for review yet.

Remove the concept of Listeners from the Transport trait. Instead require the Transport itself to implement a stream-like API with Transport::poll. Transport implementations can then internally implement the logic for managing one or multiple listeners.

The main motivation for this change is that the concept of multiple listeners is rather TCP specific and introduces unnecessary complexity for the QUIC transport (#2289 (comment)).
Apart from that, this generally follows the same direction as #2648 :

  1. Follow the style of "mutate state and return result via poll", same as we do for many other parts of rust-libp2p

The change includes:

  • Remove Transport::Listener and change Transport::listen_on to just return a Result<ListenerId, _>
  • Add Transport::poll, which returns TransportEvents (replacing the former `swarm/connection/ListenersEvent). Listeners should be driven here.
  • Remove swarm/connection/ListenersStream, instead poll the boxed transport directly in the swarm
  • Adapt transports/tcp: Wrap the GenTcpConfig into a new struct GenTcpTransport that implements Transport:
    • Manage multiple listener streams (this is essentially the logic that used to be in swarm/connection/ListenersStream
  • Adapt some of the existing Transports to this (not all yet, but that's only because I'd like to discuss the general direction first, not because I have concerns with specific transports)
  • Adapt tests of transports/tcp, core and swarm to this change -> tests pass

Links to any relevant issues

Relevant discussions:

Open Questions

Transport::dial

I did not change how Transport::dial works. I originally planned to also remove Transport::Dial and push that logic into Transport::poll as well. The problem with this would be that it does not work well with the concurrent-dial logic.
ConcurrentDial currently drives multiple dial futures (from potentially multiple transports) at the same time. Once the first succeeds, it drops the other ongoing dials. If the Transport would internally manage the Dial future, this logic would not work anymore. What we could do is something like the following:

  • Each dial is associated with a DialId, ConcurrentDial tracks the dial-ids of concurrent dials
  • We add Transport::abort_dial that can be called from the Pool once the first dial of a concurrent dialing batch succeeds

I am not sure if this is worth the extra complexity. Happy to hear alternative suggestions.

Open TODOs

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

elenaf9 added 4 commits May 16, 2022 02:45
Remove `Transport::Listener: Stream`. Instead require the Transport
itself to implement a stream-like API with `Transport::poll`.

In case of multiple listeners, transports are now required to handle
the multiple listener streams themselves internally.
Remove ListenersStream, instead poll the boxed transport directly
in the `Swarm` for `TransportEvent`s (which replace the former
`ListenersEvent`s).
Add new struct `GenTcpTransport` as wrapper for `GenTcpConfig` to manage
multiple listener streams.
This is essentially the old ListenerStream logic from swarm/connection.
Adapt majority of helper transports to the new Transport trait.
For most transports this just removes the extra *Listener type and
instead implements that logic in `Transport::poll`.

To adapt the `Boxed` transport the restriction had to be added that
transport is `Unpin`.
TODO: check if we can solve polling `Boxed` without the inner Transport
being unpin.
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
swarm/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 129 to 130
match poll_fn(|cx| Pin::new(&mut listener_transport).poll(cx))
.await
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polling transports this way is rather inconvenient. Should we auto-implement Stream for transports?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl<T: Transport> Stream for T sounds good to me.

Copy link
Contributor Author

@elenaf9 elenaf9 May 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl<T: Transport> Stream for T sounds good to me

Rust unfortunately does not allow this (I forget that every single time).
I considered adding a StreamTransport that just wraps a transport and implemented Stream, but since we already have the Boxed type I just implemented it for Boxed and adjusted the tests (bfd5fb0). Also implemented Stream for GenTcpTransport (90721b9) because you can't box a TCP transport directly. Edit: with d7f5019 we can now also box the tcp transport and don't need an extra Stream impl anymore.

transports/tcp/src/lib.rs Outdated Show resolved Hide resolved
@elenaf9 elenaf9 requested review from mxinden and thomaseizinger May 16, 2022 20:46
@thomaseizinger
Copy link
Contributor

Opened as a draft: I'd like to open the discussion on whether this direction is desired, and to discuss how this could be implement / how existing transport could be adapted to the change. Not ready for review yet.

Converted it to a draft for you :)

@thomaseizinger thomaseizinger marked this pull request as draft May 17, 2022 04:52
@thomaseizinger
Copy link
Contributor

Transport::dial

I did not change how Transport::dial works. I originally planned to also remove Transport::Dial and push that logic into Transport::poll as well. The problem with this would be that it does not work well with the concurrent-dial logic. ConcurrentDial currently drives multiple dial futures (from potentially multiple transports) at the same time. Once the first succeeds, it drops the other ongoing dials. If the Transport would internally manage the Dial future, this logic would not work anymore. What we could do is something like the following:

  • Each dial is associated with a DialId, ConcurrentDial tracks the dial-ids of concurrent dials
  • We add Transport::abort_dial that can be called from the Pool once the first dial of a concurrent dialing batch succeeds

I am not sure if this is worth the extra complexity. Happy to hear alternative suggestions.

I think from an API PoV, this would be better because it unifies handling of new connections to the caller of the poll function. At the moment, listen_on eventually triggers an Incoming event. In a similar manner, dial could trigger an Outgoing event that contains the actual DialFuture.

This way, concurrent dialling (and aborting the dial at any point) can still be implemented on top and is not a concern of the Transport implementation.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Strong concept ACK for going in this direction. Left some comments :)

core/src/either.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
core/src/transport.rs Outdated Show resolved Hide resolved
Comment on lines 129 to 130
match poll_fn(|cx| Pin::new(&mut listener_transport).poll(cx))
.await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor yes.

swarm/src/lib.rs Outdated Show resolved Hide resolved
@mxinden mxinden mentioned this pull request May 18, 2022
4 tasks
@elenaf9
Copy link
Contributor Author

elenaf9 commented May 22, 2022

While trying to adapt dns::GenDnsConfig and websockets::framed::WsConfig I noticed that they wrap the inner transport in a Mutex. @mxinden if I understand it correctly this was done as part of #2529 so that they can still implement Clone? I wonder if it is really necessary for them to be clone? Don't think it makes sense after this PR's change. When implementing poll on it it becomes a race condition when a waker is triggered which of the clones acquires the lock first and thus gets the TransportEvent.

@mxinden
Copy link
Member

mxinden commented May 23, 2022

While trying to adapt dns::GenDnsConfig and websockets::framed::WsConfig I noticed that they wrap the inner transport in a Mutex. @mxinden if I understand it correctly this was done as part of #2529 so that they can still implement Clone? I wonder if it is really necessary for them to be clone?

Expanding on this for the case of DNS. Websockets should be analogous.

On <GenDnsConfig as Transport>::dial the method returns a dialing Future. That Future, when polled, first resolves the DNS addresses in the provided Multiaddr, if any. It then calls Transport::dial on its inner Transport, e.g. libp2p-tcp. For this later step it needs a clone of the inner transport within the dialing Future.

Does the above make sense? Happy to expand further.

We could return a DialId in Transport::dial only, driving the DNS resolution in Transport::poll and only once resolved return the dialing Future which now only has to establish the TCP connection. That said, I don't think we should do this work in the main event loop (i.e. within Transport::poll), but instead in the per-connection-task.

I can not think of a way to get rid of the Arc<Mutex<_>> while still driving the DNS resolution on the per-connection-task instead of the main event loop.

@mxinden
Copy link
Member

mxinden commented May 23, 2022

If I am not mistaken the efforts of (1) refactoring Transport::listen and (2) refactoring Transport::dial can be done in two pull requests, i.e. in two in itself atomic changes, correct?

In addition, I am sensing that we have rough consensus on (1), namely of getting rid of listeners. Also, please correct me if I am wrong, we agree on (1) being a step forward even if we don't do (2).

With all of the above in mind, I suggest we do (1) only for now. Delaying (2) to another pull request.

What do you think?

@mxinden
Copy link
Member

mxinden commented May 23, 2022

Don't think it makes sense after this PR's change. When implementing poll on it it becomes a race condition when a waker is triggered which of the clones acquires the lock first and thus gets the TransportEvent.

Haven't thought of this. All the clones only ever call Transport::dial, never Transport::poll, thus in practice this is not an issue, correct? Ideally this would be enforced at the type level though :/

@elenaf9
Copy link
Contributor Author

elenaf9 commented May 23, 2022

On ::dial the method returns a dialing Future. That Future, when polled, first resolves the DNS addresses in the provided Multiaddr, if any. It then calls Transport::dial on its inner Transport, e.g. libp2p-tcp. For this later step it needs a clone of the inner transport within the dialing Future.

Does the above make sense? Happy to expand further.

Yes makes sense, thanks! I agree then that we need the Mutex.

Don't think it makes sense after this PR's change. When implementing poll on it it becomes a race condition when a waker is triggered which of the clones acquires the lock first and thus gets the TransportEvent.

Haven't thought of this. All the clones only ever call Transport::dial, never Transport::poll, thus in practice this is not an issue, correct? Ideally this would be enforced at the type level though :/

We could at least prevent it for users of GenDnsConfig (but not within its implementation). We can remove the Clone derivation for GenDnsConfig, but keep the Mutex around the inner transport. That way within one GenDnsConfig we can still clone the inner transport, but not GenDnsConfig itself, which prevents the described race condition.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it to the end. Couple of more comments, though nothing major.

protocols/relay/src/v2/client/transport.rs Show resolved Hide resolved
transports/tcp/Cargo.toml Outdated Show resolved Hide resolved
transports/tcp/Cargo.toml Outdated Show resolved Hide resolved
transports/tcp/src/lib.rs Outdated Show resolved Hide resolved
transports/tcp/src/lib.rs Outdated Show resolved Hide resolved
transports/tcp/src/lib.rs Outdated Show resolved Hide resolved
@elenaf9
Copy link
Contributor Author

elenaf9 commented Jun 28, 2022

Thank you both for the reviews @mxinden @thomaseizinger! I know it is a large diff, so it's very much appreciated.
I think I addressed all comments, please point out if in case that I missed something.

@mxinden
Copy link
Member

mxinden commented Jun 28, 2022

Missing changelog entries. Otherwise this looks good to me 🎉

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. 🚀

@thomaseizinger any objections to merging this?

transports/tcp/CHANGELOG.md Outdated Show resolved Hide resolved
transports/tcp/CHANGELOG.md Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

Looks good to me. 🚀

@thomaseizinger any objections to merging this?

Nope, not at all!

@mxinden mxinden merged commit 62622a1 into libp2p:master Jul 4, 2022
@mxinden
Copy link
Member

mxinden commented Jul 4, 2022

Thanks @elenaf9 for taking this from start to finish. Very large undertaking!

elenaf9 added a commit to elenaf9/rust-libp2p that referenced this pull request Jul 10, 2022
Adapt to the transport changes of libp2p#2652.
Note: this is only a draft "to make it work", and not a proper
implementation. It does not support listening on multiple addresses.
The listening logic with multiple Endpoints will need to be supported for
the upstream implementation.
melekes added a commit to melekes/rust-libp2p that referenced this pull request Jul 12, 2022
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024
…er (libp2p#2652)

Remove the concept of individual `Transport::Listener` streams from `Transport`.
Instead the `Transport` is polled directly via `Transport::poll`. The
`Transport` is now responsible for driving its listeners.
binarybaron added a commit to UnstoppableSwap/core that referenced this pull request Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants