-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(quic): implement hole punching #3964
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thank you @arpankapoor for your contribution! Also exciting that this working for you in the wild.
In case you haven't seen it yet, note that we will likely move to quinn
instead of quinn-proto
with our own wrapper "soon". See #3454 and #2883 (comment).
I will give this an in-depth review. No action required from your end yet. Posting some comments so that I don't forget.
Thanks for the initial reviews @mxinden, @thomaseizinger and @kpp! I would like to hear your comments/suggestions on the open questions I posted in the first comment, especially regarding addition of peerid to |
If we expect a particular |
@arpankapoor I implemented the logic for questions 1 and 2 from the description of the PR in e507e44 . Thank you! |
Co-authored-by: Roman <[email protected]>
Thanks a lot @kpp! I've copied over the relevant pieces here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done passing an potentially hole-punching inbound connection all the way back to the dialing end. While this adds complexity to libp2p-quic
, it nicely hides it such that upper layers don't need to worry about it.
Moving an out-of-band discussion here:
Note that (We can not release it before |
I've made the move to (2) and added a changelog entry. Please review the latest changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am liking where this is going! Some minor comments.
I wonder how we can best write a test for this. It would be really good to have some automated tests. But perhaps that is best done at an interop-test level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go from my end. I'd like @mxinden's approval too.
CI needs to be fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the comments @thomaseizinger raised in the last review and the failing CI, this looks good to me.
I will have low availability next week. @thomaseizinger please merge once you are happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Thank you so much for this work and your patience in iterating on it. I'll test this manually tomorrow and merge once successful.
From my understanding, sending random packets is required by the dcutr specification. Doing the simultaneous dial isn't enough ? Do we really need those random packets to hole punch ? Are they a plus ? |
The random packets establish a mapping in your NAT device which allow the packets of the incoming connection to be routed through. A simultaneous dial is different because the parties need to somehow work out, how the dialer and the listener is. It is my understanding that this works well enough for TCP because TCP doesn't really care about the "role" of a socket. For QUIC however, establishing the socket involves a cryptographic handshake which also assumes certain roles (who sends client-hello vs server-hello etc). If we were to just sim-open two connections with QUIC, we'd need support in the underlying QUIC stack to override the role. This is essentially the abstraction we have in There may be other reasons too but it seems like just sending random UDP packets to allow the other connection to be routed through is a much simpler solution and hence preferable. I wasn't involved in the dcutr spec, perhaps @mxinden can explain this further. |
Thanks for your response, this is clear now. So the incoming connection is authorized through the NAT thanks to the random packets sent by the puncher and eventually this incoming connection is used as the dialing one. However those random UDP packets may be dropped by a firewall which performs deep packet inspection, preventing the hole punch to succeed. |
Yes, but I wouldn't know how we could circumvent this. DPI may block all sorts of traffic. |
I just tested this successfully with my laptop sitting behind CG-NAT as the dialer (can't hole-punch through to CG-NAT unfortunately) and a separate machine in the same city but completely different network and it worked! Let's go! |
.with("0.0.0.0".parse::<Ipv4Addr>().unwrap().into()) | ||
.with(Protocol::Tcp(0)), | ||
) | ||
.listen_on("/ip4/0.0.0.0/udp/0/quic-v1".parse().unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which means that there is no way to perform hole punching in the example because eligible_listener
checks is_loopback
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the example for testing this in two real networks. If you run this example on localhost, you are not really testing hole punching but just that the protocol is executed. On localhost, the connection should always succeed because there is no need for the random UDP packets to punch a hole. Am I wrong?
Note @stormshield-pj50 that most of a QUIC UDP packet is encrypted, thus not much for deep-packet-inspection to analyze. Though in case DPI is ever a problem we could be smarter around faking the non-encrypted parts. |
Very happy to see this merged. Thank you @arpankapoor for the proactive work here. Thank you @thomaseizinger and @kpp for the continuous reviews and testing. |
Description
Implement
Transport::dial_as_listener
for QUIC as specified by the DCUtR spec.To facilitate hole punching in QUIC, one side needs to send random UDP packets to establish a mapping in the routing table of the NAT device. If successful, our listener will emit a new inbound connection. This connection needs to then be sent to the dialing task. We achieve this by storing a
HashMap
of hole punch attempts indexed by the remote'sSocketAddr
. A matching incoming connection is then sent via a oneshot channel to the dialing task which continues with upgrading the connection.Related #2883.
Notes & open questions
Please consider this a first draft. The design is largely similar to go-libp2p, except that I've only used the remote address as the hole punching attempt key. I have also added 2 examples analogous to the TCP hole punch tutorial and am able to establish a direct connection upgrade. Questions/issues to consider:
rust-libp2p
'sTransport::dial
/Transport::dial_as_listener
doesn't have peerid as an argument whichgo-libp2p
does. It could perhaps be extracted from the multiaddr or the trait could be changed to pass the peerid as well (not sure how involved this might be).Transport
trait doesn't receive the fully establishedConnection
for us to be able to extract the peerid from the incoming connection. Without the peerid, we are assuming that an incoming connection from the ip:port for which holepunching was previously attempted is from the same peer.Transport::dial
.Apart from this, I think I also found an issue with the DCUtR
NetworkBehavior
's implementation ofhandle_established_[in|out]bound_connection
. Sinceoutgoing_direct_connection_attempts
is indexed by the relayedConnectionId
, shouldn't this method first try to get the relayed connection id corresponding to the possibly upgraded direct connection fromdirect_to_relayed_connections
? If this is an issue, I could possibly create a separate PR for this.Change checklist