Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add nim hole punching tests #322

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

diegomrsantos
Copy link
Contributor

@diegomrsantos diegomrsantos commented Nov 6, 2023

Adds TCP Hole Punching tests for nim.

closes vacp2p/nim-libp2p#966

@diegomrsantos
Copy link
Contributor Author

@thomaseizinger, I'm getting the following error when running the tests:

nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.372+00:00 got message                                topics="libp2p yamux" tid=30 h="{Data, {Syn}, streamId: 4, length: 0}"
nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.373+00:00 Closing yamux connection                   topics="libp2p yamux" tid=30 error="Peer used our reserved stream id"

After investigating, I found the following spec about stream id: https://github.com/hashicorp/yamux/blob/master/spec.md#streamid-field

But the Rust implementation seems to say something different: https://github.com/libp2p/rust-libp2p/blob/135942d3190634beebdc614c70117ced61c9cd99/muxers/mplex/src/codec.rs#L41

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Nov 6, 2023

@thomaseizinger, I'm getting the following error when running the tests:

nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.372+00:00 got message                                topics="libp2p yamux" tid=30 h="{Data, {Syn}, streamId: 4, length: 0}"
nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.373+00:00 Closing yamux connection                   topics="libp2p yamux" tid=30 error="Peer used our reserved stream id"

After investigating, I found the following spec about stream id: https://github.com/hashicorp/yamux/blob/master/spec.md#streamid-field

FYI: We vendored the yamux spec into the specs repo: https://github.com/libp2p/specs/tree/master/yamux

That test is running nim against nim. The only Rust component in there involved is the relay. Both nodes are clients to the relay, so should use odd IDs. The relay will thus use even IDs. Does that make sense?

But the Rust implementation seems to say something different: https://github.com/libp2p/rust-libp2p/blob/135942d3190634beebdc614c70117ced61c9cd99/muxers/mplex/src/codec.rs#L41

You are referencing the mplex implementation here. The holepunch test suite will always use noise+yamux to secure the relayed connection.

@diegomrsantos
Copy link
Contributor Author

Thanks for the answer. I thought it could be a problem with the Rust relay. I couldn't find the Rust code implementing the creation of stream ids in Rust. I'll try to reproduce only using a Nim relay.

@thomaseizinger
Copy link
Contributor

It is here: https://github.com/libp2p/rust-yamux

We run this in the interop tests so I am quite surprised there would be an issue here.

@diegomrsantos diegomrsantos changed the title Feat/add nim hole punch tests feat: add nim hole punch tests Nov 9, 2023
@diegomrsantos diegomrsantos force-pushed the feat/add-nim-hole-punch-tests branch from 88d5f21 to 6a4d45e Compare November 19, 2023 01:28
@diegomrsantos diegomrsantos force-pushed the feat/add-nim-hole-punch-tests branch from 66f208b to fef9abc Compare November 19, 2023 01:41
@diegomrsantos diegomrsantos changed the title feat: add nim hole punch tests feat: add nim hole punching tests Nov 21, 2023
@diegomrsantos
Copy link
Contributor Author

Sorry, it was an issue with our relayv2 and yamux interaction, it has been fixed now.

@thomaseizinger
Copy link
Contributor

Sorry, it was an issue with our relayv2 and yamux interaction, it has been fixed now.

Cool! So the tests are already pulling its weight by finding bugs, nice 😁

There seems to be another issue when rust-libp2p is the dialer. Did you experience that locally too? I haven't looked at it closely yet :)

@diegomrsantos
Copy link
Contributor Author

diegomrsantos commented Nov 21, 2023

Those tests are great, thanks a lot for working on it.

I have a workaround locally that makes all the tests pass, but it didn't seem to be the right fix, maybe it is. I believe the current issue is somewhat related to the previous one. If I understand correctly, the listener (dcutr client) creates an inbound TCP connection, and the dialer (dcutr server) an outbound one. I believe the listener creates an outbound Yamux stream and the dialer is also considering it as outbound. This makes both of them use odd stream IDs, causing the error. This is my current hypothesis that needs to be validated. Would you happen to know if the Rust dcutr server does that? If that's the case, should the stream created by the listener (dcutr client) be outbound or the same as the TCP connection direction?

@thomaseizinger
Copy link
Contributor

The underlying TCP connections to the relay are both outbound from the client's perspective. Both of them dial the relay, right?

For the relayed connections between the two clients, the client with the reservation acts as a listener (i.e. yamux server) and other side as the dialer (i.e. yamux client).

So there are two yamux connections involved on either end (layered on top of each other). The first one (together with noise) secures and multiplexes the connection with the relay, the second one does so for the connection between the two clients.

Note that in theory, the relayed connection on top could also use mplex or tls. I just chose to use yamux and noise again.

@diegomrsantos
Copy link
Contributor Author

diegomrsantos commented Nov 21, 2023

I was talking about the new TCP connection created during dcutr. It doesn't use the relay connection, does it? Sorry for not stating it clearly. I believe the current issue happens during the connection upgrade right after both peers tried a new direct connection, outside the relay one.

cc @lchenut

@thomaseizinger
Copy link
Contributor

I was talking about the new TCP connection created during dcutr. It doesn't use the relay connection, does it?

Ah yes, correct!

The spec says that A acts as the dialer on the new connection whereas B opened the DCUtR stream to initiate the hole-punch. A is also the node that initiated the relayed connection.

@diegomrsantos
Copy link
Contributor Author

diegomrsantos commented Nov 21, 2023

In these hole-punching tests, A is the dialer, and B is the listener. The spec says "For the purpose of all protocols run on top of this TCP connection, A is assumed to be the client and B the server.". It seems to me it doesn't explicitly say who should the TCP client and server, but I believe it is reasonable to assume A is the TCP client (outbound) and B is the TCP server (inboud). It does explicitly say that from the Yamux perspective (a protocol running on top of the TCP connection), A is the Yamux client (outbound) and B is the Yamux server (inbound).

I believe the Rust dialer (A) implemented it correctly, but the Nim listener (B) is incorrectly thinking it is outbound as well.

@thomaseizinger
Copy link
Contributor

I believe the Rust dialer (A) implemented it correctly, but the Nim listener (B) is incorrectly thinking it is outbound as well.

That could be it! In Rust we solve this with a "dial as listener" instruction that overrides, what role we assume for ourselves on the resulting connection. This is essential for hole-punching to work. I am surprised the nim <> nim test works. Why is that one not failing too?

@diegomrsantos diegomrsantos force-pushed the feat/add-nim-hole-punch-tests branch 2 times, most recently from a05973f to b4dc7db Compare November 22, 2023 15:45
@diegomrsantos diegomrsantos force-pushed the feat/add-nim-hole-punch-tests branch from b4dc7db to aa8b4aa Compare November 22, 2023 18:24
@thomaseizinger
Copy link
Contributor

Green tests 🎉

@diegomrsantos
Copy link
Contributor Author

I'm still trying to understand what's going on. I'll add it here later. Thanks for your help.

@diegomrsantos
Copy link
Contributor Author

Hopefully this explains the issue and the fix vacp2p/nim-libp2p#994

@diegomrsantos diegomrsantos marked this pull request as ready for review November 30, 2023 21:25
@@ -0,0 +1,21 @@
image_name := nim-v1.1
commitSha := 93925ac28b04cd94d06a332412973e90f80f35df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, this is a commit that is not yet merged into master, correct? I am okay with that, assuming it will eventually be updated to one that sits on master, ideally even one that points at a particular release.

Given that you are adding the workflow in vacp2p/nim-libp2p#998, I'd suggest it is save to merge that PR first, given that it essentially tests the exact same version as what is referenced here.

Once that is merged into your master, we can update this commit hash here and merge this PR.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.

hole-punch-interop/impl/rust/v0.53/Makefile Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Hole Punching to libp2p test-plans
2 participants